Efficiency, Decoding. Updated 2026

Speculative Decoding

An inference-speedup technique in which a small, fast model drafts several tokens and a large model verifies them in parallel, giving faster output with the same output distribution as the large model alone.

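Leviathan et al. (2023) introduced speculative decoding: a small, cheap "draft" model proposes several tokens, and the large target model scores them all in a single parallel pass. The target keeps the longest prefix of drafted tokens that passes its acceptance test and replaces the first rejected token with one of its own, so every pass yields at least one new token. Because verification is parallel rather than token-by-token, the authors report roughly 2–3× faster generation, with provably the same output distribution as the large model alone.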
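The draft-then-verify loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `speculative_step`, `toy_target`, `toy_draft`, and the three-token vocabulary are hypothetical names and data invented for this example, the "models" are plain functions that return a probability list, and the target is called once per position where a real system would use one batched forward pass. The accept rule (keep a drafted token with probability min(1, p/q), otherwise resample from the normalized positive part of p − q) is the rejection-sampling scheme described by Leviathan et al.

```python
import random

# Hypothetical toy distributions over a 3-token vocabulary (assumption for illustration).
TARGET_PROBS = [0.6, 0.3, 0.1]   # large "target" model
DRAFT_PROBS = [0.3, 0.3, 0.4]    # small "draft" model

def toy_target(context):
    """Stand-in for the large model: ignores context, returns fixed probabilities."""
    return TARGET_PROBS

def toy_draft(context):
    """Stand-in for the draft model: ignores context, returns fixed probabilities."""
    return DRAFT_PROBS

def sample(probs, rng):
    """Draw one token index according to the given weights."""
    return rng.choices(range(len(probs)), weights=probs)[0]

def speculative_step(target, draft, prefix, gamma, rng):
    """Run one draft-and-verify step; returns between 1 and gamma + 1 new tokens."""
    # 1. The draft model proposes gamma tokens autoregressively.
    drafted, q_dists = [], []
    context = list(prefix)
    for _ in range(gamma):
        q = draft(context)
        token = sample(q, rng)
        drafted.append(token)
        q_dists.append(q)
        context.append(token)

    # 2. The target scores every drafted position plus one extra.
    #    (Sequential here; a real system does this in one parallel pass.)
    p_dists = [target(list(prefix) + drafted[:i]) for i in range(gamma + 1)]

    # 3. Accept drafted tokens left to right; stop at the first rejection.
    accepted = []
    for i, token in enumerate(drafted):
        p, q = p_dists[i], q_dists[i]
        if rng.random() < min(1.0, p[token] / q[token]):
            accepted.append(token)
            continue
        # Rejected: resample from the positive part of (p - q).
        residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
        accepted.append(sample(residual, rng))
        return accepted

    # 4. Every draft was accepted: take one bonus token from the target.
    accepted.append(sample(p_dists[gamma], rng))
    return accepted

if __name__ == "__main__":
    rng = random.Random(0)
    print(speculative_step(toy_target, toy_draft, prefix=[], gamma=4, rng=rng))
```

Each call emits at least one token even when the first draft is rejected, and up to `gamma + 1` when every draft is accepted. The closer the draft distribution is to the target, the more tokens each target pass yields.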

It is now a standard way to cut latency when serving large models without changing the responses users see.

References

Primary, peer-reviewed and archival sources for this definition.

Fast Inference from Transformers via Speculative Decoding

Leviathan, Y., Kalman, M., & Matias, Y. (2023). Proceedings of the 40th International Conference on Machine Learning (ICML 2023).

arXiv:2211.17192

Dictionary & encyclopedic entries

Hugging Face — Assisted generation (speculative decoding)
Wikipedia — Speculative decoding

Cite this entry

MultipleChat. "Speculative Decoding." MultipleChat AI & LLM Glossary, 2026. https://multiple.chat/ai-glossary/speculative-decoding

Related terms

Latency, Streaming, Quantization
