Infrastructure · Efficiency · Updated 2026

Latency

The delay between sending a prompt and receiving a response. Larger models and longer contexts increase latency, which makes it the main speed trade-off in AI products.

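Latency in an LLM system is usually split into time-to-first-token (how long the user waits before any output appears) and time-per-output-token (the average gap between subsequent tokens). Both grow with model size, because each token passes through more parameters, and with sequence length, because attention cost scales with the amount of context. The survey by Tay et al. (2022) catalogues the efficiency techniques developed to mitigate this.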
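As an illustration of the two metrics, the sketch below times a token stream and reports time-to-first-token and time-per-output-token. It is a minimal sketch, not a benchmark harness: `fake_token_stream` is a hypothetical stand-in for a real streaming response, and its delays are made-up values chosen only to make the numbers visible.

```python
import time
from typing import Dict, Iterable, Iterator


def fake_token_stream(n_tokens: int, first_delay: float,
                      per_token_delay: float) -> Iterator[str]:
    """Hypothetical stand-in for a streaming LLM response (assumption)."""
    time.sleep(first_delay)          # simulates prompt processing
    yield "tok0"
    for i in range(1, n_tokens):
        time.sleep(per_token_delay)  # simulates generating one more token
        yield f"tok{i}"


def measure_latency(stream: Iterable[str]) -> Dict[str, float]:
    """Consume a token stream and split its latency into two parts."""
    start = time.perf_counter()
    first = None
    last = start
    count = 0
    for _ in stream:
        now = time.perf_counter()
        if first is None:
            first = now              # arrival time of the first token
        last = now
        count += 1
    if first is None:
        raise ValueError("stream produced no tokens")
    ttft = first - start
    # Average gap between tokens after the first one.
    tpot = (last - first) / (count - 1) if count > 1 else 0.0
    return {"tokens": count, "ttft_s": ttft, "tpot_s": tpot,
            "total_s": last - start}


if __name__ == "__main__":
    stats = measure_latency(fake_token_stream(5, 0.05, 0.01))
    print(stats)
```

By construction, the total time equals the time-to-first-token plus the time-per-output-token multiplied by the number of remaining tokens, which is why long responses are dominated by the per-token figure.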

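Approaches that reduce latency include sparsity (Mixture of Experts), quantization, caching and streaming. Each trades some accuracy, memory or complexity for speed; streaming in particular does not shorten total generation time but lowers perceived latency by showing tokens as they are produced.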

References

Primary, peer-reviewed and archival sources for this definition.

Tay, Y., Dehghani, M., Bahri, D., & Metzler, D. (2022). Efficient Transformers: A Survey. ACM Computing Surveys, 55(6), 1–28. arXiv:2009.06732. DOI:10.1145/3530811

Dictionary & encyclopedic entries

Wikipedia — Latency (engineering)
Google Cloud — Latency — definition

Cite this entry

MultipleChat. "Latency." MultipleChat AI & LLM Glossary, 2026. https://multiple.chat/ai-glossary/latency

Related terms

Streaming
Mixture of Experts (MoE)
Token
