
Efficiency · Infrastructure · Updated 2026

Quantization

Storing a model's weights (and sometimes its activations) at lower numerical precision to cut memory use and often speed up inference, usually with little loss in quality.
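A minimal sketch of the basic idea follows. It assumes symmetric "absmax" quantization with a single scale for the whole tensor, which is one common scheme among several. The names `quantize_absmax` and `dequantize` are illustrative, not a library API. Each weight is stored as an 8-bit integer (1 byte instead of 2 for 16-bit) plus one shared floating-point scale, and rounding to the nearest step keeps the per-weight error within half a step.

```python
# Minimal sketch of symmetric ("absmax") int8 quantization with one scale per
# tensor. Function names are illustrative, not taken from any library.

def quantize_absmax(weights, bits=8):
    """Map floats to signed integers in [-(2**(bits-1) - 1), 2**(bits-1) - 1]."""
    qmax = 2 ** (bits - 1) - 1            # 127 for 8 bits
    absmax = max(abs(w) for w in weights)
    scale = absmax / qmax if absmax > 0 else 1.0
    q = [int(round(w / scale)) for w in weights]  # the integers that get stored
    return q, scale


def dequantize(q, scale):
    """Recover approximate floats from the stored integers and scale."""
    return [v * scale for v in q]


weights = [0.12, -0.5, 0.33, 0.9, -0.07]
q, scale = quantize_absmax(weights)
restored = dequantize(q, scale)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(q, scale)
print(max_err <= scale / 2)  # round-to-nearest error is at most half a step
```

Here the largest-magnitude weight, 0.9, maps to 127, and every other weight is rounded to the nearest multiple of `scale`.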

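Quantization replaces high-precision weights (e.g. 16-bit floating point) with lower-bit integer representations (e.g. 8-bit), together with scaling factors that map the integers back to approximate real values. Dettmers et al. (2022) showed with LLM.int8() that 8-bit matrix multiplication can run inference on transformers of up to 175 billion parameters with half the memory of 16-bit weights while matching full-precision accuracy. The method quantizes most values vector-wise, with a separate scale for each row or column, and handles the rare, systematic high-magnitude "outlier" features separately by multiplying those dimensions in 16-bit precision (a mixed-precision decomposition).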
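LLM.int8() combines vector-wise 8-bit quantization with a mixed-precision decomposition for outlier features. The sketch below illustrates both ideas in plain Python, assuming toy list-of-lists matrices rather than real GPU kernels. `absmax_int8`, `matmul_fp`, `int8_matmul_decomposed` and the sample data are hypothetical names for illustration; the outlier threshold of 6.0 follows the value used in the paper. Regular feature dimensions are quantized with one scale per row of `X` and per column of `W`, multiplied with integer accumulation, and rescaled. Dimensions of `X` that contain an outlier are multiplied in floating point and added back.

```python
# Plain-Python sketch of the LLM.int8() idea (Dettmers et al., 2022):
#   1. vector-wise absmax quantization: one scale per row of X, one per column of W;
#   2. mixed-precision decomposition: feature dimensions of X that contain an
#      outlier (|x| >= threshold) are multiplied in floating point instead.
# Function names and toy data are hypothetical; real kernels run on GPUs.

def absmax_int8(vec):
    """Quantize a vector into the int8 range [-127, 127]; return (ints, scale)."""
    m = max(abs(v) for v in vec)
    scale = m / 127 if m > 0 else 1.0
    return [int(round(v / scale)) for v in vec], scale


def matmul_fp(X, W):
    """Reference floating-point product of X (n x k) and W (k x m)."""
    return [[sum(x[t] * W[t][j] for t in range(len(W))) for j in range(len(W[0]))]
            for x in X]


def int8_matmul_decomposed(X, W, threshold=6.0):
    n, k, m = len(X), len(W), len(W[0])
    # Feature dimensions (columns of X, rows of W) that hold any outlier value.
    outliers = [t for t in range(k) if any(abs(X[i][t]) >= threshold for i in range(n))]
    regular = [t for t in range(k) if t not in outliers]

    out = [[0.0] * m for _ in range(n)]
    if regular:
        # Row-wise scales for X and column-wise scales for W, regular dims only.
        xq = [absmax_int8([X[i][t] for t in regular]) for i in range(n)]
        wq = [absmax_int8([W[t][j] for t in regular]) for j in range(m)]
        for i, (xi, sx) in enumerate(xq):
            for j, (wj, sw) in enumerate(wq):
                acc = sum(a * b for a, b in zip(xi, wj))  # exact integer dot product
                out[i][j] = acc * sx * sw                 # dequantize with both scales
    # Outlier dimensions stay in floating point (16-bit in the paper).
    for i in range(n):
        for j in range(m):
            out[i][j] += sum(X[i][t] * W[t][j] for t in outliers)
    return out


X = [[0.5, -1.2, 8.0, 0.3],
     [1.1, 0.4, -7.5, -0.9]]   # column 2 holds outlier values
W = [[0.2, -0.1],
     [0.05, 0.3],
     [-0.4, 0.25],
     [0.15, -0.2]]

exact = matmul_fp(X, W)
approx = int8_matmul_decomposed(X, W)
max_err = max(abs(a - b) for ra, rb in zip(exact, approx) for a, b in zip(ra, rb))
print(max_err)
```

In the actual method the integer products run on int8 hardware, and outlier dimensions are typically a small fraction of the total, so most of the work stays in 8-bit.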

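Quantization is a key reason large models can run on smaller GPUs and consumer hardware. It also pairs naturally with adapter methods such as LoRA: the quantized base weights stay frozen while small, higher-precision adapter matrices are trained.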
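The memory argument is simple arithmetic, because weight storage scales linearly with the number of bits per parameter. The rough sketch below assumes a hypothetical 7-billion-parameter model and counts weights only. Activations, the KV cache, quantization scales and runtime overhead all add to the real footprint, and the helper `weight_memory_gb` is an illustrative name.

```python
# Back-of-the-envelope weight-memory estimate. Counts weights only; activations,
# KV cache, quantization scales and framework overhead are ignored, so real
# usage is higher. The 7-billion-parameter size is just an example.

def weight_memory_gb(num_params, bits):
    """Storage for num_params weights at `bits` bits each, in GB (1e9 bytes)."""
    return num_params * bits / 8 / 1e9


params = 7e9
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: {weight_memory_gb(params, bits):.1f} GB")
```

This prints 14.0, 7.0 and 3.5 GB. Each halving of the bit width halves the weight footprint.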

References

Primary, peer-reviewed and archival sources for this definition.

Dettmers, T., Lewis, M., Belkada, Y., & Zettlemoyer, L. (2022). LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale. Advances in Neural Information Processing Systems 35 (NeurIPS 2022). arXiv:2208.07339.

Dictionary & encyclopedic entries

Wikipedia — Quantization (signal processing) — neural networks
Hugging Face — Quantization

Cite this entry

MultipleChat. "Quantization." MultipleChat AI & LLM Glossary, 2026. https://multiple.chat/ai-glossary/quantization

Related terms

LoRA (Low-Rank Adaptation)
Latency
Knowledge Distillation


See this in practice

Run the same prompt across ChatGPT, Claude, Gemini and Grok — grounded in your own sources, cross-checked against each other.

Try MultipleChat Free



