Training · Efficiency · Updated 2026

Knowledge Distillation

Training a smaller "student" model to imitate a larger "teacher", transferring much of its capability into a cheaper, faster model.

Hinton et al. (2015) showed that a compact model can be trained on the soft probability outputs of a larger model or ensemble, learning from the teacher's full distribution rather than just hard labels. The student captures much of the teacher's behaviour at a fraction of the size and cost.
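As a rough illustration of training on soft outputs, the sketch below computes a distillation loss for a single example in plain Python. It assumes the common formulation from Hinton et al. (2015): both models' logits are softened with a temperature, the student is penalised for diverging from the teacher's softened distribution, and that term is mixed with the usual hard-label cross-entropy. The function names, the `temperature` and `alpha` defaults, and the use of KL divergence (which differs from soft-target cross-entropy only by a term that is constant with respect to the student) are illustrative assumptions, not details given in this entry.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Hypothetical helper: weighted sum of a soft-target and a hard-label loss.

    temperature and alpha are assumed hyperparameters, not values from the entry.
    """
    # Soften both distributions with the same temperature.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)

    # Soft-target term: KL(teacher || student) on the softened distributions.
    soft = sum(pt * (math.log(pt) - math.log(ps))
               for pt, ps in zip(p_teacher, p_student) if pt > 0.0)

    # Hard-label term: ordinary cross-entropy at temperature 1.
    hard = -math.log(softmax(student_logits)[label])

    # The T^2 factor keeps the soft term's gradient scale comparable
    # across temperatures, as suggested in the paper.
    return alpha * temperature ** 2 * soft + (1.0 - alpha) * hard


teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]   # roughly agrees with the teacher
far_student = [0.5, 1.0, 4.0]     # disagrees with the teacher

print(distillation_loss(close_student, teacher, label=0))
print(distillation_loss(far_student, teacher, label=0))
```

A student whose logits track the teacher's gets a lower loss than one that disagrees, and raising `temperature` flattens both distributions so that the teacher's relative preferences among non-top classes carry more weight. In practice this would be computed over batches in a tensor library rather than on Python lists.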

Distillation is a standard way to ship small, fast versions of large models for latency- or cost-sensitive deployment.

References

Primary, peer-reviewed and archival sources for this definition.

Distilling the Knowledge in a Neural Network
Hinton, G., Vinyals, O., & Dean, J. (2015). NeurIPS 2014 Deep Learning Workshop. arXiv:1503.02531.

Cite this entry

MultipleChat. "Knowledge Distillation." MultipleChat AI & LLM Glossary, 2026. https://multiple.chat/ai-glossary/distillation
