Training · Foundations · Updated 2026

Scaling Laws

Empirical relationships showing how model performance improves predictably with more parameters, data and compute.

Kaplan et al. (2020) found that language-model loss falls as a smooth power law in model size, dataset size and compute, which lets researchers forecast the performance of a large model from smaller runs before training it. Hoffmann et al. (2022), the "Chinchilla" paper, refined this, showing that most large models of the time were undertrained and that, for a given compute budget, parameters and training tokens should be scaled up roughly in proportion.
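The forecasting idea can be illustrated with a minimal Python sketch. It assumes loss follows loss = a * size^(-alpha), fits a and alpha by least squares in log-log space on a few small runs, and extrapolates to a larger size. The function names, the constants and the measurements are all hypothetical, chosen only for illustration; they are not values reported by Kaplan et al.

```python
import math

def fit_power_law(sizes, losses):
    """Fit loss = a * size**(-alpha) by ordinary least squares in log-log space."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(l) for l in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope of the regression line log(loss) = intercept + slope * log(size).
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, alpha)

def predict_loss(a, alpha, size):
    """Evaluate the fitted power law at a given model size."""
    return a * size ** (-alpha)

# Hypothetical "measurements" generated from an exact power law.
# Real training runs would be noisy and would not fit this cleanly.
TRUE_A, TRUE_ALPHA = 10.0, 0.08
small_sizes = [1e6, 1e7, 1e8]
small_losses = [TRUE_A * s ** (-TRUE_ALPHA) for s in small_sizes]

a, alpha = fit_power_law(small_sizes, small_losses)
forecast = predict_loss(a, alpha, 1e10)  # extrapolate 100x beyond the largest run
print(f"fitted alpha = {alpha:.4f}, forecast loss at 1e10 params = {forecast:.4f}")
```

Fitting in log-log space turns the power law into a straight line, which is why a handful of small runs is enough to estimate the exponent in this idealized setting.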

Scaling laws explain why the field has pursued ever-larger models and ever-larger datasets, and they guide how to split a fixed compute budget between the two.
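The budgeting question can be sketched in a few lines of Python. This sketch rests on two commonly quoted approximations that do not appear in the entry itself and should be read as assumptions: training compute is roughly 6 FLOPs per parameter per token (C ≈ 6 * N * D), and a compute-optimal run uses about 20 training tokens per parameter. The function name and the example budget are hypothetical.

```python
import math

# Assumption: training compute C ≈ 6 * N * D FLOPs (N parameters, D tokens).
FLOPS_PER_PARAM_TOKEN = 6.0
# Assumption: rule-of-thumb ratio of about 20 training tokens per parameter.
TOKENS_PER_PARAM = 20.0

def compute_optimal_split(compute_flops, tokens_per_param=TOKENS_PER_PARAM):
    """Split a FLOP budget into (parameters, tokens) under the two assumptions above."""
    if compute_flops <= 0:
        raise ValueError("compute_flops must be positive")
    # Substitute D = r * N into C = 6 * N * D and solve for N: N = sqrt(C / (6 * r)).
    params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_TOKEN * tokens_per_param))
    tokens = tokens_per_param * params
    return params, tokens

# Example budget in FLOPs (hypothetical, chosen to give round numbers).
params, tokens = compute_optimal_split(5.88e23)
print(f"parameters ≈ {params:.3g}, tokens ≈ {tokens:.3g}")

# With a fixed ratio, 100x more compute means 10x more parameters and 10x more tokens.
big_params, big_tokens = compute_optimal_split(5.88e25)
```

Because the ratio is held fixed, both quantities grow with the square root of compute, which is one simple way to read "parameters and training tokens should scale roughly together".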

References

Primary, peer-reviewed and archival sources for this definition.

Scaling Laws for Neural Language Models
Kaplan, J., McCandlish, S., Henighan, T., Brown, T. B., Chess, B., Child, R., Gray, S., Radford, A., Wu, J., & Amodei, D. (2020). arXiv preprint (OpenAI).
Training Compute-Optimal Large Language Models (Chinchilla)
Hoffmann, J., Borgeaud, S., Mensch, A., Buchatskaya, E., Cai, T., et al. (2022). Advances in Neural Information Processing Systems 35 (NeurIPS 2022).

Cite this entry

MultipleChat. "Scaling Laws." MultipleChat AI & LLM Glossary, 2026. https://multiple.chat/ai-glossary/scaling-laws
