
Evaluation · Foundations · Updated 2026

Perplexity

A standard intrinsic measure of a language model's quality: how surprised it is by held-out text. Lower perplexity means the model predicts real text better.

Perplexity is the exponential of a model's average per-token cross-entropy on a test set. Intuitively, it is the effective number of equally likely choices the model is deciding among at each step: a perplexity of 4 means the model is, on average, as uncertain as if it were picking uniformly among four tokens. The measure traces to early speech-recognition research (Jelinek et al., 1977) and is treated as a core evaluation metric in Jurafsky & Martin's Speech and Language Processing.
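
As a rough illustration of the definition, the sketch below computes perplexity from the probabilities a model assigned to each token that actually occurred in the test text. The function name `perplexity` and the probability lists are hypothetical values chosen for illustration, not output from any real model, and the natural logarithm is assumed for the cross-entropy.

```python
import math

def perplexity(token_probs):
    """Exponential of the average per-token cross-entropy (in nats).

    token_probs: probabilities the model assigned to each observed token.
    Hypothetical helper for illustration only.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Average negative log-probability per token = cross-entropy in nats.
    cross_entropy = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(cross_entropy)

# A model that spreads probability evenly over 4 choices at every step
# has a perplexity of about 4 (up to floating-point rounding).
uniform = perplexity([0.25, 0.25, 0.25, 0.25])

# A model that assigns high probability to the observed tokens
# is less "surprised" and scores a lower perplexity.
confident = perplexity([0.9, 0.8, 0.95, 0.7])

print(uniform, confident)
```

The uniform case shows the "effective number of choices" reading directly: assigning probability 1/4 to every observed token gives a perplexity of about 4. In practice the per-token probabilities would come from a model scored on held-out text, and values are only comparable between models that share the same test set and tokenization.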

Lower is better, but perplexity only measures predictive fit on text; it does not directly capture helpfulness, factuality or safety, which is why it is paired with task benchmarks and human evaluation.

References

Primary, peer-reviewed and archival sources for this definition.

Perplexity—a measure of the difficulty of speech recognition tasks
Jelinek, F., Mercer, R. L., Bahl, L. R., & Baker, J. K. (1977). Journal of the Acoustical Society of America, 62(S1), S63.
Speech and Language Processing (3rd ed. draft), Ch. 3: N-gram Language Models
Jurafsky, D., & Martin, J. H. (2024). Stanford University.


Cite this entry

MultipleChat. "Perplexity." MultipleChat AI & LLM Glossary, 2026. https://multiple.chat/ai-glossary/perplexity
