Evaluation Updated 2026

BLEU

A long-standing automatic metric for machine translation that scores output by how much its word sequences overlap with human reference translations.

BLEU (Bilingual Evaluation Understudy), introduced by Papineni et al. (2002), compares a machine translation against one or more human references by counting overlapping word n-grams, and applies a penalty to output that is too short. It was one of the first automatic metrics shown to correlate reasonably well with human judgement of translation quality, and it became a standard in the field.
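
The n-gram overlap and length penalty described above can be sketched in a few lines of Python. This is a minimal, sentence-level illustration rather than a reference implementation. It assumes whitespace tokenization, uniform weights over n-gram orders, and no smoothing, whereas the original paper defines the score over a whole test corpus. The function names (`ngrams`, `modified_precision`, `brevity_penalty`, `sentence_bleu`) are chosen here for illustration only.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every word n-gram of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: a candidate n-gram is credited at most as
    many times as it occurs in the single reference that contains it most."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def brevity_penalty(candidate, references):
    """Penalize candidates shorter than the closest reference length."""
    c = len(candidate)
    if c == 0:
        return 0.0
    # Closest reference length; ties go to the shorter reference (an assumption).
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    return 1.0 if c > r else math.exp(1 - r / c)


def sentence_bleu(candidate, references, max_n=4):
    """Brevity penalty times the geometric mean of the clipped precisions."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # unsmoothed: any order with no match zeroes the score
    log_mean = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty(candidate, references) * math.exp(log_mean)


reference = "the cat is on the mat".split()
candidate = "the cat sat on the mat".split()
print(sentence_bleu(candidate, [reference], max_n=2))
```

With `max_n=2` the example candidate matches 5 of 6 unigrams and 3 of 5 bigrams, so the score is the square root of 0.5, roughly 0.707. With the default `max_n=4` the same pair scores 0 because no 4-gram matches, which is one reason practical tools apply smoothing or report corpus-level scores.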

BLEU is fast and reproducible but blind to meaning and paraphrase, so it is used alongside, not instead of, human evaluation.
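
A small illustration of that blind spot, using only clipped unigram overlap (the simplest ingredient of BLEU) and made-up example sentences. The helper name `unigram_precision` and the sentences are assumptions for this sketch, not part of any library.

```python
from collections import Counter


def unigram_precision(candidate, reference):
    """Fraction of candidate words that also appear in the reference,
    with each word credited at most as often as the reference contains it."""
    cand = Counter(candidate.split())
    ref = Counter(reference.split())
    overlap = sum((cand & ref).values())  # Counter & keeps the minimum counts
    return overlap / sum(cand.values())


reference = "the drug is safe for children"
negated = "the drug is not safe for children"         # opposite meaning
paraphrase = "this medicine poses no risk to kids"    # same meaning

print(unigram_precision(negated, reference))     # 6 of 7 words match
print(unigram_precision(paraphrase, reference))  # no words match
```

The negated sentence reverses the meaning yet shares 6 of its 7 words with the reference, while the faithful paraphrase shares none. Surface overlap rewards the wrong one, which is why human evaluation is still needed.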

References

Primary, peer-reviewed and archival sources for this definition.

BLEU: a Method for Automatic Evaluation of Machine Translation
Papineni, K., Roukos, S., Ward, T., & Zhu, W.-J. (2002). Proceedings of ACL 2002, pp. 311–318.

Cite this entry

MultipleChat. "BLEU." MultipleChat AI & LLM Glossary, 2026. https://multiple.chat/ai-glossary/bleu
