🛡️
Session Flagged

Your session has been flagged for unusual activity.

You can try our app by searching for MultipleChat AI on Google and clicking the multiplechat.ai link to try it free.
Quick verification

Please confirm you're human to continue.


SafetySecurity Updated 2026

Red Teaming

Deliberately probing a model with adversarial inputs to find harmful, unsafe or policy-violating behaviour before deployment.

Red teaming stress-tests a model by trying to make it fail — eliciting unsafe, biased or disallowed outputs so they can be fixed before release. Perez et al. (2022) showed this can be partly automated, using one language model to generate adversarial test cases against another and surfacing tens of thousands of failure cases at scale.

Red teaming complements jailbreak and prompt-injection research and is now a standard part of responsible model release.

References

Primary, peer-reviewed and archival sources for this definition.

Red Teaming Language Models with Language Models
Perez, E., Huang, S., Song, F., Cai, T., Ring, R., Aslanides, J., Glaese, A., McAleese, N., & Irving, G. (2022). Proceedings of EMNLP 2022.

Dictionary & encyclopedic entries

Cite this entry

MultipleChat. "Red Teaming." MultipleChat AI & LLM Glossary, 2026. https://multiple.chat/ai-glossary/red-teaming

Related terms

See this in practice

Run the same prompt across ChatGPT, Claude, Gemini and Grok — grounded in your own sources, cross-checked against each other.

Try MultipleChat Free

Continue learning

See paid plans