Red teaming stress-tests a model by trying to make it fail — eliciting unsafe, biased or disallowed outputs so they can be fixed before release. Perez et al. (2022) showed this can be partly automated, using one language model to generate adversarial test cases against another and surfacing tens of thousands of failure cases at scale.
Red teaming complements jailbreak and prompt-injection research and is now a standard part of responsible model release.