Training · Alignment · Updated 2026

RLHF (Reinforcement Learning from Human Feedback)

A training method that uses human judgements of which model responses are better to teach a model the kinds of answers people actually prefer, making it more helpful and better aligned with what users want.

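RLHF fits a reward model to human comparisons of model outputs, then optimises the language model against that reward with reinforcement learning, typically PPO (Proximal Policy Optimization) with a KL penalty that keeps the model close to its starting point. Christiano et al. (2017) showed that deep RL agents could learn complex behaviours from human preferences between pairs of short trajectory clips; Stiennon et al. (2020) applied the approach to summarisation; and Ouyang et al. (2022) used it, after a supervised fine-tuning step, to build InstructGPT, a recipe widely adopted by later instruction-following assistants.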
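The two stages described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not code from any of the cited papers: each response is represented by a hypothetical hand-made feature vector, the reward model is a hypothetical linear scorer, and it is fitted by gradient descent on the pairwise loss -log σ(r(chosen) - r(rejected)), the form used for reward modelling in Stiennon et al. and Ouyang et al. The last helper shows the KL-penalised reward r - β·log(π(y|x) / π_ref(y|x)) that the RL step maximises; the PPO update itself is omitted. All function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Pair = Tuple[Vector, Vector]  # (features of preferred response, features of rejected response)


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def reward(w: Vector, features: Vector) -> float:
    """Hypothetical linear reward model: r(y) = w . phi(y)."""
    return sum(wi * fi for wi, fi in zip(w, features))


def pairwise_loss(w: Vector, pairs: List[Pair]) -> float:
    """Mean of -log sigmoid(r(chosen) - r(rejected)) over human comparisons."""
    return sum(softplus(-(reward(w, c) - reward(w, r))) for c, r in pairs) / len(pairs)


def fit_reward_model(pairs: List[Pair], dim: int, lr: float = 0.5, steps: int = 200) -> List[float]:
    """Fit the linear reward model to preference pairs by plain gradient descent."""
    w = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for chosen, rejected in pairs:
            margin = reward(w, chosen) - reward(w, rejected)
            # d/dw of -log sigmoid(margin) = -(1 - sigmoid(margin)) * (chosen - rejected)
            coeff = -(1.0 - sigmoid(margin))
            for i in range(dim):
                grad[i] += coeff * (chosen[i] - rejected[i])
        w = [wi - lr * gi / len(pairs) for wi, gi in zip(w, grad)]
    return w


def kl_shaped_reward(r: float, logp_policy: float, logp_ref: float, beta: float = 0.1) -> float:
    """Reward the RL step maximises: r - beta * log(pi(y|x) / pi_ref(y|x))."""
    return r - beta * (logp_policy - logp_ref)


if __name__ == "__main__":
    # Hypothetical features: [helpfulness, rudeness]; the first response in each pair was preferred.
    toy_pairs = [
        ([1.0, 0.0], [0.2, 0.0]),
        ([0.8, 0.1], [0.8, 0.9]),
        ([0.5, 0.0], [0.1, 0.6]),
    ]
    weights = fit_reward_model(toy_pairs, dim=2)
    print("learned weights:", weights)
    print("loss before/after:", pairwise_loss([0.0, 0.0], toy_pairs), pairwise_loss(weights, toy_pairs))
```

On this toy data the learned weights reward the helpfulness feature and penalise the rudeness feature, which is all the reward model needs: it only has to rank responses the way the human comparisons did. In a real system the linear scorer is replaced by a language model with a scalar output head, and the β term stops the policy from drifting into outputs that exploit weaknesses in the learned reward.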

RLHF is much of what separates a raw next-token predictor from a model that feels helpful, honest and safe to talk to.

References

Primary, peer-reviewed and archival sources for this definition.

Deep Reinforcement Learning from Human Preferences

Christiano, P., Leike, J., Brown, T. B., Martic, M., Legg, S., & Amodei, D. (2017). Advances in Neural Information Processing Systems 30 (NeurIPS 2017).

arXiv:1706.03741

Learning to summarize from human feedback

Stiennon, N., Ouyang, L., Wu, J., Ziegler, D. M., Lowe, R., Voss, C., Radford, A., Amodei, D., & Christiano, P. (2020). Advances in Neural Information Processing Systems 33 (NeurIPS 2020).

arXiv:2009.01325

Training language models to follow instructions with human feedback

Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., et al. (2022). Advances in Neural Information Processing Systems 35 (NeurIPS 2022).

arXiv:2203.02155

Dictionary & encyclopedic entries

Wikipedia: Reinforcement learning from human feedback
IBM Think: What is RLHF?

Cite this entry

MultipleChat. "RLHF (Reinforcement Learning from Human Feedback)." MultipleChat AI & LLM Glossary, 2026. https://multiple.chat/ai-glossary/rlhf

Related terms

Alignment
Fine-Tuning
LLM (Large Language Model)

