Reinforcement Learning from Human Feedback (RLHF)
RLHF is a training method in which human feedback on model responses is used to gradually adjust an AI system so it produces more helpful, safer, or more desired answers. It is an important component of modern chat models applied after the initial pre-training phase.