RLHF — Reinforcement Learning from Human Feedback
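Definition: A training technique in which people rank or score a model's outputs and that feedback is used to fine-tune the model, typically by first training a reward model on the human preferences and then optimizing the language model against it with reinforcement learning. RLHF is a large part of why ChatGPT follows instructions and gives helpful answers rather than simply continuing text.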
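To make "feedback is used to fine-tune the model" more concrete, here is a minimal sketch of one common way human rankings become a training signal: each ranking is split into (preferred, less preferred) pairs, and a reward model is penalized whenever it scores the less preferred output higher. The pairwise loss shown is a widely used formulation and an assumption here, not something this entry specifies; the function names `ranking_to_pairs` and `preference_loss` are hypothetical, and the reward scores are made-up numbers standing in for a real model's outputs.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_outputs):
    """Turn one human ranking (best first) into (preferred, less_preferred) pairs."""
    # combinations() preserves input order, so the first item of each pair
    # is always the one the rater ranked higher.
    return list(combinations(ranked_outputs, 2))


def preference_loss(reward_preferred, reward_other):
    """Pairwise loss: -log(sigmoid(reward_preferred - reward_other)).

    Small when the reward model agrees with the human rater,
    large when it scores the less preferred output higher.
    """
    margin = reward_preferred - reward_other
    # Numerically stable form of log(1 + exp(-margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Hypothetical example: a rater ranked three answers, best first.
pairs = ranking_to_pairs(["answer_a", "answer_b", "answer_c"])

# Made-up reward-model scores for illustration only.
scores = {"answer_a": 1.5, "answer_b": 0.2, "answer_c": -0.7}
total_loss = sum(preference_loss(scores[p], scores[o]) for p, o in pairs)
```

In a full RLHF pipeline, a loss like this would be minimized to train the reward model, and a separate reinforcement learning step would then fine-tune the language model to produce outputs that the reward model scores highly.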
Example
OpenAI had human raters compare and rank model outputs, then used those rankings in RLHF to train its GPT models to follow instructions and refuse harmful requests.
When you'll hear it
RLHF shows up most often in AI strategy reviews, model evaluation discussions, and product roadmap meetings. People usually say the acronym without expanding it, and they expect the room to already know what it means.
FAQs
What does RLHF stand for?
RLHF stands for Reinforcement Learning from Human Feedback.
What does RLHF mean in AI and machine learning?
RLHF is a training technique in which people rank or score a model's outputs, and those judgments are used to fine-tune the model toward responses people prefer. It is a large part of why ChatGPT is helpful in conversation rather than merely capable.
Where will I hear RLHF used at work?
RLHF comes up most often in AI strategy reviews, model evaluation discussions, and product roadmap meetings. It is usually said as an acronym and rarely spelled out, so speakers assume you already know the term.