RLHF

/dictionary/rlhf

Definition

The training stage where a pre-trained LLM is tuned with human preferences (people rank the models outputs, the model learns to produce the ones humans prefer). Turns a raw text predictor into a useful assistant.

Posts that use this term

From models to LLMs
An LLM is one kind of ML model, trained on text, predicts the next token. That single trick at scale gets you ChatGPT, and also explains where it breaks.