CIFR
Direct Preference Optimization: Your Language Model is Secretly a Reward Model — Research Agent | CIFR