Discovering Preference Optimization Algorithms with and for Large Language Models

Chris Lu

Neural Information Processing Systems 

Typically, preference optimization is approached as an offline supervised learning task using manually crafted convex loss functions. While these methods are based on theoretical insights, they are inherently constrained by human creativity, so the large search space of possible loss functions remains under-explored.
