Discovering Preference Optimization Algorithms with and for Large Language Models

Chris Lu

Neural Information Processing Systems

Typically, preference optimization is approached as an offline supervised learning task using manually crafted convex loss functions. While these methods are based on theoretical insights, they are inherently constrained by human creativity, so the large search space of possible loss functions remains under-explored.
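To make "manually crafted convex loss function" concrete, here is a minimal sketch of two widely used examples: a logistic loss of the kind used in DPO and a hinge loss of the kind used in SLiC. The abstract does not specify any of this. The function names, the argument names, and the default `beta=0.1` are illustrative assumptions, and the inputs are assumed to be summed log-probabilities of a chosen and a rejected response under the trained policy and a frozen reference model.

```python
import math


def log_ratio_margin(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected):
    """Difference of policy-vs-reference log-ratios (hypothetical helper name)."""
    return (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)


def logistic_preference_loss(margin, beta=0.1):
    """Logistic loss -log(sigmoid(beta * margin)); convex in the margin.

    Uses the identity log(1 + exp(-x)) = max(-x, 0) + log1p(exp(-|x|))
    so that large |x| does not overflow.
    """
    x = beta * margin
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


def hinge_preference_loss(margin, beta=0.1):
    """Hinge loss max(0, 1 - beta * margin); also convex in the margin."""
    return max(0.0, 1.0 - beta * margin)


# Example: the policy favors the chosen response more than the reference does.
margin = log_ratio_margin(-12.0, -15.0, -13.0, -14.0)  # (1.0) - (-1.0) = 2.0
loss_logistic = logistic_preference_loss(margin)
loss_hinge = hinge_preference_loss(margin)
```

Both functions map the same scalar margin to a loss, which is the sense in which the space of possible loss functions can be searched: any other function of that margin is a candidate objective.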