Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards

Neural Information Processing Systems 

Project lead, main contributor, correspondence to alexandre.rame@isir.upmc.fr.
Equal experimental contribution, order determined at random.
Further information and resources related to this project can be found on this website.
