Group Robust Preference Optimization in Reward-free RLHF

Neural Information Processing Systems 

While these data often come from diverse groups of labelers (e.g., different demographics, ethnicities, or company teams), traditional RLHF approaches
