Sharpe Ratio-Guided Active Learning for Preference Optimization in RLHF
