Cost-Effective Proxy Reward Model Construction with On-Policy and Active Learning
