Deep Bayesian Active Learning for Preference Modeling in Large Language Models

Neural Information Processing Systems 

We address this by proposing the Bayesian Active Learner for Preference Modeling (BAL-PM), a novel stochastic acquisition policy that not only targets points of high epistemic uncertainty according to the preference model but also seeks to maximize the entropy of the acquired prompt distribution in the feature space spanned by the employed LLM.
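
To make the two objectives concrete, the following is a minimal sketch of how such an acquisition score might be assembled. It is not the authors' implementation. The abstract states only the two goals, so the specific estimators are assumptions: epistemic uncertainty is approximated by a BALD-style mutual information over a small ensemble of preference probabilities, and the entropy objective by the log distance to the nearest already-acquired prompt in feature space. All names (`epistemic_uncertainty`, `novelty`, `acquisition_scores`, `sample_index`, `beta`, `temperature`) are hypothetical.

```python
import math
import random


def binary_entropy(p):
    """Entropy in nats of a Bernoulli(p) variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))


def epistemic_uncertainty(member_probs):
    """Assumed estimator: mutual information between the preference label
    and the model, computed from ensemble member probabilities."""
    mean_p = sum(member_probs) / len(member_probs)
    mean_entropy = sum(binary_entropy(p) for p in member_probs) / len(member_probs)
    return binary_entropy(mean_p) - mean_entropy


def novelty(feature, acquired_features):
    """Assumed proxy for the entropy objective: log distance from a
    candidate prompt to its nearest acquired prompt in feature space."""
    if not acquired_features:
        return 0.0
    nearest = min(math.dist(feature, f) for f in acquired_features)
    return math.log(nearest + 1e-8)


def acquisition_scores(pool, acquired_features, beta=1.0):
    """Score each (feature, member_probs) candidate; beta weights novelty."""
    return [
        epistemic_uncertainty(probs) + beta * novelty(feat, acquired_features)
        for feat, probs in pool
    ]


def sample_index(scores, rng, temperature=1.0):
    """Stochastic acquisition: sample a candidate from a softmax over scores."""
    top = max(scores)
    weights = [math.exp((s - top) / temperature) for s in scores]
    return rng.choices(range(len(scores)), weights=weights, k=1)[0]


# Toy pool: (prompt feature vector, ensemble preference probabilities).
pool = [
    ((0.0, 0.0), [0.5, 0.5, 0.5]),  # members agree, close to acquired data
    ((0.1, 0.0), [0.1, 0.9, 0.5]),  # members disagree, close to acquired data
    ((5.0, 5.0), [0.1, 0.9, 0.5]),  # members disagree, far from acquired data
]
acquired = [(0.0, 0.1)]

scores = acquisition_scores(pool, acquired)
greedy_choice = max(range(len(scores)), key=scores.__getitem__)
sampled_choice = sample_index(scores, random.Random(0))
```

In this toy pool the third candidate receives the highest score because it combines ensemble disagreement with distance from the already-acquired prompt. Sampling from the softmax, rather than always taking the top score, is one simple way to realize a stochastic policy; the weighting `beta` and the `temperature` are illustrative knobs, not values from the paper.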
