Bayesian Learning
Deep Bayesian Active Learning for Preference Modeling in Large Language Models
We address this by proposing the B ayesian A ctive L earner for P reference M odeling (BAL-PM), a novel stochastic acquisition policy that not only targets points of high epistemic uncertainty according to the preference model but also seeks to maximize the entropy of the acquired prompt distribution in the feature space spanned by the employed LLM.