Deep Bayesian Active Learning for Preference Modeling in Large Language Models
–Neural Information Processing Systems
We address this by proposing the Bayesian Active Learner for Preference Modeling (BAL-PM), a novel stochastic acquisition policy that not only targets points of high epistemic uncertainty according to the preference model but also seeks to maximize the entropy of the acquired prompt distribution in the feature space spanned by the employed LLM.
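As a rough illustration of how two such objectives can be combined into one score, the sketch below adds an ensemble-disagreement uncertainty term to a nearest-neighbour proxy for entropy in feature space. It is not the paper's implementation: the function names, the `beta` weight, the mutual-information estimator, and the log-distance entropy proxy are all assumptions made for this example, and the stochastic sampling step of the policy is omitted.

```python
import numpy as np

def epistemic_uncertainty(probs):
    """Assumed estimator: mutual information between prediction and model.

    probs has shape (n_models, n_candidates) and holds each ensemble
    member's probability that the first completion is preferred.
    """
    eps = 1e-12

    def binary_entropy(p):
        p = np.clip(p, eps, 1 - eps)
        return -(p * np.log(p) + (1 - p) * np.log(1 - p))

    # Entropy of the mean prediction minus the mean per-model entropy.
    return binary_entropy(probs.mean(axis=0)) - binary_entropy(probs).mean(axis=0)

def entropy_gain(candidates, acquired, k=1):
    """Assumed proxy: log distance to the k-th nearest acquired prompt.

    candidates has shape (n_candidates, d), acquired has shape (n_acquired, d).
    Prompts far from everything already acquired get a larger value.
    """
    diffs = candidates[:, None, :] - acquired[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    kth = np.sort(dists, axis=1)[:, k - 1]
    return np.log(kth + 1e-12)

def acquisition_scores(probs, candidates, acquired, beta=1.0):
    """Hypothetical combined score: uncertainty plus beta times the entropy proxy."""
    return epistemic_uncertainty(probs) + beta * entropy_gain(candidates, acquired)

# Two candidates: the ensemble agrees on the first and disagrees on the second,
# and the second also lies farther from the single acquired prompt.
probs = np.array([[0.5, 0.9], [0.5, 0.1]])
candidates = np.array([[0.0, 0.0], [3.0, 4.0]])
acquired = np.array([[0.0, 1.0]])
scores = acquisition_scores(probs, candidates, acquired)
```

In this toy setup the second candidate scores higher on both terms, so a greedy rule would pick it; a stochastic policy would instead sample candidates with probabilities derived from the scores.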
Oct-10-2025, 17:54:55 GMT
- Country:
- Asia
- Europe
- Belgium > Brussels-Capital Region
- Brussels (0.04)
- Denmark > Capital Region
- Copenhagen (0.04)
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- United Kingdom > England
- Oxfordshire > Oxford (0.04)
- North America
- Puerto Rico (0.04)
- United States
- California (0.04)
- Massachusetts > Middlesex County
- Cambridge (0.04)
- Genre:
- Research Report > Experimental Study (0.93)
- Industry:
- Energy (0.45)
- Health & Medicine (0.67)
- Technology: