Deep Bayesian Active Learning for Preference Modeling in Large Language Models
–Neural Information Processing Systems
We address this by proposing the Bayesian Active Learner for Preference Modeling (BAL-PM), a novel stochastic acquisition policy that not only targets points of high epistemic uncertainty according to the preference model but also seeks to maximize the entropy of the acquired prompt distribution in the feature space spanned by the employed LLM.
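The two objectives named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of ensemble disagreement (a BALD-style mutual information) as the epistemic-uncertainty term, the log nearest-neighbor distance as a stand-in for the entropy gain in feature space, the trade-off weight `beta`, and the softmax sampling step are all assumptions made for illustration.

```python
import numpy as np


def epistemic_uncertainty(probs):
    """BALD-style mutual information for binary preference predictions.

    probs: array of shape (n_members, n_candidates); each entry is one
    ensemble member's probability that the first completion is preferred.
    (Using an ensemble here is an assumption of this sketch.)
    """
    eps = 1e-12

    def binary_entropy(p):
        p = np.clip(p, eps, 1.0 - eps)
        return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))

    # Entropy of the mean prediction minus the mean per-member entropy.
    return binary_entropy(probs.mean(axis=0)) - binary_entropy(probs).mean(axis=0)


def novelty_bonus(candidate_feats, acquired_feats, k=1):
    """Log distance from each candidate prompt to its k-th nearest acquired prompt.

    A crude proxy (assumed here) for how much a candidate would raise the
    entropy of the acquired prompt distribution in feature space.
    """
    diffs = candidate_feats[:, None, :] - acquired_feats[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    k = min(k, acquired_feats.shape[0])  # guard against k > number of acquired prompts
    kth = np.sort(dists, axis=1)[:, k - 1]
    return np.log(kth + 1e-12)


def acquire(probs, candidate_feats, acquired_feats, beta=1.0, temperature=1.0, rng=None):
    """Sample one candidate index from a softmax over the combined scores.

    beta and temperature are hypothetical knobs, not values from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    scores = epistemic_uncertainty(probs) + beta * novelty_bonus(candidate_feats, acquired_feats)
    logits = scores / temperature
    logits = logits - logits.max()  # numerical stability
    p = np.exp(logits)
    p /= p.sum()
    return int(rng.choice(len(scores), p=p)), scores
```

In this sketch, a candidate scores highly either when the ensemble members disagree about the preferred completion or when its prompt lies far from every prompt already acquired; `beta` sets the balance between the two terms.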
Oct-10-2025, 17:54:55 GMT
- Country:
  - North America
    - Puerto Rico (0.04)
    - United States
      - California (0.04)
      - Massachusetts > Middlesex County
        - Cambridge (0.04)
  - Europe
    - United Kingdom > England
      - Oxfordshire > Oxford (0.04)
    - Ireland > Leinster
      - County Dublin > Dublin (0.04)
    - Denmark > Capital Region
      - Copenhagen (0.04)
    - Belgium > Brussels-Capital Region
      - Brussels (0.04)
  - Asia
- Genre:
  - Research Report > Experimental Study (0.93)
- Industry:
  - Health & Medicine (0.67)
  - Energy (0.45)
- Technology: