Batch Active Learning of Reward Functions from Human Preferences

Erdem Bıyık, Nima Anari, Dorsa Sadigh

Feb-24-2024 – arXiv.org Machine Learning

Data generation and labeling are often expensive in robot learning. Preference-based learning is a concept that enables reliable labeling by querying users with preference questions. Active querying methods are commonly employed in preference-based learning to generate more informative data at the expense of parallelization and computation time. In this paper, we develop a set of novel algorithms, batch active preference-based learning methods, that enable efficient learning of reward functions using as few data samples as possible while still having short query generation times and also retaining parallelizability. We introduce a method based on determinantal point processes (DPP) for active batch generation and several heuristic-based alternatives. Finally, we present our experimental results for a variety of robotics tasks in simulation. Our results suggest that our batch active learning algorithm requires only a few queries that are computed in a short amount of time. We showcase one of our algorithms in a study to learn human users' preferences.

artificial intelligence, machine learning, optimization problem

Country:
- North America > United States > California
  - Los Angeles County > Los Angeles (0.14)
  - Santa Clara County (0.14)

Genre:
- Research Report > New Finding (1.00)

Industry:
- Education (0.46)
- Transportation (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Statistical Learning (1.00)
  - Representation & Reasoning
    - Optimization (0.67)
    - Search (0.68)
  - Robots (1.00)
