DIP-RL: Demonstration-Inferred Preference Learning in Minecraft
Ellen Novoseller, Vinicius G. Goecks, David Watkins, Josh Miller, Nicholas Waytowich
In machine learning for sequential decision-making, an algorithmic agent learns to interact with an environment while receiving feedback in the form of a reward signal. However, in many unstructured real-world settings, such a reward signal is unknown, and humans cannot reliably craft one that correctly captures the desired behavior. To solve tasks in such unstructured and open-ended environments, we present Demonstration-Inferred Preference Reinforcement Learning (DIP-RL), an algorithm that leverages human demonstrations in three distinct ways: training an autoencoder, seeding reinforcement learning (RL) training batches with demonstration data, and inferring preferences over behaviors to learn a reward function that guides RL. We evaluate DIP-RL on a tree-chopping task in Minecraft. Results suggest that DIP-RL can learn a reward function that reflects human preferences and that it performs competitively relative to baselines. DIP-RL builds on our previous work combining demonstrations and pairwise preferences in Minecraft, which received a research prize at the 2022 NeurIPS MineRL BASALT competition, Learning from Human Feedback in Minecraft. Example trajectory rollouts of DIP-RL and the baselines are available at https://sites.google.com/view/dip-rl.
arXiv.org Artificial Intelligence
Jul-22-2023
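
The abstract names three uses of demonstrations; the sketch below illustrates two of them in PyTorch. Everything here is a loose assumption rather than the paper's actual method: the network sizes, function names, and the Bradley-Terry preference loss over trajectory segments are our own illustrative choices, as is the fixed-fraction rule for seeding RL training batches with demonstration transitions.

```python
import torch
import torch.nn as nn


class RewardModel(nn.Module):
    """Maps an encoded state-action pair to a scalar reward estimate."""

    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)


def preference_loss(model: RewardModel,
                    seg_pref: torch.Tensor,
                    seg_other: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry loss for a pair where seg_pref is the preferred segment.

    Each segment is a (T, obs_dim) tensor of encoded state-action pairs;
    a segment's predicted return is the sum of its per-step rewards.
    """
    returns = torch.stack([model(seg_pref).sum(), model(seg_other).sum()])
    # P(pref beats other) = softmax(returns)[0]; minimize its negative log.
    return -torch.log_softmax(returns, dim=0)[0]


def seeded_batch(demo_data: torch.Tensor,
                 agent_data: torch.Tensor,
                 batch_size: int,
                 demo_frac: float = 0.25) -> torch.Tensor:
    """Seed an RL training batch with a fixed fraction of demonstration rows."""
    n_demo = int(batch_size * demo_frac)
    idx_demo = torch.randint(len(demo_data), (n_demo,))
    idx_agent = torch.randint(len(agent_data), (batch_size - n_demo,))
    return torch.cat([demo_data[idx_demo], agent_data[idx_agent]], dim=0)
```

A DIP-RL-style training loop would presumably alternate between fitting the reward model on preference pairs like these and optimizing the RL agent against the learned reward, with demonstration-seeded batches used throughout; consult the paper for the actual architecture and training schedule.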