EPO: Hierarchical LLM Agents with Environment Preference Optimization
Qi Zhao, Haotian Fu, Chen Sun, George Konidaris
arXiv.org Artificial Intelligence
Long-horizon decision-making tasks present significant challenges for LLM-based agents due to the need for extensive planning over multiple steps. In this paper, we propose a hierarchical framework that decomposes complex tasks into manageable subgoals, utilizing separate LLMs for subgoal prediction and low-level action generation. To address the challenge of creating training signals for unannotated datasets, we develop a reward model that leverages multimodal environment feedback to automatically generate reward signals. We introduce Environment Preference Optimization (EPO), a novel method that generates preference signals from the environment's feedback and uses them to train LLM-based agents. Extensive experiments on ALFRED demonstrate the state-of-the-art performance of our framework, achieving first place on the ALFRED public leaderboard and showcasing its potential to improve long-horizon decision-making in diverse environments.
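The abstract does not spell out the EPO training objective, but preference-based methods of this kind typically optimize a DPO-style loss over pairs of agent outputs, where the "preferred" output is the one the environment's feedback ranks higher. The sketch below is a minimal, hypothetical illustration of such a loss in pure Python; the function name `epo_preference_loss`, the hyperparameter `beta`, and the use of a frozen reference policy are assumptions, not details taken from the paper.

```python
import math

def epo_preference_loss(logp_preferred: float,
                        logp_dispreferred: float,
                        ref_logp_preferred: float,
                        ref_logp_dispreferred: float,
                        beta: float = 0.1) -> float:
    """Hypothetical DPO-style preference loss for environment feedback.

    logp_* are the current policy's log-probabilities of the two candidate
    outputs (e.g. subgoal or action sequences); ref_logp_* come from a frozen
    reference policy. The environment's feedback determines which candidate
    is "preferred". Lower loss means the policy favors the preferred output
    more strongly (relative to the reference) than the dispreferred one.
    """
    margin = beta * ((logp_preferred - ref_logp_preferred)
                     - (logp_dispreferred - ref_logp_dispreferred))
    # Negative log-sigmoid of the margin, as in standard DPO.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

For example, if the policy has moved toward the environment-preferred output relative to the reference, the margin is positive and the loss falls below `log(2)`; if it has moved the wrong way, the loss rises above `log(2)`.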
Aug-28-2024