Aviary: training language agents on challenging scientific tasks

Narayanan, Siddharth, Braza, James D., Griffiths, Ryan-Rhys, Ponnapati, Manu, Bou, Albert, Laurent, Jon, Kabeli, Ori, Wellawatte, Geemi, Cox, Sam, Rodriques, Samuel G., White, Andrew D.

arXiv.org Artificial Intelligence 

Language agents [1-4] are AI agents [5] that integrate LLMs [6-8] as core components. LLMs excel at zero-shot generalization [9, 10], providing a notable advantage over traditional AI agents, such as those based on handcrafted rules or reinforcement learning, which often struggle to generalize to new environments [11]. While LLMs can exhibit flawed reasoning and logic when used in isolation [12-14], constructing a language agent by grounding LLMs in an environment with observational feedback can mitigate these issues. Early work on language agents used LLMs to directly output actions in the external environment [15-17], while more recently, language agents have been augmented with internal reasoning [18, 19] and planning [20, 21] procedures, as well as long-term memory storage [22, 23]. An emerging research challenge is to pose a theoretical description of the learning problem solved by language agents [4, 24] and to develop efficient methods to optimize the components of a language agent [24-26]. Here, we define common language agent tasks as language decision processes (LDPs) and frame language agents as stochastic computation graphs [27] that may be trained to solve LDPs. We show that pre-existing agents [18, 19, 21] can be implemented within our stochastic computation graph framework and introduce a simple and extensible software package named LDP that enables modular interchange of environments, agents, and optimizers, simplifying experimentation across a variety of settings.
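To make the environment/agent/optimizer decomposition described above concrete, below is a minimal, hypothetical sketch of a language decision process loop. It is not the API of the LDP package itself; every name in it (EchoEnvironment, PromptAgent, NullOptimizer, Transition, rollout) is an illustrative assumption. The sketch shows the three interchangeable pieces the abstract names: an environment emitting language observations, an agent mapping observations to actions (a stand-in for a stochastic computation graph of LLM calls), and an optimizer consuming trajectories.

```python
"""Hypothetical sketch of a language decision process (LDP) loop.

All names here are illustrative, not the FutureHouse LDP package API:
an environment emits language observations, an agent maps observations
to actions, and an optimizer updates the agent from episode rewards.
"""

from dataclasses import dataclass
import random


@dataclass
class Transition:
    observation: str
    action: str
    reward: float


class EchoEnvironment:
    """Toy environment: rewards the agent for repeating the target word."""

    def reset(self) -> str:
        self._target = random.choice(["alpha", "beta", "gamma"])
        return f"say: {self._target}"

    def step(self, action: str) -> tuple[str, float, bool]:
        reward = 1.0 if action == self._target else 0.0
        return "done", reward, True


@dataclass
class PromptAgent:
    """Agent whose only component is a prompt prefix, standing in for one
    node of a stochastic computation graph over LLM calls."""

    prompt: str = "Answer with one word. "

    def act(self, observation: str) -> str:
        # A real agent would call an LLM here; the toy agent just parses
        # the observation format produced by EchoEnvironment.
        return observation.removeprefix("say: ")


class NullOptimizer:
    """Placeholder optimizer: a real one would update the agent's trainable
    nodes (prompts or LLM weights) from the trajectory's rewards."""

    def update(self, agent: PromptAgent, trajectory: list[Transition]) -> None:
        print(f"episode return: {sum(t.reward for t in trajectory):.1f}")


def rollout(env: EchoEnvironment, agent: PromptAgent) -> list[Transition]:
    """Run one episode, recording (observation, action, reward) transitions."""
    obs, done, traj = env.reset(), False, []
    while not done:
        action = agent.act(obs)
        next_obs, reward, done = env.step(action)
        traj.append(Transition(obs, action, reward))
        obs = next_obs
    return traj


if __name__ == "__main__":
    env, agent, opt = EchoEnvironment(), PromptAgent(), NullOptimizer()
    for _ in range(3):
        opt.update(agent, rollout(env, agent))
```

Because the three pieces interact only through observations, actions, and trajectories, any one of them can be swapped independently, which is the modular interchange the abstract attributes to the LDP package.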