Training a Generally Curious Agent

Fahim Tajwar, Yiding Jiang, Abitha Thankaraj, Sumaita Sadia Rahman, J. Zico Kolter, Jeff Schneider, Ruslan Salakhutdinov

Mar-5-2025 – arXiv.org Artificial Intelligence

Efficient exploration is essential for intelligent systems interacting with their environment, but existing language models often fall short in scenarios that require strategic information gathering. In this paper, we present PAPRIKA, a fine-tuning approach that enables language models to develop general decision-making capabilities that are not confined to particular environments. By training on synthetic interaction data from different tasks that require diverse strategies, PAPRIKA teaches models to explore and adapt their behavior on a new task based on environment feedback in context, without further gradient updates. Experimental results show that models fine-tuned with PAPRIKA can effectively transfer their learned decision-making capabilities to entirely unseen tasks without additional training. Unlike in traditional training, the primary bottleneck of our approach is sampling useful interaction data rather than performing model updates. To improve sample efficiency, we propose a curriculum learning strategy that prioritizes sampling trajectories from tasks with high learning potential. These results suggest a promising path towards AI systems that can autonomously solve novel sequential decision-making problems that require interaction with the external world.
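
The curriculum idea mentioned in the abstract can be illustrated with a toy sketch. The abstract does not say how "learning potential" is measured or how tasks are chosen, so everything below is an assumption made for illustration: the spread-over-mean proxy for learning potential, the UCB-style selection rule, and the names `learning_potential`, `pick_task`, and `run_curriculum` are all hypothetical and should not be read as the paper's implementation.

```python
import math
import random
from statistics import mean, pstdev


def learning_potential(returns):
    """Hypothetical proxy: spread of episode returns relative to their mean.

    It is zero when every attempt scores the same (the task is either
    already mastered or currently out of reach) and grows when outcomes
    are mixed.
    """
    if len(returns) < 2:
        return 0.0
    mu = mean(returns)
    if mu == 0:
        return 0.0
    return pstdev(returns) / abs(mu)


def pick_task(history, total_pulls, bonus=1.0):
    """UCB-style choice (an assumption): potential estimate plus a bonus
    that shrinks as a task accumulates samples."""
    # Sample every task once before comparing scores.
    for task, returns in history.items():
        if not returns:
            return task

    def score(task):
        n = len(history[task])
        explore = bonus * math.sqrt(math.log(total_pulls) / n)
        return learning_potential(history[task]) + explore

    return max(history, key=score)


def run_curriculum(tasks, sample_return, steps, seed=0):
    """Repeatedly pick a task, sample one episode return, and record it."""
    rng = random.Random(seed)
    history = {task: [] for task in tasks}
    for step in range(steps):
        task = pick_task(history, total_pulls=max(step, 1))
        history[task].append(sample_return(task, rng))
    return history
```

In this sketch, `sample_return(task, rng)` stands in for rolling out the model on a task and scoring the trajectory. A task whose returns are always identical keeps a potential of zero and is revisited only occasionally through the exploration bonus, so most of the sampling budget goes to tasks with mixed outcomes.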
