AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench

Toledo, Edan, Hambardzumyan, Karen, Josifoski, Martin, Hazra, Rishi, Baldwin, Nicolas, Audran-Reiss, Alexis, Kuchnik, Michael, Magka, Despoina, Jiang, Minqi, Lupidi, Alisia Maria, Lupu, Andrei, Raileanu, Roberta, Niu, Kelvin, Shavrina, Tatiana, Gagnon-Audet, Jean-Christophe, Shvartsman, Michael, Sodhani, Shagun, Miller, Alexander H., Charnalia, Abhishek, Dunfield, Derek, Wu, Carole-Jean, Stenetorp, Pontus, Cancedda, Nicola, Foerster, Jakob Nicolaus, Bachrach, Yoram

Nov-5-2025, arXiv.org Artificial Intelligence

AI research agents are demonstrating great potential to accelerate scientific progress by automating the design, implementation, and training of machine learning models. We focus on methods for improving agents' performance on MLE-bench, a challenging benchmark where agents compete in Kaggle competitions to solve real-world machine learning problems. We formalize AI research agents as search policies that navigate a space of candidate solutions, iteratively modifying them using operators. By designing and systematically varying different operator sets and search policies (Greedy, MCTS, Evolutionary), we show that their interplay is critical for achieving high performance. Our best pairing of search strategy and operator set achieves a state-of-the-art result on MLE-bench lite, increasing the success rate of achieving a Kaggle medal from 39.6% to 47.7%. Our investigation underscores the importance of jointly considering the search strategy, operator design, and evaluation methodology in advancing automated machine learning.
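The framing of an agent as a search policy that repeatedly applies operators to candidate solutions can be illustrated with a minimal sketch. This is not the authors' implementation: the toy scoring function, the two operators (`perturb`, `reset_one`), and the list-of-floats solution encoding are all hypothetical stand-ins chosen so the example runs without an LLM or a Kaggle task. Only the greedy policy is shown; MCTS or evolutionary policies would change how the next node to expand is selected.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Assumption: a candidate solution is a list of floats. In the paper's setting
# it would be an ML pipeline; this encoding is only for illustration.
Solution = List[float]
Operator = Callable[[Solution, random.Random], Solution]


@dataclass
class Node:
    solution: Solution
    score: float


def toy_score(solution: Solution) -> float:
    """Hypothetical fitness: higher is better, with a maximum of 0.0 at all-ones."""
    return -sum((x - 1.0) ** 2 for x in solution)


def perturb(solution: Solution, rng: random.Random) -> Solution:
    """Hypothetical operator: nudge one coordinate by a small random amount."""
    out = list(solution)
    i = rng.randrange(len(out))
    out[i] += rng.uniform(-0.5, 0.5)
    return out


def reset_one(solution: Solution, rng: random.Random) -> Solution:
    """Hypothetical operator: redraw one coordinate from scratch."""
    out = list(solution)
    out[rng.randrange(len(out))] = rng.uniform(-2.0, 2.0)
    return out


def greedy_search(
    initial: Solution,
    operators: List[Operator],
    score_fn: Callable[[Solution], float],
    steps: int,
    seed: int = 0,
) -> Tuple[Node, List[Node]]:
    """Greedy policy: always expand the best node found so far."""
    rng = random.Random(seed)
    best = Node(list(initial), score_fn(initial))
    history = [best]
    for _ in range(steps):
        op = rng.choice(operators)             # pick an operator at random
        candidate = op(best.solution, rng)     # apply it to the current best
        child = Node(candidate, score_fn(candidate))
        history.append(child)
        if child.score > best.score:           # keep the child only if it improves
            best = child
    return best, history


if __name__ == "__main__":
    best, history = greedy_search(
        [0.0, 0.0, 0.0], [perturb, reset_one], toy_score, steps=200, seed=1
    )
    print(f"evaluated {len(history)} candidates, best score {best.score:.4f}")
```

Keeping the policy (`greedy_search`) separate from the operator list makes it possible to vary either one independently, which mirrors the kind of policy-by-operator comparison the abstract describes.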

evolutionary algorithm, large language model, machine learning

Country:
- North America > United States (0.28)

Genre:
- Research Report > New Finding (1.00)

Industry:
- Education (1.00)
- Health & Medicine > Therapeutic Area
  - Pulmonary/Respiratory Diseases (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Search (1.00)
  - Cognitive Science > Problem Solving (0.93)
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (0.94)
  - Machine Learning
    - Neural Networks > Deep Learning (1.00)
    - Evolutionary Systems (0.93)
