AITopics | generalist agent

generalist agent

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Diversity Is Not All You Need: Training A Robust Cooperative Agent Needs Specialist Partners

Neural Information Processing Systems, Mar-20-2026, 23:28:10 GMT

Partner diversity is known to be crucial for training a robust generalist cooperative agent. In this paper, we show that partner specialization, in addition to diversity, is crucial for the robustness of a downstream generalist agent. We propose a principled method for quantifying both the diversity and specialization of a partner population based on the concept of mutual information. Then, we observe that the recently proposed cross-play minimization (XP-min) technique produces diverse and specialized partners. However, the generated partners are overfit, reducing their usefulness as training partners. To address this, we propose simple methods, based on reinforcement learning and supervised learning, for extracting the diverse and specialized behaviors of XP-min-generated partners without their overfitness. We demonstrate empirically that the proposed methods effectively remove overfitness and that the extracted populations produce more robust generalist agents than the source XP-min populations.
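The abstract states that diversity and specialization are quantified with mutual information but does not give the formulas. As a rough illustration only, the sketch below assumes a discrete setting where each sample pairs a partner identity Z with an observed action A. A plug-in estimate of I(Z; A) is high when actions reveal which partner produced them (a diverse population), and the conditional entropy H(A | Z) is low when each partner behaves consistently (one possible proxy for specialization). The (partner_id, action) sample format and the use of H(A | Z) for specialization are assumptions, not the paper's definitions.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Plug-in estimate of I(Z; A) in nats from (partner_id, action) samples.

    Assumption: partners and actions are discrete and samples are i.i.d.
    """
    n = len(pairs)
    joint = Counter(pairs)
    p_z = Counter(z for z, _ in pairs)
    p_a = Counter(a for _, a in pairs)
    mi = 0.0
    for (z, a), count in joint.items():
        p_za = count / n
        # Sum p(z, a) * log(p(z, a) / (p(z) * p(a))) over observed pairs.
        mi += p_za * math.log(p_za / ((p_z[z] / n) * (p_a[a] / n)))
    return mi


def conditional_entropy(pairs):
    """Plug-in estimate of H(A | Z) in nats (hypothetical specialization proxy)."""
    n = len(pairs)
    joint = Counter(pairs)
    p_z = Counter(z for z, _ in pairs)
    h = 0.0
    for (z, a), count in joint.items():
        p_za = count / n
        # p(a | z) = p(z, a) / p(z)
        h -= p_za * math.log(p_za / (p_z[z] / n))
    return h
```

For example, two partners that always pick different actions give I(Z; A) = log 2 nats and H(A | Z) = 0, while two partners that each pick between the same two actions uniformly give I(Z; A) = 0 and H(A | Z) = log 2.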

artificial intelligence, machine learning, reinforcement learning, (8 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.61)

b2cac94f82928a85055987d9fd44753f-Paper-Conference.pdf

Neural Information Processing Systems, Feb-19-2026, 10:21:47 GMT

agent, arxiv preprint arxiv, learning, (10 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.68)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Diversity Is Not All You Need: Training A Robust Cooperative Agent Needs Specialist Partners

Neural Information Processing Systems, Feb-15-2026, 12:44:46 GMT

In this paper, we show that partner specialization, in addition to diversity, is crucial for the robustness of a downstream generalist agent.

agent, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Country: Asia > Thailand > Bangkok > Bangkok (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Education (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

5950bf290a1570ea401bf98882128160-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems, Feb-12-2026, 04:12:26 GMT

large language model, machine learning, natural language, (22 more...)

Neural Information Processing Systems

Country:

North America > United States > Ohio (0.05)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(13 more...)

Genre:

Research Report (0.68)
Workflow (0.46)

Industry:

Government (0.67)
Transportation > Passenger (0.46)
Transportation > Air (0.46)

Technology:

Information Technology > Communications > Web (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(3 more...)

Mind2Web: Towards a Generalist Agent for the Web

Neural Information Processing Systems, Dec-25-2025, 10:32:55 GMT

We introduce Mind2Web, the first dataset for developing and evaluating generalist agents for the web that can follow language instructions to complete complex tasks on any website. Existing datasets for web agents either use simulated websites or cover only a limited set of websites and tasks, and are thus not suitable for generalist web agents. With over 2,000 open-ended tasks collected from 137 websites spanning 31 domains and crowdsourced action sequences for the tasks, Mind2Web provides three necessary ingredients for building generalist web agents: 1) diverse domains, websites, and tasks, 2) use of real-world websites instead of simulated and simplified ones, and 3) a broad spectrum of user interaction patterns. Based on Mind2Web, we conduct an initial exploration of using large language models (LLMs) for building generalist web agents. While the raw HTML of real-world websites is often too large to be fed to LLMs, we show that first filtering it with a small LM significantly improves the effectiveness and efficiency of LLMs. Our solution demonstrates a decent level of performance, even on websites or entire domains the model has never seen before, but there is still substantial room for improvement toward truly generalizable agents.
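The filtering step described above can be sketched as a two-stage pipeline: score every candidate page element, keep the top k, and pass only those to the LLM. In this minimal sketch the small LM is replaced by a toy token-overlap scorer, and the `Element` type, `top_k_candidates`, and the prompt wording are hypothetical. They illustrate how ranking shrinks the HTML context, not Mind2Web's actual ranker or prompt.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Element:
    element_id: str
    html: str


def _tokens(text: str) -> set:
    # Lowercased alphanumeric tokens; tag names and attributes are included too.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(task: str, element: Element) -> float:
    # Hypothetical stand-in for a small ranking LM:
    # fraction of task tokens that also appear in the element's HTML.
    task_tokens = _tokens(task)
    if not task_tokens:
        return 0.0
    return len(task_tokens & _tokens(element.html)) / len(task_tokens)


def top_k_candidates(task: str, elements: List[Element], k: int = 5) -> List[Element]:
    # sorted() is stable, so ties keep their original page order.
    ranked = sorted(elements, key=lambda e: score(task, e), reverse=True)
    return ranked[:k]


def build_prompt(task: str, candidates: List[Element]) -> str:
    # Only the filtered candidates reach the (large) LLM, not the full page HTML.
    lines = [f"Task: {task}", "Candidate elements:"]
    for i, el in enumerate(candidates):
        lines.append(f"{i}. [{el.element_id}] {el.html}")
    lines.append("Which element should be acted on next?")
    return "\n".join(lines)
```

With a task such as "Search flights from Seattle", a `<button>Search flights</button>` element outranks unrelated links, so `k` can stay small even on very large pages.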

generalist agent, mind2web, name change, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

From Benchmarks to Business Impact: Deploying IBM Generalist Agent in Enterprise Production

Shlomov, Segev, Oved, Alon, Marreed, Sami, Levy, Ido, Akrabi, Offer, Yaeli, Avi, Strąk, Łukasz, Koumpan, Elizabeth, Goldshtein, Yinon, Shapira, Eilam, Mashkif, Nir, Adi, Asaf

arXiv.org Artificial Intelligence, Dec-10-2025

Agents are rapidly advancing in automating digital work, but enterprises face a harder challenge: moving beyond prototypes to deployed systems that deliver measurable business value. This path is complicated by fragmented frameworks, slow development, and the absence of standardized evaluation practices. Generalist agents have emerged as a promising direction, excelling on academic benchmarks and offering flexibility across task types, applications, and modalities. Yet, evidence of their use in production enterprise settings remains limited. This paper reports IBM's experience developing and piloting the Computer Using Generalist Agent (CUGA), which has been open-sourced for the community (https://github.com/cuga-project/cuga-agent). CUGA adopts a hierarchical planner-executor architecture with strong analytical foundations, achieving state-of-the-art performance on AppWorld and WebArena. Beyond benchmarks, it was evaluated in a pilot within the Business-Process-Outsourcing talent acquisition domain, addressing enterprise requirements for scalability, auditability, safety, and governance. To support assessment, we introduce BPO-TA, a 26-task benchmark spanning 13 analytics endpoints. In preliminary evaluations, CUGA approached the accuracy of specialized agents while indicating potential for reducing development time and cost. Our contribution is twofold: presenting early evidence of generalist agents operating at enterprise scale, and distilling technical and organizational lessons from this initial pilot. We outline requirements and next steps for advancing research-grade architectures like CUGA into robust, enterprise-ready systems.
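A hierarchical planner-executor split can be illustrated with a minimal sketch: a planner turns the task into steps, and an executor dispatches each step to a tool while recording a trace, the kind of record that could support the auditability requirement mentioned above. Everything here (`Step`, `PlannerExecutorAgent`, the tool names) is hypothetical and is not CUGA's API; the linked repository holds the actual implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Step:
    tool: str      # hypothetical tool name chosen by the planner
    argument: str  # input passed to that tool


@dataclass
class PlannerExecutorAgent:
    # Planner: task -> ordered list of steps (in practice, often an LLM call).
    planner: Callable[[str], List[Step]]
    # Executor's tool registry: tool name -> callable.
    tools: Dict[str, Callable[[str], str]]
    # Human-readable record of every step, kept for auditing.
    trace: List[str] = field(default_factory=list)

    def run(self, task: str) -> List[str]:
        results = []
        for step in self.planner(task):
            tool = self.tools.get(step.tool)
            if tool is None:
                # Record and skip steps the executor cannot perform.
                self.trace.append(f"skipped unknown tool {step.tool!r}")
                continue
            output = tool(step.argument)
            self.trace.append(f"{step.tool}({step.argument!r}) -> {output!r}")
            results.append(output)
        return results
```

Keeping planning and execution separate means the planner can be swapped or re-prompted without touching the tool layer, and the trace shows exactly which steps ran.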

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2510.23856

Genre: Research Report (1.00)

Industry: Information Technology (1.00)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
(2 more...)

Game-TARS: Pretrained Foundation Models for Scalable Generalist Multimodal Game Agents

Wang, Zihao, Li, Xujing, Ye, Yining, Fang, Junjie, Wang, Haoming, Liu, Longxiang, Liang, Shihao, Lu, Junting, Wu, Zhiyong, Feng, Jiazhan, Zhong, Wanjun, Li, Zili, Wang, Yu, Miao, Yu, Zhou, Bo, Li, Yuanfan, Wang, Hao, Zhao, Zhongkai, Wu, Faming, Jiang, Zhengxuan, Tan, Weihao, Yao, Heyuan, Yan, Shi, Li, Xiangyang, Liang, Yitao, Qin, Yujia, Shi, Guang

arXiv.org Artificial Intelligence, Oct-29-2025

We present Game-TARS, a generalist game agent trained with a unified, scalable action space anchored to human-aligned native keyboard-mouse inputs. Unlike API- or GUI-based approaches, this paradigm enables large-scale continual pre-training across heterogeneous domains, including OS, web, and simulation games. Game-TARS is pre-trained on over 500B tokens of diverse trajectories and multimodal data. Key techniques include a decaying continual loss to reduce causal confusion and an efficient Sparse-Thinking strategy that balances reasoning depth and inference cost. Experiments show that Game-TARS achieves about twice the success rate of the previous state-of-the-art (SOTA) model on open-world Minecraft tasks, comes close to the generality of first-time human players in unseen web 3D games, and outperforms GPT-5, Gemini-2.5-Pro, and Claude-4-Sonnet on FPS benchmarks. Scaling results at training time and test time confirm that the unified action space sustains improvements when scaled to cross-game and multimodal data. Our results demonstrate that simple, scalable action representations combined with large-scale pre-training provide a promising path toward generalist agents with broad computer-use abilities.
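To make the idea of a single action space concrete, the sketch below represents every action as one of three primitive keyboard-mouse events and serializes it to text, so the same vocabulary could cover OS, web, and game trajectories. The event types, the relative-motion mouse encoding, and the `mouse_move(dx,dy)` text format are assumptions for illustration, not Game-TARS's actual tokenization.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class MouseMove:
    dx: int  # assumed relative horizontal motion in pixels
    dy: int  # assumed relative vertical motion in pixels


@dataclass(frozen=True)
class MouseClick:
    button: str  # e.g. "left" or "right"


@dataclass(frozen=True)
class KeyPress:
    key: str  # e.g. "w", "space"


Action = Union[MouseMove, MouseClick, KeyPress]


def to_text(action: Action) -> str:
    # Hypothetical text format; assumes key names contain no ',' or ')'.
    if isinstance(action, MouseMove):
        return f"mouse_move({action.dx},{action.dy})"
    if isinstance(action, MouseClick):
        return f"mouse_click({action.button})"
    if isinstance(action, KeyPress):
        return f"key_press({action.key})"
    raise TypeError(f"unsupported action: {action!r}")


def parse(text: str) -> Action:
    # Inverse of to_text for the same hypothetical format.
    name, _, rest = text.partition("(")
    args = rest.rstrip(")").split(",")
    if name == "mouse_move":
        return MouseMove(int(args[0]), int(args[1]))
    if name == "mouse_click":
        return MouseClick(args[0])
    if name == "key_press":
        return KeyPress(args[0])
    raise ValueError(f"unknown action: {text!r}")
```

Because every environment is driven through the same three primitives, trajectories from different games and applications can share one output vocabulary instead of per-environment APIs.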

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2510.23691

Genre: Research Report > New Finding (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology: