AITopics

Country: North America > United States (0.68)

Genre: Instructional Material > Course Syllabus & Notes (0.46)

Industry: Leisure & Entertainment > Games (0.48)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.47)

Neural Information Processing SystemsOct-1-2025, 23:18:01 GMT

Learning Mean-Field Games

Xin Guo, Anran Hu, Renyuan Xu, Junzi Zhang

This paper presents a general mean-field game (GMFG) framework for simultaneous learning and decision-making in stochastic games with a large population.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Country: North America > United States > California (0.28)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Game Theory (0.94)

Neural Information Processing SystemsOct-1-2025, 23:02:00 GMT

06d5ae105ea1bea4d800bc96491876e9-Paper.pdf

artificial intelligence, deep learning, machine learning, (17 more...)

Country: Asia > China (0.28)

Genre: Research Report (0.46)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.69)

Neural Information Processing SystemsOct-1-2025, 22:21:29 GMT

03793ef7d06ffd63d34ade9d091f1ced-Paper.pdf

artificial intelligence, collision avoidance, robot, (12 more...)

Country: North America > United States (0.46)

Industry: Transportation (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.74)
Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.47)

Ebi, Daniel, Lambrechts, Gaspard, Ernst, Damien, Böhm, Klemens

Informed Asymmetric Actor-Critic: Leveraging Privileged Signals Beyond Full-State Access

arXiv.org Machine LearningOct-1-2025

Reinforcement learning in partially observable environments requires agents to act under uncertainty from noisy, incomplete observations. Asymmetric actor-critic methods leverage privileged information during training to improve learning under these conditions. However, existing approaches typically assume full-state access during training. In this work, we challenge this assumption by proposing a novel actor-critic framework, called informed asymmetric actor-critic, that enables conditioning the critic on arbitrary privileged signals without requiring access to the full state. We show that policy gradients remain unbiased under this formulation, extending the theoretical foundation of asymmetric methods to the more general case of privileged partial information. To quantify the impact of such signals, we propose informativeness measures based on kernel methods and return prediction error, providing practical tools for evaluating training-time signals. We validate our approach empirically on benchmark navigation tasks and synthetic partially observable environments, showing that our informed asymmetric method improves learning efficiency and value estimation when informative privileged inputs are available. Our findings challenge the necessity of full-state access and open new directions for designing asymmetric reinforcement learning methods that are both practical and theoretically sound.

agent, information, policy gradient, (15 more...)

arXiv.org Machine Learning

2509.26

Country:

Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
Europe > Belgium > Wallonia > Liège Province > Liège (0.04)
Africa > Ethiopia > Addis Ababa > Addis Ababa (0.04)

Genre: Research Report > New Finding (0.87)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.51)

Memory-Efficient 2D/3D Shape Assembly of Robot Swarms

Yue, Shuoyu, Li, Pengpeng, Xu, Yang, Ze, Kunrui, Long, Xingjian, Cao, Huazi, Sun, Guibin

Mean-shift-based approaches have recently emerged as the most effective methods for robot swarm shape assembly tasks. These methods rely on image-based representations of target shapes to compute local density gradients and perform mean-shift exploration, which constitute their core mechanism. However, such image representations incur substantial memory overhead, which can become prohibitive for high-resolution or 3D shapes. To overcome this limitation, we propose a memory-efficient tree map representation that hierarchically encodes user-specified shapes and is applicable to both 2D and 3D scenarios. Building on this representation, we design a behavior-based distributed controller that enables assignment-free shape assembly. Comparative 2D and 3D simulations against a state-of-the-art mean-shift algorithm demonstrate one to two orders of magnitude lower memory usage and two to three times faster shape entry while maintaining comparable uniformity. Finally, we validate the framework through physical experiments with 6 to 7 UAVs, confirming its real-world practicality.

artificial intelligence, node, robot, (17 more...)

2509.26518

Country:

Asia > China (0.28)
Europe > United Kingdom (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)

SCUBA: Salesforce Computer Use Benchmark

Dai, Yutong, Ramakrishnan, Krithika, Gu, Jing, Fernandez, Matthew, Luo, Yanqi, Prabhu, Viraj, Hu, Zhenyu, Savarese, Silvio, Xiong, Caiming, Chen, Zeyuan, Xu, Ran

We introduce SCUBA, a benchmark designed to evaluate computer-use agents on customer relationship management (CRM) workflows within the Salesforce platform. SCUBA contains 300 task instances derived from real user interviews, spanning three primary personas, platform administrators, sales representatives, and service agents. The tasks test a range of enterprise-critical abilities, including Enterprise Software UI navigation, data manipulation, workflow automation, information retrieval, and troubleshooting. To ensure realism, SCUBA operates in Salesforce sandbox environments with support for parallel execution and fine-grained evaluation metrics to capture milestone progress. We benchmark a diverse set of agents under both zero-shot and demonstration-augmented settings. We observed huge performance gaps in different agent design paradigms and gaps between the open-source model and the closed-source model. In the zero-shot setting, open-source model powered computer-use agents that have strong performance on related benchmarks like OSWorld only have less than 5\% success rate on SCUBA, while methods built on closed-source models can still have up to 39% task success rate. In the demonstration-augmented settings, task success rates can be improved to 50\% while simultaneously reducing time and costs by 13% and 16%, respectively. These findings highlight both the challenges of enterprise tasks automation and the promise of agentic solutions. By offering a realistic benchmark with interpretable evaluation, SCUBA aims to accelerate progress in building reliable computer-use agents for complex business software ecosystems.

large language model, machine learning, natural language, (19 more...)

2509.26506

Genre:

Research Report (0.64)
Workflow (0.50)

Industry: Information Technology > Software (1.00)

Technology:

Information Technology > Enterprise Applications > Customer Relationship Management (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

90% Faster, 100% Code-Free: MLLM-Driven Zero-Code 3D Game Development

Yang, Runxin, Wan, Yuxuan, Li, Shuqing, Lyu, Michael R.

Developing 3D games requires specialized expertise across multiple domains, including programming, 3D modeling, and engine configuration, which limits access to millions of potential creators. Recently, researchers have begun to explore automated game development. However, existing approaches face three primary challenges: (1) limited scope to 2D content generation or isolated code snippets; (2) requirement for manual integration of generated components into game engines; and (3) poor performance on handling interactive game logic and state management. While Multimodal Large Language Models (MLLMs) demonstrate potential capabilities to ease the game generation task, a critical gap still remains in translating these outputs into production-ready, executable game projects based on game engines such as Unity and Unreal Engine. To bridge the gap, this paper introduces UniGen, the first end-to-end coordinated multi-agent framework that automates zero-coding development of runnable 3D games from natural language requirements. Specifically, UniGen uses a Planning Agent that interprets user requirements into structured blueprints and engineered logic descriptions; after which a Generation Agent produces executable C# scripts; then an Automation Agent handles engine-specific component binding and scene construction; and lastly a Debugging Agent provides real-time error correction through conversational interaction. We evaluated UniGen on three distinct game prototypes. Results demonstrate that UniGen not only democratizes game creation by requiring no coding from the user, but also reduces development time by 91.4%. We release UniGen at https://github.com/yxwan123/UniGen. A video demonstration is available at https://www.youtube.com/watch?v=xyJjFfnxUx0.

game development, large language model, machine learning, (20 more...)

2509.26161

Country: South America > Brazil > Rio de Janeiro (0.16)

Genre: Research Report > New Finding (0.67)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.88)

SafeEvalAgent: Toward Agentic and Self-Evolving Safety Evaluation of LLMs

Wang, Yixu, Wang, Xin, Yao, Yang, Li, Xinyuan, Teng, Yan, Ma, Xingjun, Wang, Yingchun

The rapid integration of Large Language Models (LLMs) into high-stakes domains necessitates reliable safety and compliance evaluation. However, existing static benchmarks are ill-equipped to address the dynamic nature of AI risks and evolving regulations, creating a critical safety gap. This paper introduces a new paradigm of agentic safety evaluation, reframing evaluation as a continuous and self-evolving process rather than a one-time audit. We then propose a novel multi-agent framework SafeEvalAgent, which autonomously ingests unstructured policy documents to generate and perpetually evolve a comprehensive safety benchmark. SafeEvalAgent leverages a synergistic pipeline of specialized agents and incorporates a Self-evolving Evaluation loop, where the system learns from evaluation results to craft progressively more sophisticated and targeted test cases. Our experiments demonstrate the effectiveness of SafeEvalAgent, showing a consistent decline in model safety as the evaluation hardens. For instance, GPT-5's safety rate on the EU AI Act drops from 72.50% to 36.36% over successive iterations. These findings reveal the limitations of static assessments and highlight our framework's ability to uncover deep vulnerabilities missed by traditional methods, underscoring the urgent need for dynamic evaluation ecosystems to ensure the safe and responsible deployment of advanced AI.

large language model, machine learning, natural language, (18 more...)