AITopics

Deep reinforcement learning (DRL) has effectively enhanced gameplay experiences and game design across various game genres. However, few studies on fighting game agents have focused explicitly on enhancing player enjoyment, a critical factor for both developers and players. To address this gap and establish a practical baseline for designing enjoyability-focused agents, we propose a two-tier agent (TTA) system and conduct experiments in the classic fighting game Street Fighter II. The first tier of TTA employs a task-oriented network architecture, modularized reward functions, and hybrid training to produce diverse and skilled DRL agents. In the second tier of TTA, a Large Language Model Hyper-Agent, leveraging players' playing data and feedback, dynamically selects suitable DRL opponents. In addition, we investigate and model several key factors that affect the enjoyability of the opponent. The experiments demonstrate improvements from 64.36% to 156.36% in the execution of advanced skills over baseline methods. The trained agents also exhibit distinct game-playing styles. Additionally, we conducted a small-scale user study, and the overall enjoyment in the players' feedback validates the effectiveness of our TTA system.

large language model, machine learning, natural language, (21 more...)

2504.07425

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.68)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Modeling Response Consistency in Multi-Agent LLM Systems: A Comparative Analysis of Shared and Separate Context Approaches

Helmi, Tooraj

Large Language Models (LLMs) are increasingly utilized in multi-agent systems (MAS) to enhance collaborative problem-solving and interactive reasoning. Recent advancements have enabled LLMs to function as autonomous agents capable of understanding complex interactions across multiple topics. However, deploying LLMs in MAS introduces challenges related to context management, response consistency, and scalability, especially when agents must operate under memory limitations and handle noisy inputs. While prior research has explored optimizing context sharing and response latency in LLM-driven MAS, these efforts often focus on either fully centralized or decentralized configurations, each with distinct trade-offs. In this paper, we develop a probabilistic framework to analyze the impact of shared versus separate context configurations on response consistency and response times in LLM-based MAS. We introduce the Response Consistency Index (RCI) as a metric to evaluate the effects of context limitations, noise, and inter-agent dependencies on system performance. Our approach differs from existing research by focusing on the interplay between memory constraints and noise management, providing insights into optimizing scalability and response times in environments with interdependent topics. Through this analysis, we offer a comprehensive understanding of how different configurations impact the efficiency of LLM-driven multi-agent systems, thereby guiding the design of more robust architectures.

artificial intelligence, large language model, natural language, (15 more...)

2504.07303

Country: North America > United States > California (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Pires, Diogo L., Mancuso, Vincenzo, Castagno, Paolo, Marsan, Marco Ajmone

Self-organisation of common good usage and an application to Internet services

Natural and human-made common goods present key challenges due to their susceptibility to degradation, overuse, or congestion. We explore the self-organisation of their usage when individuals have access to several available commons but limited information on them. We propose an extension of the Win-Stay, Lose-Shift (WSLS) strategy for such systems, under which individuals use a resource iteratively until they are unsuccessful and then shift randomly. This simple strategy leads to a distribution of the use of commons with an improvement against random shifting. Selective individuals who retain information on their usage and accordingly adapt their tolerance to failure in each common good improve the average experienced quality for an entire population. Hybrid systems of selective and non-selective individuals can lead to an equilibrium with equalised experienced quality akin to the ideal free distribution. We show that these results can be applied to the server selection problem faced by mobile users accessing Internet services and we perform realistic simulations to test their validity. Furthermore, these findings can be used to understand other real systems such as animal dispersal on grazing and foraging land, and to propose solutions to operators of systems of public transport or other technological commons.
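The Win-Stay, Lose-Shift rule described above can be illustrated with a toy simulation. This is only a minimal sketch, not the authors' model: the success criterion (a use succeeds when the load on a common does not exceed its capacity), the function name `simulate`, and all parameters are assumptions introduced for illustration.

```python
import random


def simulate(n_agents, capacities, rounds, stay_on_win, seed=0):
    """Return the fraction of successful uses over all rounds.

    Assumption (not from the abstract): using common k succeeds when the
    number of agents on k in that round does not exceed capacities[k].
    """
    rng = random.Random(seed)
    n_commons = len(capacities)
    # Every agent starts on a common chosen uniformly at random.
    choice = [rng.randrange(n_commons) for _ in range(n_agents)]
    successes = 0
    for _ in range(rounds):
        # Count how many agents picked each common this round.
        load = [0] * n_commons
        for k in choice:
            load[k] += 1
        for i, k in enumerate(choice):
            won = load[k] <= capacities[k]
            successes += won
            # Win-Stay keeps the current common. Otherwise (a loss, or the
            # random-shifting baseline) redraw uniformly, possibly the same one.
            if not (won and stay_on_win):
                choice[i] = rng.randrange(n_commons)
    return successes / (n_agents * rounds)


if __name__ == "__main__":
    caps = [10, 10, 10]
    print("WSLS:          ", simulate(30, caps, 500, stay_on_win=True))
    print("random shift:  ", simulate(30, caps, 500, stay_on_win=False))
```

Setting `stay_on_win=False` gives the random-shifting baseline that the abstract compares against, so the two printed rates can be compared directly under these toy assumptions.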

artificial intelligence, machine learning, probability, (19 more...)

2504.07175

Country:

Europe > Italy (0.46)
North America > United States (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Transportation (0.67)
Telecommunications (0.67)
Information Technology > Networks (0.34)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.46)

R2E-Gym: Procedural Environments and Hybrid Verifiers for Scaling Open-Weights SWE Agents

Jain, Naman, Singh, Jaskirat, Shetty, Manish, Zheng, Liang, Sen, Koushik, Stoica, Ion

Improving open-source models on real-world SWE tasks (solving GitHub issues) faces two key challenges: 1) scalable curation of execution environments to train these models, and 2) optimal scaling of test-time compute. We introduce AgentGym, the largest procedurally-curated executable gym environment for training real-world SWE-agents, consisting of more than 8.7K tasks. AgentGym is powered by two main contributions: 1) SYNGEN: a synthetic data curation recipe that enables scalable curation of executable environments using test-generation and back-translation directly from commits, thereby reducing reliance on human-written issues or unit tests. We show that this enables more scalable training leading to pass@1 performance of 34.4% on SWE-Bench Verified benchmark with our 32B model. 2) Hybrid Test-time Scaling: we provide an in-depth analysis of two test-time scaling axes: execution-based and execution-free verifiers, demonstrating that they exhibit complementary strengths and limitations. Test-based verifiers suffer from low distinguishability, while execution-free verifiers are biased and often rely on stylistic features. Surprisingly, we find that while each approach individually saturates around 42-43%, significantly higher gains can be obtained by leveraging their complementary strengths. Overall, our approach achieves 51% on the SWE-Bench Verified benchmark, reflecting a new state-of-the-art for open-weight SWE-agents and for the first time showing competitive performance with proprietary models such as o1, o1-preview and sonnet-3.5-v2 (with tools). We will open-source our environments, models, and agent trajectories.

large language model, machine learning, trajectory, (17 more...)

2504.07164

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Software Engineering (0.68)
(2 more...)

Fernando, Chrisantha, Banarse, Dylan, Osindero, Simon

Wanting to be Understood

This paper explores an intrinsic motivation for mutual awareness, hypothesizing that humans possess a fundamental drive to understand and to be understood even in the absence of extrinsic rewards. Through simulations of the perceptual crossing paradigm, we explore the effect of various internal reward functions in reinforcement learning agents. The drive to understand is implemented as an active inference type artificial curiosity reward, whereas the drive to be understood is implemented through intrinsic rewards for imitation, influence/impressionability, and sub-reaction time anticipation of the other. Results indicate that while artificial curiosity alone does not lead to a preference for social interaction, rewards emphasizing reciprocal understanding successfully drive agents to prioritize interaction. We demonstrate that this intrinsic motivation can facilitate cooperation in tasks where only one agent receives extrinsic reward for the behaviour of the other.

machine learning, natural language, reinforcement learning, (21 more...)

2504.06611

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
(2 more...)

van der Sar, Erica, Zocca, Alessandro, Bhulai, Sandjai

Optimizing Power Grid Topologies with Reinforcement Learning: A Survey of Methods and Challenges

arXiv.org Machine Learning, Apr-10-2025

Power grid operation is becoming increasingly complex due to the rising integration of renewable energy sources and the need for more adaptive control strategies. Reinforcement Learning (RL) has emerged as a promising approach to power network control (PNC), offering the potential to enhance decision-making in dynamic and uncertain environments. The Learning To Run a Power Network (L2RPN) competitions have played a key role in accelerating research by providing standardized benchmarks and problem formulations, leading to rapid advancements in RL-based methods. This survey provides a comprehensive and structured overview of RL applications for power grid topology optimization, categorizing existing techniques, highlighting key design choices, and identifying gaps in current research. Additionally, we present a comparative numerical study evaluating the impact of commonly applied RL-based methods, offering insights into their practical effectiveness. By consolidating existing research and outlining open challenges, this survey aims to provide a foundation for future advancements in RL-driven power grid optimization.

artificial intelligence, machine learning, reinforcement learning, (20 more...)

2504.08210

Country:

Europe > Netherlands > North Holland > Amsterdam (0.04)
Europe > Spain > Galicia > Madrid (0.04)
Europe > Netherlands > South Holland > Delft (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Energy > Renewable (1.00)
Energy > Power Industry (1.00)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills

Zheng, Boyuan, Fatemi, Michael Y., Jin, Xiaolong, Wang, Zora Zhiruo, Gandhi, Apurva, Song, Yueqi, Gu, Yu, Srinivasa, Jayanth, Liu, Gaowen, Neubig, Graham, Su, Yu

To survive and thrive in complex environments, humans have evolved sophisticated self-improvement mechanisms through environment exploration, hierarchical abstraction of experiences into reusable skills, and collaborative construction of an ever-growing skill repertoire. Despite recent advancements, autonomous web agents still lack crucial self-improvement capabilities, struggling with procedural knowledge abstraction, refining skills, and skill composition. In this work, we introduce SkillWeaver, a skill-centric framework enabling agents to self-improve by autonomously synthesizing reusable skills as APIs. Given a new website, the agent autonomously discovers skills, executes them for practice, and distills practice experiences into robust APIs. Iterative exploration continually expands a library of lightweight, plug-and-play APIs, significantly enhancing the agent's capabilities. Experiments on WebArena and real-world websites demonstrate the efficacy of SkillWeaver, achieving relative success rate improvements of 31.8% and 39.8%, respectively. Additionally, APIs synthesized by strong agents substantially enhance weaker agents through transferable skills, yielding improvements of up to 54.3% on WebArena. These results demonstrate the effectiveness of honing diverse website interactions into APIs, which can be seamlessly shared among various web agents.

artificial intelligence, machine learning, natural language, (19 more...)

2504.07079

Country:

North America > United States (1.00)
Asia (0.67)
Europe > Austria > Vienna (0.14)

Genre: Research Report > New Finding (0.66)

Industry:

Transportation > Passenger (1.00)
Transportation > Ground > Road (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
(3 more...)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Communications > Web (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
(3 more...)

Ahsini, Yusef, Reverte, Belén, Conejero, J. Alberto

AI-Driven Consensus: Modeling Multi-Agent Networks with Long-Range Interactions through path-Laplacian Matrices

Extended connectivity in graphs can be analyzed through k-path Laplacian matrices, which permit the capture of long-range interactions in various real-world networked systems such as social, transportation, and multi-agent networks. In this work, we present several alternative machine learning methods (LSTM, xLSTM, Transformer, XGBoost, and ConvLSTM) to predict the final consensus value from directed networks (Erdős-Rényi, Watts-Strogatz, and Barabási-Albert) and the initial state. We highlight how different k-hop interactions affect the performance of the tested methods. This framework opens new avenues for analyzing multi-scale diffusion processes in large-scale, complex networks.
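To make the prediction target concrete, the sketch below simulates consensus dynamics directly. It assumes the usual definition of the k-path Laplacian (off-diagonal entry -1 when the shortest-path distance from i to j is exactly k, diagonal entry equal to the number of such nodes) and a linear dynamics dx/dt = -(Σ_k c_k L_k) x. The weights c_k, the step size, and all function names are hypothetical choices, not taken from the paper.

```python
from collections import deque


def bfs_distances(adj, src):
    """Shortest-path hop counts from src, following out-edges in adj."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def k_path_laplacian(adj, k):
    """Dense k-path Laplacian of a graph given as an adjacency list."""
    n = len(adj)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j, d in bfs_distances(adj, i).items():
            if d == k:
                L[i][j] = -1.0
                L[i][i] += 1.0
    return L


def consensus(adj, x0, weights, dt=0.01, steps=5000):
    """Forward-Euler integration of dx/dt = -(sum_k weights[k] * L_k) x."""
    n = len(adj)
    laplacians = {k: k_path_laplacian(adj, k) for k in weights}
    x = list(x0)
    for _ in range(steps):
        dx = [0.0] * n
        for k, c in weights.items():
            L = laplacians[k]
            for i in range(n):
                dx[i] -= c * sum(L[i][j] * x[j] for j in range(n))
        x = [xi + dt * di for xi, di in zip(x, dx)]
    return x


if __name__ == "__main__":
    # Undirected path graph 0-1-2-3, with 1-hop and (weaker) 2-hop coupling.
    path = [[1], [0, 2], [1, 3], [2]]
    print(consensus(path, [0.0, 1.0, 2.0, 3.0], {1: 1.0, 2: 0.5}))
```

The example uses an undirected, connected graph because there the final value is known to be the mean of the initial state, which gives a simple sanity check. On directed networks the limit generally depends on the topology, which is what makes it a nontrivial quantity to predict.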

artificial intelligence, machine learning, matrix, (17 more...)

2504.06894

Country: Europe (0.28)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)
Health & Medicine > Therapeutic Area > Immunology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Vázquez, Gricel, Evangelidis, Alexandros, Shahbeigi, Sepeedeh, Gerasimou, Simos

Adaptive Human-Robot Collaborative Missions using Hybrid Task Planning

Producing robust task plans in human-robot collaborative missions is a critical activity in order to increase the likelihood of these missions completing successfully. Despite the broad research body in the area, which considers different classes of constraints and uncertainties, its applicability is confined to relatively simple problems that can be comfortably addressed by the underpinning mathematically-based or heuristic-driven solver engines. In this paper, we introduce a hybrid approach that effectively solves the task planning problem by decomposing it into two intertwined parts, starting with the identification of a feasible plan and followed by its uncertainty augmentation and verification yielding a set of Pareto optimal plans. To enhance its robustness, adaptation tactics are devised for the evolving system requirements and agents' capabilities. We demonstrate our approach through an industrial case study involving workers and robots undertaking activities within a vineyard, showcasing the benefits of our hybrid approach both in the generation of feasible solutions and scalability compared to native planners.

agent, artificial intelligence, planning & scheduling, (18 more...)

2504.06746

Country: Europe (1.00)

Genre: Research Report (0.82)

Industry: Consumer Products & Services (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.93)
Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (0.86)

Dynamic Residual Safe Reinforcement Learning for Multi-Agent Safety-Critical Scenarios Decision-Making

Wang, Kaifeng, Chen, Yinsong, Liu, Qi, Li, Xueyuan, Gao, Xin

Their interactions are characterized by significant dynamism and heterogeneity. To address these challenges, we propose a MADCZ modeling approach. By constructing dynamic topological structures and spatiotemporal conflict zones, the model attains precise conflict identification and delivers interpretable decision support. First, a joint state space is established, defined as

$$S = S_{AVs} \times S_{BVs} \times S_{Peds} \times S_{Road}, \tag{2}$$

where $S_{AVs}$, $S_{BVs}$, $S_{Peds}$ and $S_{Road}$ represent the state subspaces of AVs, BVs, Peds, and road network, respectively. Each subspace is specifically defined as

$$\begin{aligned} S_{Vehs} &= [x, y, \theta, v, l, c, p] \in \mathbb{R}^{22}, \\ S_{Peds} &= [x, y, \theta, v, l, c] \in \mathbb{R}^{10}, \\ S_{Road} &= \{\, G(V, E) \mid V \in \mathbb{R}^{n \times 22},\ E \in \{0, 1\}^{n \times n} \,\}, \end{aligned} \tag{3}$$

where $x$ and $y$ denote the horizontal and vertical coordinates of the traffic participants, $\theta \in [0, 360)$ is the heading angle, $v$ represents the longitudinal velocity, $l$ and $c$ represent the lane position and traffic participant type, respectively, each encoded as a three-dimensional one-hot vector. $G$ represents the road network topology, where each traffic participant is modeled as a node $v_i \in V$, and $E$ represents the connections among participants, representing sensor perception or vehicle-to-vehicle (V2V) communication relationships. Additionally, for vehicles, $p$ denotes the relative motion information with respect to surrounding vehicles, defined as

$$p = [d_j, v_j], \quad j = \{f, r, lf, lr, rf, rr\}, \tag{4}$$

where $d_j$ and $v_j$ denote the relative longitudinal distance and the relative velocity between vehicles, and $f$, $r$, $lf$, $lr$, $rf$, $rr$ represent the neighboring vehicles at the front, rear, left front, left rear, right front, and right rear, respectively. If no neighboring vehicle is detected in a given direction, the relative longitudinal distance is assigned the maximum perception range and the relative velocity is set to zero.
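The 22-dimensional vehicle state described above can be assembled as in the following sketch. It is illustrative only: the function names, the integer indices used for lane and participant type, the interleaved ordering of the $(d_j, v_j)$ pairs, and the wrapping of the heading angle with a modulo are all assumptions, not details given in the text.

```python
# Neighbor directions in the order listed in the text.
DIRECTIONS = ("f", "r", "lf", "lr", "rf", "rr")


def one_hot(index, size=3):
    """Three-dimensional one-hot encoding (size is 3 per the text)."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def vehicle_state(x, y, theta, v, lane, kind, neighbors, max_range):
    """Build [x, y, theta, v, l, c, p] as a flat list of 22 floats.

    neighbors maps a direction to (relative distance, relative velocity).
    A direction with no detected vehicle falls back to (max_range, 0.0),
    matching the default rule stated in the text.
    """
    p = []
    for j in DIRECTIONS:
        d_j, v_j = neighbors.get(j, (max_range, 0.0))
        p += [d_j, v_j]
    # theta % 360.0 keeps the heading in [0, 360); an assumed normalization.
    return [x, y, theta % 360.0, v] + one_hot(lane) + one_hot(kind) + p


if __name__ == "__main__":
    state = vehicle_state(1.0, 2.0, 370.0, 5.0, lane=0, kind=2,
                          neighbors={"f": (12.0, -1.5)}, max_range=100.0)
    print(len(state), state)
```

The dimension count works out as 4 scalar features, two 3-dimensional one-hot vectors, and 6 directions with 2 values each, which is consistent with the stated size of 22. Dropping `p` gives the 10-dimensional pedestrian state.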

artificial intelligence, machine learning, reinforcement learning, (13 more...)

2504.06670

Country: Asia > China (0.14)

Genre: Research Report (0.64)

Industry:

Transportation > Ground > Road (1.00)
Transportation > Infrastructure & Services (0.87)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.85)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.85)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.70)