AITopics | Agents

AIhub monthly digest: April 2026 – machine learning for particle physics, AI Index Report, and table tennis

AIHub, Apr-30-2026, 09:10:38 GMT

Welcome to our monthly digest, where you can catch up with any AIhub stories you may have missed, peruse the latest news, recap recent events, and more. This month, we meet PhD students and early-career researchers, find out how machine learning is used for particle physics discoveries, cast an eye over the latest AI Index Report, and watch a robot beating elite players at table tennis. In an article published in Nature this month, Sony AI introduced Ace, a table tennis robot that has beaten professional players in competitive matches. The system combines event-based vision sensors and a control system based on model-free reinforcement learning, as well as state-of-the-art high-speed robot hardware. The ninth edition of the Artificial Intelligence Index Report was published on 13 April 2026.

index report, machine learning, reinforcement learning, (14 more...)

AIHub

Industry: Leisure & Entertainment > Sports > Tennis (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.36)

Multi-Agent Learning with Heterogeneous Linear Contextual Bandits

Neural Information Processing Systems, Apr-30-2026, 08:52:32 GMT

As trained intelligent systems become increasingly pervasive, multi-agent learning has emerged as a popular framework for studying complex interactions between autonomous agents. Yet, a formal understanding of how and when learners in heterogeneous environments benefit from sharing their respective experiences is still in its infancy. In this paper, we seek answers to these questions in the context of linear contextual bandits. We present a novel distributed learning algorithm based on the upper confidence bound (UCB) algorithm, which we refer to as H-LINUCB, wherein agents cooperatively minimize the group regret under the coordination of a central server. In the setting where the level of heterogeneity or dissimilarity across the environments is known to the agents, we show that H-LINUCB is provably optimal in regimes where the tasks are highly similar or highly dissimilar.
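The abstract above builds on the upper confidence bound idea for linear contextual bandits. As a rough illustration only, here is a minimal single-agent LinUCB-style estimator in plain Python. It is not H-LINUCB: there is no central server, no cooperation between agents, and no handling of heterogeneity. The class name `LinUCBEstimator` and the parameters `reg` (ridge regularization) and `alpha` (exploration weight) are assumptions made for this sketch.

```python
import math


class LinUCBEstimator:
    """Ridge-regression statistics with a UCB score (standard LinUCB style, not H-LINUCB)."""

    def __init__(self, dim, reg=1.0):
        # a_inv holds the inverse of (reg * I + sum of x x^T); it starts as I / reg.
        self.a_inv = [[(1.0 / reg if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim  # running sum of reward * context

    def _a_inv_times(self, vec):
        return [sum(row[j] * vec[j] for j in range(len(vec))) for row in self.a_inv]

    def ucb(self, x, alpha=1.0):
        # Estimated reward plus an exploration bonus that shrinks as data accumulates.
        theta = self._a_inv_times(self.b)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(sum(xi * vi for xi, vi in zip(x, self._a_inv_times(x))))
        return mean + alpha * width

    def update(self, x, reward):
        # Sherman-Morrison rank-one update keeps a_inv exact without re-inverting.
        v = self._a_inv_times(x)
        denom = 1.0 + sum(xi * vi for xi, vi in zip(x, v))
        for i in range(len(x)):
            for j in range(len(x)):
                self.a_inv[i][j] -= v[i] * v[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


est = LinUCBEstimator(dim=1)
before = est.ucb([1.0])   # no data yet: the score is pure exploration bonus
est.update([1.0], reward=1.0)
after = est.ucb([1.0])    # mean estimate rises, bonus shrinks
```

At each round an agent would score every candidate context with `ucb` and play the highest-scoring one. How observations are shared across agents is exactly what the paper studies and is left out here.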

artificial intelligence, bandit, machine learning, (17 more...)

Neural Information Processing Systems

Country: Europe (0.28)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

f7d3cef7ff579f2f903c8f458e730cae-Paper-Conference.pdf

Neural Information Processing Systems, Apr-30-2026, 08:38:51 GMT

artificial intelligence, machine learning, subtask, (14 more...)

Neural Information Processing Systems

Country:

Asia > China (0.47)
Europe (0.46)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

f57ffe47d0b528fbb97901d16bd4eba2-Paper-Conference.pdf

Neural Information Processing Systems, Apr-30-2026, 07:58:10 GMT

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States > California > Los Angeles County (0.28)

Genre: Research Report > New Finding (0.67)

Industry:

Education (1.00)
Leisure & Entertainment > Games > Computer Games (0.93)
Information Technology (0.93)
Transportation > Ground > Road (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

edac78c3e300629acfe6cbe9ca88fb84-Paper-Conference.pdf

Neural Information Processing Systems, Apr-30-2026, 05:24:19 GMT

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (0.93)
Overview (0.67)

Industry: Transportation (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Decentralized Randomly Distributed Multi-agent Multi-armed Bandit with Heterogeneous Rewards

Neural Information Processing Systems, Apr-30-2026, 05:09:23 GMT

We study a decentralized multi-agent multi-armed bandit problem in which multiple clients are connected by time-dependent random graphs provided by an environment. The reward distributions of each arm vary across clients and rewards are generated independently over time by an environment based on distributions that include both sub-exponential and sub-Gaussian distributions. Each client pulls an arm and communicates with neighbors based on the graph provided by the environment. The goal is to minimize the overall regret of the entire system through collaborations. To this end, we introduce a novel algorithmic framework, which first provides robust simulation methods for generating random graphs using rapidly mixing Markov chains or the random graph model, and then combines an averaging-based consensus approach with a newly proposed weighting technique and the upper confidence bound to deliver a UCB-type solution. Our algorithms account for the randomness in the graphs, removing the conventional doubly stochasticity assumption, and only require the knowledge of the number of clients at initialization. We derive optimal instance-dependent regret upper bounds of order $\log T$ in both sub-Gaussian and sub-exponential environments, and a nearly optimal mean-gap independent regret upper bound of order $\sqrt{T}\log T$ up to a $\log T$ factor. Importantly, our regret bounds hold with high probability and capture graph randomness, whereas prior works consider expected regret under assumptions and require more stringent reward distributions.
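The abstract above mentions an averaging-based consensus step between neighboring clients. A minimal sketch of one such round is given below, assuming plain unweighted averaging over each client's current neighborhood. The paper's weighting technique and its UCB component are not reproduced, and the function name `consensus_step` and the adjacency-list input are assumptions made for illustration.

```python
def consensus_step(estimates, neighbors):
    """One averaging round: each client replaces its estimate with the mean
    over itself and the neighbors it is connected to in the current graph."""
    updated = []
    for client, own in enumerate(estimates):
        group = [own] + [estimates[n] for n in neighbors[client]]
        updated.append(sum(group) / len(group))
    return updated


# Three clients holding local estimates of one arm's mean reward.
local = [0.0, 3.0, 6.0]

# Path graph 0 - 1 - 2: the end clients only see the middle one.
path = [[1], [0, 2], [1]]
after_path = consensus_step(local, path)        # [1.5, 3.0, 4.5]

# Complete graph: a single round already yields the global average.
complete = [[1, 2], [0, 2], [0, 1]]
after_complete = consensus_step(local, complete)  # [3.0, 3.0, 3.0]
```

Because the graph changes over time in the setting described, `neighbors` would be redrawn each round. Plain neighborhood averaging of this kind is generally not doubly stochastic, which gives some intuition for why dropping that assumption matters.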

artificial intelligence, data mining, machine learning, (21 more...)

Neural Information Processing Systems

Industry: Health & Medicine (0.36)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.89)

Dual Self-Awareness Value Decomposition Framework without Individual Global Max for Cooperative MARL

Neural Information Processing Systems, Apr-30-2026, 04:23:35 GMT

Value decomposition methods have gained popularity in the field of cooperative multi-agent reinforcement learning. However, almost all existing methods follow the principle of Individual Global Max (IGM) or its variants, which limits their problem-solving capabilities. To address this, we propose a dual self-awareness value decomposition framework, inspired by the notion of dual self-awareness in psychology, that entirely rejects the IGM premise. Each agent consists of an ego policy for action selection and an alter ego value function to solve the credit assignment problem. The value function factorization can ignore the IGM assumption by utilizing an explicit search procedure. On the basis of the above, we also suggest a novel anti-ego exploration mechanism to avoid the algorithm becoming stuck in a local optimum. As the first fully IGM-free value decomposition method, our proposed framework achieves desirable performance in various cooperative tasks.
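For readers unfamiliar with the Individual Global Max (IGM) principle that the abstract above rejects, the following sketch checks whether a tabular joint value function is consistent with a set of per-agent value functions in the IGM sense: the joint action built from each agent's own greedy action must attain the maximum joint value. The function name `satisfies_igm` and the toy payoff tables are assumptions for illustration and do not come from the paper.

```python
def satisfies_igm(joint_q, individual_qs):
    """Return True if per-agent greedy actions form a greedy joint action.

    joint_q: dict mapping a joint action tuple to its joint value.
    individual_qs: one list of action values per agent.
    """
    greedy_joint = tuple(
        max(range(len(q)), key=q.__getitem__) for q in individual_qs
    )
    return joint_q[greedy_joint] == max(joint_q.values())


# Additive (VDN-style) factorization: the joint value is the sum of individual values.
q1, q2 = [1.0, 0.0], [0.0, 2.0]
additive = {(a, b): q1[a] + q2[b] for a in range(2) for b in range(2)}
igm_holds = satisfies_igm(additive, [q1, q2])              # True

# Non-monotonic matrix game: the best joint action (0, 0) is not what
# these individual utilities pick greedily.
game = {(0, 0): 8.0, (0, 1): -12.0, (1, 0): -12.0, (1, 1): 0.0}
igm_fails = satisfies_igm(game, [[-2.0, 0.0], [-2.0, 0.0]])  # False
```

The second case illustrates the kind of task where greedy per-agent action selection disagrees with the best joint action, which is the limitation that motivates searching over joint actions explicitly.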

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country:

Europe (0.93)
North America > United States > California (0.28)

Industry: Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.35)

e197fe307eb3467035f892dc100d570a-Supplemental-Conference.pdf

Neural Information Processing Systems, Apr-30-2026, 01:51:56 GMT

In addition to the radar plot, we present the specific numerical values for the prediction and driving performance metrics to provide a more detailed and comprehensive analysis of the system's performance, as demonstrated in Table 1. The static evaluation metrics, ADE and FDE, are trained and validated on the Alignment dataset collected from the SUMMIT simulator. The task-driven evaluation metrics, including safety, efficiency, comfort, and driving performance, are derived from interactive closed-loop scenarios. The process for calculating these metrics is described in Appendix C. Results in Table 1 are used to plot the correlation map between ADE/FDE and driving performance, which surprisingly indicates no strong correlation between static evaluation metrics and real driving performance. Moreover, to ensure the comparability between prediction performance metrics and driving performance metrics in the radar plot, we normalize all metrics to the scale of [0, 1].

B.1 The RVO Planner

The Reciprocal Velocity Obstacle (RVO) planner is developed based on [8], which expands on the concept of velocity obstacles [4] to consider the reactive behaviors of exo-agents.
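The ADE and FDE metrics named in this excerpt are conventionally defined as the average and final Euclidean displacement between a predicted trajectory and the ground truth. The sketch below computes both for a single trajectory and rescales a list of metric values to [0, 1]. The excerpt does not say which normalization was used, so min-max scaling is an assumption here, as are the function names `ade_fde` and `min_max_normalize`.

```python
import math


def ade_fde(predicted, ground_truth):
    """Average and final displacement error for one trajectory.

    Both inputs are equal-length sequences of (x, y) positions.
    """
    dists = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(dists) / len(dists), dists[-1]


def min_max_normalize(values):
    """Rescale values to [0, 1]; min-max scaling is assumed, not taken from the source."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


pred = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
truth = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade, fde = ade_fde(pred, truth)               # per-step errors are 0, 1, 2
scaled = min_max_normalize([2.0, 4.0, 6.0])   # [0.0, 0.5, 1.0]
```

Lower ADE and FDE are better, so a radar plot that treats larger values as better would also need these metrics inverted after scaling. Whether the authors did so is not stated in the excerpt.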

artificial intelligence, efficiency, machine learning, (17 more...)

Neural Information Processing Systems

Country: Asia > China (0.14)

Genre: Research Report (0.69)

Industry: Energy (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

e197fe307eb3467035f892dc100d570a-Paper-Conference.pdf

Neural Information Processing Systems, Apr-30-2026, 01:51:53 GMT

artificial intelligence, machine learning, prediction accuracy, (18 more...)

Neural Information Processing Systems

Country: Asia (0.28)

Genre: Research Report > New Finding (0.94)

Industry: Transportation > Ground > Road (0.30)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Finding Safe Zones of Markov Decision Processes Policies

Neural Information Processing Systems, Apr-30-2026, 01:18:54 GMT

Given a policy of a Markov Decision Process, we define a SAFEZONE as a subset of states, such that most of the policy's trajectories are confined to this subset. The quality of a SAFEZONE is parameterized by the number of states and the escape probability, i.e., the probability that a random trajectory will leave the subset. SAFEZONES are especially interesting when they have a small number of states and low escape probability. We study the complexity of finding optimal SAFEZONES, and show that in general, the problem is computationally hard. Our main result is a bi-criteria approximation learning algorithm with a factor of almost 2 approximation for both the escape probability and SAFEZONE size, using a polynomial size sample complexity.
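The escape probability defined above can be estimated empirically from sampled trajectories. The following sketch assumes trajectories are given as lists of hashable state identifiers and counts the fraction that leave a candidate subset at least once. The function name `escape_probability` and the toy data are illustrative assumptions, and this is only the evaluation of a given subset, not the paper's approximation algorithm for finding one.

```python
def escape_probability(trajectories, safe_zone):
    """Fraction of sampled trajectories that visit any state outside safe_zone."""
    escaped = sum(
        1 for trajectory in trajectories
        if any(state not in safe_zone for state in trajectory)
    )
    return escaped / len(trajectories)


# Four sampled trajectories over integer-labelled states.
samples = [[0, 1, 2], [0, 1], [0, 3], [0, 1, 1]]
candidate = {0, 1}
estimate = escape_probability(samples, candidate)  # 2 of 4 trajectories escape
```

The trade-off the abstract describes is visible even here: enlarging `candidate` to `{0, 1, 2, 3}` drives the estimate to zero at the cost of a larger subset.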

artificial intelligence, machine learning, probability, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.67)

Industry: