Action Selection


Sampling Networks and Aggregate Simulation for Online POMDP Planning

Neural Information Processing Systems

The paper introduces a new algorithm for planning in partially observable Markov decision processes (POMDPs) based on the idea of aggregate simulation. The algorithm uses product distributions to approximate the belief state and shows how to build a graph representation of an approximate action-value function over belief space.
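The product-distribution idea can be illustrated with a minimal sketch: instead of tracking a full joint belief over binary state variables, keep only per-variable marginals and update each one independently. The function name and the simple per-variable transition/observation parameterization below are illustrative assumptions, not the paper's actual interface.

```python
import numpy as np

def product_belief_update(marginals, trans_probs, obs_likelihood):
    """Approximate POMDP belief update that keeps only per-variable marginals.

    marginals:      P(x_i = 1) for each binary state variable (product approx.)
    trans_probs:    (p11, p01) shared across variables for the chosen action:
                    P(x_i' = 1 | x_i = 1) and P(x_i' = 1 | x_i = 0)
    obs_likelihood: (l1, l0): P(obs | x_i' = 1) and P(obs | x_i' = 0)
    """
    m = np.asarray(marginals, dtype=float)
    p11, p01 = trans_probs
    l1, l0 = obs_likelihood
    # Predict: propagate each marginal independently through the transition.
    pred = p11 * m + p01 * (1.0 - m)
    # Correct: reweight each marginal by the observation likelihood, renormalize.
    num = l1 * pred
    den = num + l0 * (1.0 - pred)
    return num / den
```

Because every variable is updated in isolation, the update is linear in the number of state variables rather than exponential, which is what makes aggregate simulation over such beliefs tractable.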


Improve Agents without Retraining: Parallel Tree Search with Off-Policy Correction

Neural Information Processing Systems

Tree Search (TS) is crucial to some of the most influential successes in reinforcement learning. Here, we tackle two major challenges with TS that limit its usability: distribution shift and scalability. We first discover and analyze a counter-intuitive phenomenon: action selection through TS and a pre-trained value function often leads to lower performance compared to the original pre-trained agent, even with access to the exact state and reward in future steps. We show this is due to a distribution shift to areas where value estimates are highly inaccurate, and analyze this effect using Extreme Value theory. To overcome this problem, we introduce a novel off-policy correction term that accounts for the mismatch between the pre-trained value and its corresponding TS policy by penalizing under-sampled trajectories.
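The correction described above can be sketched as follows: score each candidate trajectory by its leaf value minus a penalty that grows when the pre-trained policy would rarely have generated that trajectory. The exact penalty form and the function names here are illustrative assumptions based on the abstract, not the paper's actual formulation.

```python
def corrected_action_value(q_value, traj_log_prob, depth, penalty_coef=1.0):
    """Off-policy-corrected score for one tree-search trajectory (sketch).

    q_value:       value estimate at the leaf of the simulated trajectory
    traj_log_prob: log-probability the pre-trained policy assigns to the
                   trajectory's actions (very negative => under-sampled
                   during training, so the value estimate is unreliable)
    depth:         trajectory length, used to average the per-step mismatch
    """
    avg_neg_log_prob = -traj_log_prob / depth
    # Penalize trajectories the pre-trained policy would rarely generate.
    return q_value - penalty_coef * avg_neg_log_prob

def select_action(candidates):
    """candidates: list of (action, q_value, traj_log_prob, depth) tuples."""
    return max(candidates,
               key=lambda c: corrected_action_value(c[1], c[2], c[3]))[0]
```

With this correction, a trajectory with a slightly higher raw value but very low probability under the pre-trained policy can lose to a well-supported alternative, which is exactly the distribution-shift failure mode the abstract describes.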


Active Inference with Reusable State-Dependent Value Profiles

Poschl, Jacob

arXiv.org Machine Learning

Adaptive behavior in volatile environments requires agents to deploy different value-control regimes across latent contexts, but representing separate preferences, policy biases, and action confidence for every situation is intractable. We introduce value profiles: a small set of reusable bundles of value-related parameters--outcome preferences, policy priors, and policy precision--that are assigned to hidden states in the generative model. As posterior beliefs over states evolve trial-by-trial, effective control parameters emerge through belief-weighted mixing, enabling state-conditional strategy recruitment without maintaining independent parameters for each situation. We evaluate this framework in probabilistic reversal learning, comparing static precision, entropy-coupled dynamic precision, and profile-based models using cross-validated log-likelihood and information criteria. Model comparison using AIC favors the profile-based model over simpler alternatives (100-point differences), with consistent parameter recovery demonstrating structural identifiability even when context must be inferred from noisy observations. Model-based inference suggests that, in this task, adaptive control operates primarily through policy prior modulation rather than policy precision modulation, with gradual belief-driven profile recruitment confirming state-conditional rather than merely uncertainty-driven control. Overall, reusable value profiles offer a tractable computational account of belief-conditioned value control in volatile environments: a reusable, mode-like representational scheme for behavioral flexibility that yields testable signatures of belief-conditioned control.
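The belief-weighted mixing step can be sketched directly: sum belief mass per profile, then mix each value-related parameter by those weights. The data layout (a dict of profiles keyed by id, with the three parameter fields named in the abstract) is an assumption for illustration, not the paper's implementation.

```python
import numpy as np

def effective_control_params(state_belief, profile_of_state, profiles):
    """Belief-weighted mixing of reusable value profiles (sketch).

    state_belief:     posterior over hidden states, length n_states
    profile_of_state: index of the profile assigned to each hidden state
    profiles:         dict profile_id -> dict with keys
                      'preferences', 'policy_prior', 'precision'
    """
    # Aggregate posterior belief mass per profile.
    weights = {}
    for s, b in enumerate(state_belief):
        pid = profile_of_state[s]
        weights[pid] = weights.get(pid, 0.0) + b
    # Mix each value-related parameter by the profiles' belief weights.
    keys = ('preferences', 'policy_prior', 'precision')
    return {k: sum(w * np.asarray(profiles[pid][k])
                   for pid, w in weights.items())
            for k in keys}
```

Because many hidden states can share one profile, the number of free parameters scales with the number of profiles rather than the number of contexts, which is the tractability argument the abstract makes.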


Networked Restless Multi-Arm Bandits with Reinforcement Learning

Zhang, Hanmo, Sun, Zenghui, Wang, Kai

arXiv.org Artificial Intelligence

Restless Multi-Armed Bandits (RMABs) are a powerful framework for sequential decision-making, widely applied in resource allocation and intervention optimization challenges in public health. However, traditional RMABs assume independence among arms, limiting their ability to account for interactions between individuals that can be common and significant in real-world environments. This paper introduces Networked RMAB, a novel framework that integrates the RMAB model with the independent cascade model to capture interactions between arms in networked environments. We define the Bellman equation for networked RMABs and present its computational challenge due to exponentially large action and state spaces. To resolve this challenge, we establish the submodularity of the Bellman equation and apply the hill-climbing algorithm to achieve a $1-\frac{1}{e}$ approximation guarantee in Bellman updates. Lastly, we prove that the approximate Bellman updates are guaranteed to converge by a modified contraction analysis. We experimentally verify these results by developing an efficient Q-learning algorithm tailored to the networked setting. Experimental results on real-world graph data demonstrate that our Q-learning approach outperforms both $k$-step look-ahead and network-blind approaches, highlighting the importance of capturing and leveraging network effects where they exist.
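The hill-climbing step can be sketched generically: when the objective over selected arms is monotone submodular, greedily adding the arm with the largest marginal gain achieves the $1-\frac{1}{e}$ guarantee. The function below is a minimal sketch of that greedy loop, assuming the caller supplies a marginal-gain oracle; it is not the paper's actual Bellman-update code.

```python
def greedy_arm_selection(arms, budget, marginal_gain):
    """Greedy (hill-climbing) selection of arms under a budget (sketch).

    For a monotone submodular objective, this greedy loop attains a
    (1 - 1/e) approximation of the optimal budget-constrained selection.

    marginal_gain(selected, arm) -> gain of adding `arm` to `selected`.
    """
    selected = set()
    for _ in range(budget):
        best_arm, best_gain = None, float('-inf')
        for arm in arms:
            if arm in selected:
                continue
            g = marginal_gain(selected, arm)
            if g > best_gain:
                best_arm, best_gain = arm, g
        # Stop early if no remaining arm improves the objective.
        if best_arm is None or best_gain <= 0:
            break
        selected.add(best_arm)
    return selected
```

In the networked setting, the marginal-gain oracle would evaluate the expected Bellman backup under the independent cascade model; the sketch only captures the combinatorial selection layer.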





Supplementary Material A Proofs and Derivations

Neural Information Processing Systems

This supplementary material first clarifies the behavior of local CGMs (see Def. 1) under interventions, proving both directions ("if" and "only if") of the stated equivalence, where the "only if" direction hinges on the presence of an edge. It then gives the approximation used for the KL divergence in Eq. 4, noting that the approximating term can become negative even though the true KL divergence is non-negative, and shows how the formulas simplify as a result. For the experiment in Sec. 5 evaluating causal influence detection, the goal of the agent is to move the object to a goal zone; for each starting location and action, the simulator is reset, the end effector is manually moved to the starting location, and one of the maximal actions in each dimension is applied. Additionally, a state is labeled "agent in control" when there is actual contact. Only 3.3% of transitions in this dataset exhibit influence.