MAXQ


Enhancing Q-Value Updates in Deep Q-Learning via Successor-State Prediction

Zu, Lipeng, Zhou, Hansong, Zhang, Xiaonan

arXiv.org Artificial Intelligence

Deep Q-Networks (DQNs) estimate future returns by learning from transitions sampled from a replay buffer. However, the target updates in DQN often rely on next states generated by actions from a past, potentially suboptimal, policy. As a result, these states may not provide informative learning signals and can introduce high variance into the update process. This issue is exacerbated when the sampled transitions are poorly aligned with the agent's current policy. To address this limitation, we propose the Successor-state Aggregation Deep Q-Network (SADQ), which explicitly models environment dynamics using a stochastic transition model. SADQ integrates successor-state distributions into the Q-value estimation process, enabling more stable and policy-aligned value updates. Additionally, it exploits the modeled transition structure to support a more efficient action-selection strategy. We provide theoretical guarantees that SADQ maintains unbiased value estimates while reducing training variance. Our extensive empirical results across standard RL benchmarks and real-world vector-based control tasks demonstrate that SADQ consistently outperforms DQN variants in both stability and learning efficiency.
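As a rough illustration of the idea of averaging bootstrapped targets over model-proposed successor states, the following Python sketch shows one way such a target could be formed; the transition_model, target_q, and n_samples names are hypothetical assumptions, not the paper's interface.

# Minimal sketch of a successor-state-aggregated Q target (illustrative only).
# Assumes a learned stochastic transition model that can sample candidate
# next states for (state, action); all names here are hypothetical.
import numpy as np

def aggregated_q_target(reward, done, state, action,
                        transition_model, target_q, gamma=0.99, n_samples=8):
    """Average the bootstrapped target over sampled successor states
    instead of using only the single replayed next state."""
    if done:
        return reward
    targets = []
    for _ in range(n_samples):
        next_state = transition_model.sample(state, action)  # model-proposed successor
        targets.append(np.max(target_q(next_state)))         # greedy bootstrap value
    return reward + gamma * float(np.mean(targets))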


Planning with Abstract Markov Decision Processes

Gopalan, Nakul (Brown University) | desJardins, Marie (University of Maryland) | Littman, Michael L. (Brown University) | MacGlashan, James (Cogitai Incorporated) | Squire, Shawn (University of Maryland) | Tellex, Stefanie (Brown University) | Winder, John (University of Maryland) | Wong, Lawson L.S. (Brown University)

AAAI Conferences

Robots acting in human-scale environments must plan under uncertainty in large state-action spaces and face constantly changing reward functions as requirements and goals change. Planning under uncertainty in large state-action spaces requires hierarchical abstraction for efficient computation. We introduce a new hierarchical planning framework called Abstract Markov Decision Processes (AMDPs) that can plan in a fraction of the time needed for complex decision making in ordinary MDPs. AMDPs provide abstract states, actions, and transition dynamics in multiple layers above a base-level "flat" MDP. AMDPs decompose problems into a series of subtasks, each with local reward and local transition functions used to create subtask policies. The resulting hierarchical planning method is independently optimal at each level of abstraction, and is recursively optimal when the local reward and transition functions are correct. We present empirical results showing significantly improved planning speed, while maintaining solution quality, in the Taxi domain and in a mobile-manipulation robotics problem. Furthermore, our approach allows specification of a decision-making model for a mobile-manipulation problem on a Turtlebot, spanning from low-level control actions operating on continuous variables all the way up through high-level object manipulation tasks.
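The layered structure can be pictured with a small sketch: each abstraction level is a compact MDP whose actions stand for subtasks of the level below, solved on its own using its local reward and transition functions. The solve_level helper and the P/R encoding below are illustrative assumptions, not the authors' interface.

# Toy sketch of AMDP-style layered planning (illustrative only).
# Each level is a small MDP over abstract states whose actions are subtasks
# of the level below; every level is solved independently of the others.
import numpy as np

def solve_level(P, R, gamma=0.95, iters=200):
    """P[s][a] = list of (prob, next_state); R[s][a] = local reward.
    Returns a greedy policy for this abstraction level."""
    n = len(P)
    V = np.zeros(n)
    for _ in range(iters):                       # value iteration on local model
        for s in range(n):
            V[s] = max(R[s][a] + gamma * sum(p * V[t] for p, t in P[s][a])
                       for a in range(len(P[s])))
    policy = [max(range(len(P[s])),
                  key=lambda a: R[s][a] + gamma * sum(p * V[t] for p, t in P[s][a]))
              for s in range(n)]
    return policy  # policy[s] picks the subtask to invoke one level down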


Hierarchical Monte-Carlo Planning

Vien, Ngo Anh (University of Stuttgart) | Toussaint, Marc (University of Stuttgart)

AAAI Conferences

Monte-Carlo Tree Search, especially UCT and its POMDP version POMCP, has demonstrated excellent performance on many problems. However, to scale efficiently to large domains one should also exploit hierarchical structure if present. In such hierarchical domains, finding rewarded states typically requires deep search; covering enough of these informative states far from the root becomes computationally expensive in flat, non-hierarchical search approaches. We propose novel, scalable MCTS methods which integrate a task hierarchy into the MCTS framework, leading to hierarchical versions of both UCT and POMCP. The new method does not need to estimate probabilistic models of each subtask; instead, it computes subtask policies in a purely sample-based manner. We evaluate the hierarchical MCTS methods on various settings such as a hierarchical MDP, a Bayesian model-based hierarchical RL problem, and a large hierarchical POMDP.
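The sample-based treatment of subtasks can be sketched compactly: executing a subtask action recursively simulates inside that subtask until it terminates, so no probabilistic subtask model is ever estimated. The Task interface (terminates, children, is_primitive) and the simulate function below are hypothetical, and discounting across subtask boundaries is simplified for brevity.

# Compact sketch of sample-based hierarchical rollouts in the spirit of
# hierarchical UCT (illustrative only; the Task/simulate interfaces are
# hypothetical and discounting across subtask boundaries is simplified).
import random

def hierarchical_rollout(task, state, simulate, gamma=0.99, depth=50):
    """Return (discounted return, exit state) of running `task` from `state`.
    `simulate(state, primitive_action)` -> (next_state, reward)."""
    total, discount = 0.0, 1.0
    for _ in range(depth):
        if task.terminates(state):
            break
        child = random.choice(task.children(state))   # primitive action or subtask
        if task.is_primitive(child):
            state, r = simulate(state, child)
        else:
            r, state = hierarchical_rollout(child, state, simulate, gamma, depth)
        total += discount * r
        discount *= gamma
    return total, state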


Bayesian Hierarchical Reinforcement Learning

Cao, Feng, Ray, Soumya

Neural Information Processing Systems

We describe an approach to incorporating Bayesian priors in the MAXQ framework for hierarchical reinforcement learning (HRL). We define priors on the primitive environment model and on task pseudo-rewards. Since models for composite tasks can be complex, we use a mixed model-based/model-free learning approach to find an optimal hierarchical policy. We show empirically that (i) our approach results in improved convergence over non-Bayesian baselines, given sensible priors, (ii) task hierarchies and Bayesian priors can be complementary sources of information, and using both sources is better than either alone, (iii) taking advantage of the structural decomposition induced by the task hierarchy significantly reduces the computational cost of Bayesian reinforcement learning, and (iv) in this framework, task pseudo-rewards can be learned instead of being manually specified, leading to automatic learning of hierarchically optimal rather than recursively optimal policies.
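One ingredient of the model-based side can be sketched as a Dirichlet prior over the primitive transition model, updated from observed transitions and sampled from for planning; the class and method names below are hypothetical, and the sketch omits the task hierarchy and pseudo-reward components entirely.

# Minimal sketch of a Dirichlet prior on the primitive transition model
# (illustrative only; the interface is hypothetical, not the authors' code).
import numpy as np

class DirichletTransitionModel:
    def __init__(self, n_states, n_actions, prior_count=1.0):
        # counts[s, a, s'] start at the prior pseudo-count
        self.counts = np.full((n_states, n_actions, n_states), prior_count)

    def update(self, s, a, s_next):
        self.counts[s, a, s_next] += 1.0         # Bayesian posterior update

    def sample_dynamics(self):
        """Draw one transition matrix from the posterior, e.g. for planning."""
        return np.array([[np.random.dirichlet(self.counts[s, a])
                          for a in range(self.counts.shape[1])]
                         for s in range(self.counts.shape[0])])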