AITopics | adac

Collaborating Authors

adac

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Taming OOD Actions for Offline Reinforcement Learning: An Advantage-Based Approach

Chen, Xuyang, Yan, Keyu, Cao, Wenhan, Zhao, Lin

arXiv.org Artificial IntelligenceOct-7-2025

Offline reinforcement learning (RL) learns policies from fixed datasets without online interactions, but suffers from distribution shift, causing inaccurate evaluation and overestimation of out-of-distribution (OOD) actions. Existing methods counter this by conservatively discouraging all OOD actions, which limits generalization. We propose Advantage-based Diffusion Actor-Critic (ADAC), which evaluates OOD actions via an advantage-like function and uses it to modulate the Q-function update discriminatively. Our key insight is that the (state) value function is generally learned more reliably than the action-value function; we thus use the next-state value to indirectly assess each action. We develop a PointMaze environment to clearly visualize that advantage modulation effectively selects superior OOD actions while discouraging inferior ones. Moreover, extensive experiments on the D4RL benchmark show that ADAC achieves state-of-the-art performance, with especially strong gains on challenging tasks.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

2505.05126

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Online Control for Linear Dynamics: A Data-Driven Approach

Liu, Zishun, Chen, Yongxin

arXiv.org Artificial IntelligenceAug-16-2023

This paper considers an online control problem over a linear time-invariant system with unknown dynamics, bounded disturbance, and adversarial cost. We propose a data-driven strategy to reduce the regret of the controller. Unlike model-based methods, our algorithm does not identify the system model, instead, it leverages a single noise-free trajectory to calculate the accumulation of disturbance and makes decisions using the accumulated disturbance action controller we design, whose parameters are updated by online gradient descent. We prove that the regret of our algorithm is $\mathcal{O}(\sqrt{T})$ under mild assumptions, suggesting that its performance is on par with model-based methods.

artificial intelligence, machine learning, trajectory, (17 more...)

arXiv.org Artificial Intelligence

2308.08138

Country: North America > United States > Georgia > Fulton County > Atlanta (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Add feedback

Off-Policy Deep Reinforcement Learning with Analogous Disentangled Exploration

Liu, Anji, Liang, Yitao, Broeck, Guy Van den

arXiv.org Machine LearningFeb-27-2020

Off-policy reinforcement learning (RL) is concerned with learning a rewarding policy by executing another policy that gathers samples of experience. While the former policy (i.e. target policy) is rewarding but in-expressive (in most cases, deterministic), doing well in the latter task, in contrast, requires an expressive policy (i.e. behavior policy) that offers guided and effective exploration. Contrary to most methods that make a trade-off between optimality and expressiveness, disentangled frameworks explicitly decouple the two objectives, which each is dealt with by a distinct separate policy. Although being able to freely design and optimize the two policies with respect to their own objectives, naively disentangling them can lead to inefficient learning or stability issues. To mitigate this problem, our proposed method Analogous Disentangled Actor-Critic (ADAC) designs analogous pairs of actors and critics. Specifically, ADAC leverages a key property about Stein variational gradient descent (SVGD) to constraint the expressive energy-based behavior policy with respect to the target one for effective exploration. Additionally, an analogous critic pair is introduced to incorporate intrinsic rewards in a principled manner, with theoretical guarantees on the overall learning stability and effectiveness. We empirically evaluate environment-reward-only ADAC on 14 continuous-control tasks and report the state-of-the-art on 10 of them. We further demonstrate ADAC, when paired with intrinsic rewards, outperform alternatives in exploration-challenging tasks.

adac, artificial intelligence, upstream oil & gas, (20 more...)

arXiv.org Machine Learning

2002.10738

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback