
Neural Information Processing Systems

After the bidirectional models and rollout policies are well trained, we use them to generate imaginary trajectories, double-checking each transition and admitting only high-confidence ones.
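The double-check idea above can be sketched as a simple consistency filter: a forward model imagines the next state, a backward model reconstructs the current state from it, and the transition is admitted only when both agree. This is an illustrative sketch, not the paper's exact procedure; `forward_model`, `backward_model`, and the disagreement threshold are assumptions.

```python
import numpy as np

def double_check(forward_model, backward_model, s, a, threshold=0.1):
    """Roll a transition forward, reconstruct it backward, and admit it
    only when both models agree (a high-confidence transition)."""
    s_next = forward_model(s, a)          # imagined next state
    s_recon = backward_model(s_next, a)   # backward reconstruction of s
    disagreement = float(np.linalg.norm(s - s_recon))
    if disagreement < threshold:
        return s_next, disagreement       # admit the imaginary transition
    return None, disagreement             # reject it

# Toy check with mutually consistent linear models: admitted with zero gap.
fwd = lambda s, a: s + a
bwd = lambda s_next, a: s_next - a
s, a = np.array([1.0, 2.0]), np.array([0.5, -0.5])
s_next, gap = double_check(fwd, bwd, s, a)
```

In practice both models would be learned ensembles, and the threshold would be tuned against rollout length.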




An Empirical Study on the Effectiveness of Incorporating Offline RL As Online RL Subroutines

Su, Jianhai, Luo, Jinzhu, Zhang, Qi

arXiv.org Machine Learning

We take the novel perspective of incorporating offline RL algorithms as subroutines of tabula rasa online RL. This is feasible because an online learning agent can repurpose its historical interactions as an offline dataset. We formalize this idea into a framework that accommodates several variants of offline RL incorporation, such as final policy recommendation and online fine-tuning. We further introduce convenient techniques to improve its effectiveness in enhancing online learning efficiency. Our extensive and systematic empirical analyses show that 1) the effectiveness of the proposed framework depends strongly on the nature of the task, 2) our proposed techniques greatly enhance its effectiveness, and 3) existing online fine-tuning methods are overall ineffective, calling for more research therein.
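The core loop described above, an online agent periodically handing its own interaction history to an offline RL trainer, can be sketched minimally. The class name, the `offline_algo` callable, and the update period are illustrative assumptions, not the paper's interface.

```python
from collections import deque

class OnlineAgentWithOfflineSubroutine:
    """Hypothetical skeleton: an online learner that periodically runs an
    offline RL algorithm on its replay buffer, e.g. to recommend a final
    policy or to warm-start fine-tuning."""

    def __init__(self, offline_algo, period=1000):
        self.buffer = deque(maxlen=100_000)   # historical interactions
        self.offline_algo = offline_algo      # offline trainer: dataset -> policy
        self.period = period
        self.recommended_policy = None
        self.steps = 0

    def observe(self, transition):
        self.buffer.append(transition)
        self.steps += 1
        if self.steps % self.period == 0:
            # Repurpose the interaction history as an offline dataset.
            self.recommended_policy = self.offline_algo(list(self.buffer))

# Stand-in offline algorithm that just reports dataset size.
agent = OnlineAgentWithOfflineSubroutine(offline_algo=len, period=1000)
for t in range(2500):
    agent.observe(t)
```

A real instantiation would plug in CQL or IQL as `offline_algo` and return an actual policy.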


A Comparison Between Decision Transformers and Traditional Offline Reinforcement Learning Algorithms

Caunhye, Ali Murtaza, Jeewa, Asad

arXiv.org Artificial Intelligence

The field of Offline Reinforcement Learning (RL) aims to derive effective policies from pre-collected datasets without active environment interaction. While traditional offline RL algorithms like Conservative Q-Learning (CQL) and Implicit Q-Learning (IQL) have shown promise, they often face challenges in balancing exploration and exploitation, especially in environments with varying reward densities. The recently proposed Decision Transformer (DT) approach, which reframes offline RL as a sequence modelling problem, has demonstrated impressive results across various benchmarks. This paper presents a comparative study evaluating the performance of DT against traditional offline RL algorithms in dense and sparse reward settings for the ANT continuous control environment. Our research investigates how these algorithms perform when faced with different reward structures, examining their ability to learn effective policies and generalize across varying levels of feedback. Through empirical analysis in the ANT environment, we found that DTs showed less sensitivity to varying reward density compared to other methods and particularly excelled with medium-expert datasets in sparse reward scenarios. In contrast, traditional value-based methods like IQL showed improved performance in dense reward settings with high-quality data, while CQL offered balanced performance across different data qualities. Additionally, DTs exhibited lower variance in performance but required significantly more computational resources compared to traditional approaches. These findings suggest that sequence modelling approaches may be more suitable for scenarios with uncertain reward structures or mixed-quality data, while value-based methods remain competitive in settings with dense rewards and high-quality demonstrations.
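For readers unfamiliar with IQL, the distinctive piece is its value update: an asymmetric (expectile) regression of V towards Q, which lets it stay within the data distribution. A minimal NumPy sketch of that loss, with `tau=0.7` as a typical but assumed setting:

```python
import numpy as np

def expectile_loss(q_values, v_values, tau=0.7):
    """Asymmetric squared loss from IQL's value update: residuals where
    Q exceeds V are weighted by tau, the rest by (1 - tau), so V tracks
    an upper expectile of the action-value distribution."""
    u = q_values - v_values
    weight = np.where(u > 0, tau, 1 - tau)
    return float(np.mean(weight * u ** 2))
```

With `tau=0.5` this reduces to a (scaled) symmetric squared error; values above 0.5 push V towards the better actions supported by the dataset.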


Improved Offline Reinforcement Learning via Quantum Metric Encoding

Lv, Outongyi, Yuan, Yewei, Liu, Nana

arXiv.org Artificial Intelligence

Reinforcement learning (RL) with limited samples is common in real-world applications. However, offline RL performance under this constraint is often suboptimal. We consider an alternative approach to dealing with limited samples by introducing the Quantum Metric Encoder (QME). In this methodology, instead of applying the RL framework directly on the original states and rewards, we embed the states into a more compact and meaningful representation, where the structure of the encoding is inspired by quantum circuits. For classical data, QME is a classically simulable, trainable unitary embedding and thus serves as a quantum-inspired module on a classical device. For quantum data in the form of quantum states, QME can be implemented directly on quantum hardware, allowing for training without measurement or re-encoding. We evaluated QME on three datasets, each limited to 100 samples. We use Soft Actor-Critic (SAC) and Implicit Q-Learning (IQL), two well-known RL algorithms, to demonstrate the effectiveness of our approach. From the experimental results, we find that training offline RL agents on QME-embedded states with decoded rewards yields significantly better performance than training on the original states and rewards. On average across the three datasets, for maximum reward performance, we achieve a 116.2% improvement for SAC and 117.6% for IQL. We further investigate the $\delta$-hyperbolicity of our framework, a geometric property of the state space known to be important for RL training efficacy. The QME-embedded states exhibit low $\delta$-hyperbolicity, suggesting that the improvement after embedding arises from the modified geometry of the state space induced by QME. Thus, the low $\delta$-hyperbolicity and the corresponding effectiveness of QME could provide valuable information for developing efficient offline RL methods under limited-sample conditions.
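The $\delta$-hyperbolicity the abstract refers to is Gromov's four-point condition, which is directly computable on a finite set of embedded states. A brute-force sketch (fine for the 100-sample regime discussed above; the exact estimator used by the paper is not specified here):

```python
from itertools import combinations

def delta_hyperbolicity(dist):
    """Gromov four-point delta of a finite metric space, given as a square
    matrix (list of lists) of pairwise distances. For each quadruple, sort
    the three pairwise-sum combinations; delta is half the gap between the
    two largest, maximized over all quadruples. Lower delta = more
    tree-like (hyperbolic) geometry."""
    n = len(dist)
    delta = 0.0
    for x, y, z, w in combinations(range(n), 4):
        sums = sorted([dist[x][y] + dist[z][w],
                       dist[x][z] + dist[y][w],
                       dist[x][w] + dist[y][z]], reverse=True)
        delta = max(delta, (sums[0] - sums[1]) / 2)
    return delta
```

As a sanity check, four collinear points give delta 0 (a line is a tree), while a 4-cycle with its graph metric gives delta 1.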




Deep Inverse Q-learning with Constraints (Appendix)

Kalweit, Gabriel

Neural Information Processing Systems

Figure 7 visualizes the real and learned state-values of IAVI, IQL and DIQL for different numbers of trajectories in Objectworld. Table 2 compares online and offline estimation of state-action visitations for the Objectworld environment, given a dataset with an action distribution equivalent to the true optimal Boltzmann distribution. Algorithm 4 gives the pseudocode of the tabular, model-free variant of Constrained Inverse Q-learning; see [4] for further details of Constrained Q-learning. Algorithm 5 gives the pseudocode of Deep Constrained Inverse Q-learning. The lower row shows the EVD. For DIQL, the parameters were optimized in the range of … Hence, it can only increase.
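The constrained flavour of Q-learning referenced above restricts the bootstrap to a safe action set. The following one-step tabular update is an illustrative sketch of that idea only, not the paper's inverse-RL Algorithms 4-5; `safe_actions`, `alpha`, and `gamma` are assumed names.

```python
import numpy as np

def constrained_q_update(Q, s, a, r, s_next, safe_actions,
                         alpha=0.1, gamma=0.99):
    """One tabular Q-learning step where the bootstrap maximum is taken
    only over the safe action set of the next state, so values of
    constraint-violating actions never back up."""
    target = r + gamma * max(Q[s_next, b] for b in safe_actions[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

# In state 1, action 0 has a huge value but is unsafe; only action 1 may
# be bootstrapped, so the update stays conservative.
Q = np.zeros((2, 2))
Q[1, 0], Q[1, 1] = 100.0, 10.0
safe_actions = {0: [0, 1], 1: [1]}
constrained_q_update(Q, s=0, a=0, r=1.0, s_next=1, safe_actions=safe_actions)
```

An unconstrained update would have bootstrapped from the unsafe value 100 instead of 10.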


Reward Generation via Large Vision-Language Model in Offline Reinforcement Learning

Lee, Younghwan, Luu, Tung M., Lee, Donghoon, Yoo, Chang D.

arXiv.org Artificial Intelligence

In offline reinforcement learning (RL), learning from fixed datasets presents a promising solution for domains where real-time interaction with the environment is expensive or risky. However, designing dense reward signals for offline datasets requires significant human effort and domain expertise. Reinforcement learning with human feedback (RLHF) has emerged as an alternative, but it remains costly due to the human-in-the-loop process, prompting interest in automated reward generation models. To address this, we propose Reward Generation via Large Vision-Language Models (RG-VLM), which leverages the reasoning capabilities of LVLMs to generate rewards from offline data without human involvement.
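At a high level, a pipeline like RG-VLM relabels a reward-free offline dataset by querying a scorer. The sketch below uses a plain callable standing in for the vision-language model; the function and field names are assumptions for illustration, not the paper's API.

```python
def label_rewards(dataset, score_transition):
    """Hypothetical RG-VLM-style relabeling: attach a reward to every
    transition in a reward-free offline dataset by querying a scoring
    function (a stand-in for an LVLM prompted with the task description)."""
    return [
        {**t, "reward": score_transition(t["obs"], t["action"], t["next_obs"])}
        for t in dataset
    ]

# Stub scorer: reward = progress between observations.
dataset = [{"obs": 0.0, "action": 1, "next_obs": 1.0},
           {"obs": 1.0, "action": 0, "next_obs": 1.0}]
labeled = label_rewards(dataset, lambda o, a, n: float(n - o))
```

The labeled transitions can then be fed to any standard offline RL algorithm such as IQL.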