AITopics | cql

cql

Supplementary Material for Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble: A. Ensemble gradient diversification

Neural Information Processing Systems, Apr-25-2026, 13:16:29 GMT

Proposition 1. Suppose $Q_{\phi_j}(s,a) = Q(s,a)$ and $Q_{\phi_j}(s,\cdot)$ is locally linear in the neighborhood of $a$ for all $j \in [N]$. Let $\lambda_{\min}$ and $w_{\min}$ be the smallest eigenvalue and the corresponding normalized eigenvector of the matrix $\mathrm{Var}(\nabla_a Q_{\phi_j}(s,a))$, and let $\epsilon > 0$ be the value such that $\min_{i \neq j} \langle \nabla_a Q_{\phi_i}(s,a), \nabla_a Q_{\phi_j}(s,a) \rangle = 1 - \epsilon$.

We first prove that the smallest eigenvalue $\lambda_{\min}$ of $\mathrm{Var}(\nabla_a Q_{\phi_j}(s,a))$ is upper-bounded by some constant multiple of $\epsilon$. By Lemma 1, the total variance of the matrix is less than or equal to $\frac{N-1}{N}\epsilon$. Note that, using the fact that the Q-values coincide at the action $a$ and the local linearity of the Q-functions, we have

$$\mathrm{Var}(Q_{\phi_j}(s, a + k w)) = k^2 w^\top \mathrm{Var}(\nabla_a Q_{\phi_j}(s,a))\, w. \tag{2}$$

Plugging $w = w_{\min}$ into Equation (2) and using Equation (1), we have

$$\mathrm{Var}(Q_{\phi_j}(s, a + k w_{\min})) = k^2 w_{\min}^\top \mathrm{Var}(\nabla_a Q_{\phi_j}(s,a))\, w_{\min} = k^2 \lambda_{\min}.$$

A.2 Relationship between maximizing the total variance and maximizing the smallest eigenvalue

As we have shown in Section 4, maximizing the total variance of the matrix $\mathrm{Var}(\nabla_a Q_{\phi_i}(s,a))$ is equivalent to minimizing the cosine similarity of all distinct pairs of the gradients $\nabla_a Q_{\phi_i}(s,a)$, which makes the gradients uniformly distributed on the unit sphere $S^{|A|-1}$.
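The two facts used in this excerpt can be checked numerically. The sketch below is not taken from the paper: it assumes unit-norm gradients, exactly linear Q-functions that agree at the action a, and a population (1/N) variance; the names `grads`, `q_at_a`, and `eps` (one minus the smallest pairwise inner product) are illustrative choices. It compares the ensemble variance of the Q-values at a + kw with k^2 times the quadratic form in Equation (2), and compares the total variance with (N - 1)/N times `eps`.

```python
import math
import random


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def gradient_covariance(grads):
    """Population covariance matrix (1/N normalisation) of the gradient vectors."""
    n, d = len(grads), len(grads[0])
    mean = [sum(g[i] for g in grads) / n for i in range(d)]
    return [[sum((g[i] - mean[i]) * (g[j] - mean[j]) for g in grads) / n
             for j in range(d)] for i in range(d)]


def q_variance_along(grads, q_at_a, w, k):
    """Ensemble variance of Q_j(s, a + k w) when every Q_j is linear around a."""
    values = [q_at_a + k * dot(g, w) for g in grads]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


rng = random.Random(0)
N, dim, k = 5, 3, 0.7  # hypothetical ensemble size, action dimension, step size
grads = [normalize([rng.gauss(0, 1) for _ in range(dim)]) for _ in range(N)]
w = normalize([rng.gauss(0, 1) for _ in range(dim)])

cov = gradient_covariance(grads)

# Both sides of Equation (2).
lhs = q_variance_along(grads, q_at_a=1.5, w=w, k=k)
rhs = k ** 2 * sum(w[i] * cov[i][j] * w[j] for i in range(dim) for j in range(dim))

# Total variance (trace) against the bound (N - 1) / N * eps.
total_var = sum(cov[i][i] for i in range(dim))
eps = 1 - min(dot(grads[i], grads[j]) for i in range(N) for j in range(N) if i != j)
bound = (N - 1) / N * eps

print(abs(lhs - rhs) < 1e-12, total_var <= bound + 1e-12)
```

Random unit vectors are used only for convenience; the two checks hold for any set of unit-norm gradients under the stated assumptions.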

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.65)

Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble

Neural Information Processing Systems, Apr-25-2026, 13:16:26 GMT

Offline reinforcement learning (offline RL), which aims to find an optimal policy from a previously collected static dataset, bears algorithmic difficulties due to function approximation errors from out-of-distribution (OOD) data points. To this end, offline RL algorithms adopt either a constraint or a penalty term that explicitly guides the policy to stay close to the given dataset. However, prior methods typically require accurate estimation of the behavior policy or sampling from OOD data points, which themselves can be a non-trivial problem. Moreover, these methods under-utilize the generalization ability of deep neural networks and often fall into suboptimal solutions too close to the given dataset. In this work, we propose an uncertainty-based offline RL method that takes into account the confidence of the Q-value prediction and does not require any estimation or sampling of the data distribution. We show that the clipped Q-learning, a technique widely used in online RL, can be leveraged to successfully penalize OOD data points with high prediction uncertainties. Surprisingly, we find that it is possible to substantially outperform existing offline RL methods on various tasks by simply increasing the number of Q-networks along with the clipped Q-learning. Based on this observation, we propose an ensemble-diversified actor-critic algorithm that reduces the number of required ensemble networks down to a tenth compared to the naive ensemble while achieving state-of-the-art performance on most of the D4RL benchmarks considered.
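As a rough illustration of why clipped Q-learning over an ensemble penalizes uncertain predictions, the sketch below computes a Bellman target from the minimum of N next-state Q-value estimates. It is a simplified assumption-laden example, not the authors' algorithm: the function name `clipped_q_target`, the discount value, and the sample numbers are hypothetical, and the entropy term used in actor-critic variants is omitted.

```python
def clipped_q_target(reward, next_q_values, discount=0.99, done=False):
    """Bellman target using the minimum over the ensemble's next-state Q-values."""
    if not next_q_values:
        raise ValueError("need at least one Q-value estimate")
    pessimistic_q = min(next_q_values)  # clipping: keep the lowest ensemble estimate
    return reward + (0.0 if done else discount * pessimistic_q)


# Ensemble members roughly agree: the minimum stays close to the mean.
in_dist = clipped_q_target(1.0, [10.0, 10.1, 9.9, 10.0])

# Ensemble members disagree (as one might expect for an OOD action):
# the minimum falls well below the mean, which lowers the target.
ood = clipped_q_target(1.0, [10.0, 14.0, 6.0, 10.0])

print(in_dist, ood)
```

Both example ensembles have the same mean estimate of 10.0, so the gap between the two targets comes entirely from the spread of the estimates.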

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Outline of the Supplementary Material

Neural Information Processing Systems, Apr-24-2026, 16:10:23 GMT

In this section, we provide more information on the application background, including the detailed structures of the RAS and VAS and the structure of the simulated advertising system. We also discuss the importance and universality of the IBOO problem in auto-bidding, which motivates this work.

artificial intelligence, impression opportunity, machine learning, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

b2cac94f82928a85055987d9fd44753f-Supplemental-Conference.pdf

Neural Information Processing Systems, Feb-19-2026, 10:21:51 GMT

architecture, contributed, experiment, (14 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment > Games (0.96)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Unified Off-Policy Learning to Rank: a Reinforcement Learning Perspective

Neural Information Processing Systems, Feb-19-2026, 08:00:47 GMT

Our key insight is that the user's examination and click behavior

information retrieval, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Oregon (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.68)

A Detailed Proof: A.1 Proof of Theorem 4.1

Neural Information Processing Systems, Feb-17-2026, 23:20:32 GMT

We can compute the fixed point of the recursion in Equation A.2 and get the following estimated Then we compare these two gaps. To utilize Eq. 4 for policy optimization, following the analysis in Section 3.2 of Kumar et al. By choosing different regularizers, a variety of instances within the CQL family can be obtained. B.36 called CFCQL($\mathcal{H}$), which is the update rule we used: In the discrete action space, we train a three-level MLP network with MLE loss. In the continuous action space, we use the method of explicit estimation of behavior density in Wu et al.
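For orientation, the sketch below shows the kind of regularizer that the single-agent CQL family instance with an entropy regularizer is commonly described as using in a discrete action space: the log-sum-exp of the Q-values over all actions minus the Q-value of the action found in the dataset. This is an assumption-based illustration of that single-agent term only; it is not the counterfactual multi-agent update rule referred to in the excerpt, and the names `cql_h_penalty`, `q_table`, and `dataset_actions` are hypothetical.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def cql_h_penalty(q_table, dataset_actions):
    """Mean over sampled states of logsumexp_a Q(s, a) - Q(s, a_data).

    q_table[i] holds the Q-values of every discrete action in state i;
    dataset_actions[i] is the index of the action recorded for that state.
    """
    gaps = [logsumexp(q_row) - q_row[a] for q_row, a in zip(q_table, dataset_actions)]
    return sum(gaps) / len(gaps)


# Two states with three actions each (made-up numbers).
q_table = [[1.0, 2.0, 0.5], [0.0, 0.0, 3.0]]
dataset_actions = [1, 2]
penalty = cql_h_penalty(q_table, dataset_actions)
print(penalty)
```

The penalty is never negative, because the log-sum-exp is at least as large as any single Q-value; it shrinks as the dataset action's Q-value dominates the others.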

artificial intelligence, cql, machine learning, (14 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Counterfactual Conservative Q Learning for Offline Multi-agent Reinforcement Learning. Jianzhun Shao, Yun Qu

Neural Information Processing Systems, Feb-17-2026, 23:20:28 GMT

MARL in real scenarios remains challenging because of the same safety and efficiency concerns that arise in the single-agent setting, so offline RL in the multi-agent setting is worth investigating.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

Neural Information Processing Systems

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Recovering from Out-of-sample States via Inverse Dynamics in Offline Reinforcement Learning

Neural Information Processing Systems, Feb-15-2026, 05:31:14 GMT

However, such pessimism for out-of-sample data could be too restrictive and sample inefficient, as some out-of-sample (unseen) states are in fact generalizable [20].

inverse dynamic model, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country: