AITopics | state distribution shift

Review for NeurIPS paper: Conservative Q-Learning for Offline Reinforcement Learning

Neural Information Processing Systems, Jan-21-2025, 12:26:27 GMT

The theoretical claims that the method produces lower bounds on the Q-values are not sufficient, since there is no proof that the conservative Q-values are anywhere near the true Q-values. Simply estimating Q = 0 for positive rewards would give the same guarantees as Theorems 3.1, 3.2, and 3.3. Clearly, the algorithm is doing something smarter than this, but the current analysis does not characterize what it is doing. The gap-expanding result is likely the strongest of the four theorems, but without connecting it back to why it actually improves performance, it remains difficult to judge. Moreover, no comparison is made to the overestimation that would occur without the proposed algorithmic change.
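The reviewer's point that Q = 0 already satisfies a lower-bound guarantee can be checked on a toy problem. The sketch below is illustrative only: the 2-state MDP (`P`, `R`, `PI`), `GAMMA`, and the helpers `evaluate`, `is_lower_bound`, and `max_gap` are hypothetical, and the constant reward penalty is an assumed stand-in, not CQL's actual regularizer. It shows that the trivial estimate and a penalized estimate are both valid lower bounds, and that only the gap to the true values tells them apart.

```python
# Toy check of the reviewer's argument: with nonnegative rewards, Q = 0 is
# already a valid (but uninformative) lower bound on the true Q-values.

GAMMA = 0.9  # hypothetical discount factor

# Hypothetical 2-state, 2-action MDP.
# P[s][a]: list of (next_state, probability); R[s][a]: nonnegative reward;
# PI[s][a]: probability that the evaluated policy picks action a in state s.
P = {
    0: {0: [(0, 1.0)], 1: [(1, 1.0)]},
    1: {0: [(0, 0.5), (1, 0.5)], 1: [(1, 1.0)]},
}
R = {0: {0: 1.0, 1: 0.0}, 1: {0: 2.0, 1: 0.5}}
PI = {0: {0: 0.5, 1: 0.5}, 1: {0: 1.0, 1: 0.0}}


def evaluate(penalty=0.0, iters=2000):
    """Iterative policy evaluation of PI; `penalty` is subtracted from every reward."""
    q = {s: {a: 0.0 for a in R[s]} for s in R}
    for _ in range(iters):
        # State values under PI from the current Q estimate.
        v = {s: sum(PI[s][a] * q[s][a] for a in q[s]) for s in q}
        # Bellman backup for PI with the (possibly penalized) reward.
        q = {
            s: {
                a: R[s][a] - penalty + GAMMA * sum(p * v[s2] for s2, p in P[s][a])
                for a in R[s]
            }
            for s in R
        }
    return q


def is_lower_bound(q_hat, q_true, tol=1e-9):
    """True if q_hat <= q_true for every state-action pair."""
    return all(q_hat[s][a] <= q_true[s][a] + tol for s in q_true for a in q_true[s])


def max_gap(q_hat, q_true):
    """Largest amount by which q_hat underestimates q_true."""
    return max(q_true[s][a] - q_hat[s][a] for s in q_true for a in q_true[s])


q_true = evaluate()
q_zero = {s: {a: 0.0 for a in q_true[s]} for s in q_true}  # trivial estimate
q_pen = evaluate(penalty=0.1)  # hypothetical uniformly penalized estimate

for name, q_hat in [("Q = 0", q_zero), ("penalized", q_pen)]:
    print(name, is_lower_bound(q_hat, q_true), round(max_gap(q_hat, q_true), 3))
```

Because a constant per-step penalty c lowers every value by c / (1 - gamma), the penalized estimate here sits about 1.0 below the true values; a lower-bound theorem on its own does not distinguish this from Q = 0.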

conservative q-learning, neurips paper, offline reinforcement learning

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Off-Policy Policy Gradient Algorithms by Constraining the State Distribution Shift

Islam, Riashat, Teru, Komal K., Sharma, Deepak

arXiv.org Artificial Intelligence, Nov-16-2019

Off-policy deep reinforcement learning (RL) algorithms are incapable of learning solely from batch offline data without online interactions with the environment, due to the phenomenon known as extrapolation error. This is often because past data in the replay buffer can be quite different from the data distribution under the current policy. We argue that most off-policy learning methods fundamentally suffer from a state distribution shift, caused by the mismatch between the state visitation distributions of the data collected by the behavior and target policies. This shift between the distributions of current and past samples can significantly degrade the performance of most modern off-policy policy optimization algorithms. In this work, we first present a systematic analysis of state distribution mismatch in off-policy learning, and then develop a novel off-policy policy optimization method that constrains the state distribution shift. To do this, we first estimate the state distribution from features of the state using a density estimator, and then develop a novel constrained off-policy gradient objective that minimizes the state distribution shift. Our experimental results on continuous control tasks show that minimizing this distribution mismatch can significantly improve the performance of most popular practical off-policy policy gradient algorithms.
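The two steps the abstract describes (estimate the state distribution with a density estimator, then penalize the shift in the policy objective) can be sketched roughly as follows. This is a hypothetical illustration, not the authors' method: the diagonal-Gaussian density estimator, the choice of KL(d_pi || d_beta) as the shift measure, the penalty weight `lam`, and all function names are assumptions, and the constrained objective is approximated by a fixed Lagrangian-style penalty. The sketch only evaluates the penalized objective; a full method would also need gradients of the penalty with respect to the policy parameters, which this code does not compute.

```python
import math
import random


def fit_diag_gaussian(states):
    """Assumed density estimator: per-dimension mean and variance of state features."""
    n, dim = len(states), len(states[0])
    means = [sum(s[i] for s in states) / n for i in range(dim)]
    variances = [
        max(sum((s[i] - means[i]) ** 2 for s in states) / n, 1e-6)  # floor avoids /0
        for i in range(dim)
    ]
    return means, variances


def kl_diag_gaussian(p, q):
    """Closed-form KL(p || q) for diagonal Gaussians given as (means, variances)."""
    (mu_p, var_p), (mu_q, var_q) = p, q
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q)
    )


def penalized_objective(policy_return, current_states, buffer_states, lam=1.0):
    """Hypothetical objective: return estimate minus lam * KL(d_pi || d_beta)."""
    d_pi = fit_diag_gaussian(current_states)   # states visited by the current policy
    d_beta = fit_diag_gaussian(buffer_states)  # states stored in the replay buffer
    shift = kl_diag_gaussian(d_pi, d_beta)
    return policy_return - lam * shift, shift


# Usage with synthetic 2-D state features (all data here is made up).
random.seed(0)
buffer_states = [[random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)] for _ in range(2000)]
near_states = [[random.gauss(0.1, 1.0), random.gauss(0.0, 1.0)] for _ in range(2000)]
far_states = [[random.gauss(2.0, 1.0), random.gauss(0.0, 1.0)] for _ in range(2000)]

obj_near, shift_near = penalized_objective(10.0, near_states, buffer_states)
obj_far, shift_far = penalized_objective(10.0, far_states, buffer_states)
print("near", round(shift_near, 3), round(obj_near, 3))
print("far", round(shift_far, 3), round(obj_far, 3))
```

States drawn close to the buffer distribution incur only a small penalty, while states shifted by 2.0 along one feature incur a KL of roughly 2, lowering the objective accordingly. A Gaussian fit is chosen only because its KL has a closed form; richer density estimators would need a sampled divergence estimate instead.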

algorithm, state distribution, state distribution shift

1911.0697

Country:

North America > Canada > Quebec > Montreal (0.14)
Europe > Sweden > Stockholm > Stockholm (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
