
Pessimistic MDP


MOReL: Model-Based Offline Reinforcement Learning

Neural Information Processing Systems

In offline reinforcement learning (RL), the goal is to learn a highly rewarding policy based solely on a dataset of historical interactions with the environment. This serves as an extreme test of an agent's ability to make effective use of historical data, which is known to be critical for efficient RL. Prior work in offline RL has been confined almost exclusively to model-free RL approaches.



Review for NeurIPS paper: MOReL: Model-Based Offline Reinforcement Learning

Neural Information Processing Systems

Additional Feedback: Most recent offline RL algorithms rely on policy regularization, where the learned policy is prevented from deviating too much from the data-logging policy. In contrast, MOReL does not directly rely on the data-logging policy but instead applies pessimism within a model-based approach, providing another good direction for offline RL. However, it would be more natural to penalize more uncertain states more heavily. For example, one classical model-based RL algorithm (MBIE-EB) constructs an optimistic MDP that rewards uncertain regions with a bonus proportional to 1/sqrt(N(s,a)), where N(s,a) is the visitation count. Analogously, one may consider a pessimistic MDP that penalizes uncertain regions with a penalty proportional to 1/sqrt(N(s,a)). How is using alpha greater than zero for USAD justified? - It would be great to see how sensitive the performance of the algorithm is with respect to kappa in the reward penalty and the threshold in USAD.
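A minimal sketch of the count-based pessimistic penalty the review alludes to (the function and variable names are illustrative and not part of MOReL; the tabular N(s,a) counts are an assumption that only makes sense in small discrete MDPs):

import numpy as np

def count_based_pessimistic_reward(reward, counts, kappa=1.0):
    # reward: array of shape [S, A] with empirical rewards r(s, a)
    # counts: array of shape [S, A] with visitation counts N(s, a)
    # kappa:  penalty scale; larger values are more conservative
    # Mirrors the MBIE-EB exploration bonus, applied as a penalty:
    #   r_p(s, a) = r(s, a) - kappa / sqrt(N(s, a)),
    # with never-visited pairs clamped to a count of 1 (maximum penalty).
    safe_counts = np.maximum(counts, 1)
    return reward - kappa / np.sqrt(safe_counts)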


MOReL: Model-Based Offline Reinforcement Learning

Neural Information Processing Systems

In offline reinforcement learning (RL), the goal is to learn a highly rewarding policy based solely on a dataset of historical interactions with the environment. This serves as an extreme test of an agent's ability to make effective use of historical data, which is known to be critical for efficient RL. Prior work in offline RL has been confined almost exclusively to model-free RL approaches. In this work, we present MOReL, an algorithmic framework for model-based offline RL. This framework consists of two steps: (a) learning a pessimistic MDP using the offline dataset; and (b) learning a near-optimal policy in this pessimistic MDP. The pessimistic MDP is designed so that, for any policy, the performance in the real environment is approximately lower-bounded by the performance in the pessimistic MDP.
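Stated in symbols (the notation below is ours, introduced only to make the lower-bound property concrete; it is not necessarily the paper's):

\[
J_{\mathcal{M}}(\pi) \;\geq\; J_{\hat{\mathcal{M}}_p}(\pi) - \epsilon
\qquad \text{for all policies } \pi,
\]

where \(J_{\mathcal{M}}(\pi)\) denotes the expected return of \(\pi\) in the real environment \(\mathcal{M}\), \(J_{\hat{\mathcal{M}}_p}(\pi)\) its return in the pessimistic MDP \(\hat{\mathcal{M}}_p\), and \(\epsilon\) collects the approximation error terms. Maximizing return in the pessimistic MDP therefore yields a conservative guarantee in the real environment.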


MOReL : Model-Based Offline Reinforcement Learning

Kidambi, Rahul, Rajeswaran, Aravind, Netrapalli, Praneeth, Joachims, Thorsten

arXiv.org Artificial Intelligence

In offline reinforcement learning (RL), the goal is to learn a highly rewarding policy based solely on a dataset of historical interactions with the environment. The ability to train RL policies offline can greatly expand the applicability of RL, its data efficiency, and its experimental velocity. Prior work in offline RL has been confined almost exclusively to model-free RL approaches. In this work, we present MOReL, an algorithmic framework for model-based offline RL. This framework consists of two steps: (a) learning a pessimistic MDP (P-MDP) using the offline dataset; and (b) learning a near-optimal policy in this P-MDP. The learned P-MDP has the property that for any policy, the performance in the real environment is approximately lower-bounded by the performance in the P-MDP. This enables it to serve as a good surrogate for purposes of policy evaluation and learning, and overcome common pitfalls of model-based RL like model exploitation. Theoretically, we show that MOReL is minimax optimal (up to log factors) for offline RL. Through experiments, we show that MOReL matches or exceeds state-of-the-art results in widely studied offline RL benchmarks. Moreover, the modular design of MOReL enables future advances in its components (e.g. generative modeling, uncertainty estimation, planning etc.) to directly translate into advances for offline RL.
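A minimal sketch of how the P-MDP construction might look in code, assuming an ensemble of learned dynamics models whose disagreement drives the unknown state-action detector (USAD) mentioned in the review above; the model interface, the absorbing-state treatment of unknown pairs, and the threshold and kappa parameters are assumptions for illustration, not the paper's exact implementation:

import numpy as np

def usad(models, state, action, threshold):
    # Unknown state-action detector: flag (s, a) as unknown when the
    # ensemble of learned dynamics models disagrees beyond a threshold.
    preds = np.stack([m.predict(state, action) for m in models])
    disagreement = max(
        np.linalg.norm(preds[i] - preds[j])
        for i in range(len(preds))
        for j in range(len(preds))
    )
    return disagreement > threshold

def pmdp_step(models, state, action, reward_fn, threshold, kappa):
    # One transition of the pessimistic MDP: known pairs follow a learned
    # model; unknown pairs receive a -kappa reward and terminate (an
    # absorbing-state construction), steering the planner away from
    # regions where the model is unreliable.
    if usad(models, state, action, threshold):
        return state, -kappa, True   # done=True: absorb with penalty
    next_state = models[0].predict(state, action)
    return next_state, reward_fn(state, action), False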


Safe Exploration and Optimization of Constrained MDPs Using Gaussian Processes

Wachi, Akifumi (University of Tokyo) | Sui, Yanan (California Institute of Technology) | Yue, Yisong (California Institute of Technology) | Ono, Masahiro (California Institute of Technology)

AAAI Conferences

We present a reinforcement learning approach to explore and optimize a safety-constrained Markov Decision Process (MDP). In this setting, the agent must maximize discounted cumulative reward while constraining the probability of entering unsafe states, defined via a safety function that must remain within some tolerance. The safety values of all states are not known a priori, and we model them probabilistically via a Gaussian Process (GP) prior. As such, behaving properly in such an environment requires balancing a three-way trade-off between exploring the safety function, exploring the reward function, and exploiting acquired knowledge to maximize reward. We propose a novel approach to balance this trade-off. Specifically, our approach explores unvisited states selectively; that is, it prioritizes the exploration of a state if visiting that state significantly improves the knowledge of the achievable cumulative reward. Our approach relies on a novel information gain criterion based on Gaussian Process representations of the reward and safety functions. We demonstrate the effectiveness of our approach on a range of experiments, including a simulation using real Martian terrain data.
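A minimal sketch of the GP-based safety modeling underlying this approach, using scikit-learn's GaussianProcessRegressor; the toy data, safety threshold, and confidence multiplier beta are illustrative assumptions, not the authors' implementation:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Observed safety values at a few visited states (toy 1-D state space).
X_visited = np.array([[0.0], [1.0], [2.0]])
safety_obs = np.array([0.9, 0.7, 0.4])

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0))
gp.fit(X_visited, safety_obs)

def is_safe(state, threshold=0.5, beta=2.0):
    # A state is deemed safe only if its pessimistic (lower-confidence)
    # safety estimate clears the threshold, so unexplored states are
    # treated cautiously until the GP posterior tightens around them.
    mean, std = gp.predict(np.atleast_2d(state), return_std=True)
    return bool(mean[0] - beta * std[0] >= threshold)

print(is_safe([0.5]), is_safe([3.0]))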