AITopics | sample-efficient reinforcement learning

Sample-Efficient Reinforcement Learning for Linearly-Parameterized MDPs with a Generative Model

Neural Information Processing SystemsDec-24-2025, 20:53:48 GMT

The curse of dimensionality is a widely known issue in reinforcement learning (RL). In the tabular setting where the state space $\mathcal{S}$ and the action space $\mathcal{A}$ are both finite, to obtain a near optimal policy with sampling access to a generative model, the minimax optimal sample complexity scales linearly with $|\mathcal{S}|\times|\mathcal{A}|$, which can be prohibitively large when $\mathcal{S}$ or $\mathcal{A}$ is large. This paper considers a Markov decision process (MDP) that admits a set of state-action features, which can linearly express (or approximate) its probability transition kernel. We show that a model-based approach (resp.$~$Q-learning)

linearly-parameterized mdp, mathcal, sample-efficient reinforcement learning, (9 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.45)

Add feedback

Sample-Efficient Reinforcement Learning of Undercomplete POMDPs

Neural Information Processing SystemsDec-24-2025, 17:12:05 GMT

Partial observability is a common challenge in many reinforcement learning applications, which requires an agent to maintain memory, infer latent states, and integrate this past information into exploration. This challenge leads to a number of computational and statistical hardness results for learning general Partially Observable Markov Decision Processes (POMDPs). This work shows that these hardness barriers do not preclude efficient reinforcement learning for rich and interesting subclasses of POMDPs. In particular, we present a sample-efficient algorithm, OOM-UCB, for episodic finite undercomplete POMDPs, where the number of observations is larger than the number of latent states and where exploration is essential for learning, thus distinguishing our results from prior works. OOM-UCB achieves an optimal sample complexity of $\tilde{\mathcal{O}}(1/\varepsilon^2)$ for finding an $\varepsilon$-optimal policy, along with being polynomial in all other relevant quantities. As an interesting special case, we also provide a computationally and statistically efficient algorithm for POMDPs with deterministic state transitions.

name change, pomdp, sample-efficient reinforcement learning, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Sample-Efficient Reinforcement Learning of Partially Observable Markov Games

Neural Information Processing SystemsDec-24-2025, 11:25:58 GMT

This paper considers the challenging tasks of Multi-Agent Reinforcement Learning (MARL) under partial observability, where each agent only sees her own individual observations and actions that reveal incomplete information about the underlying state of system. This paper studies these tasks under the general model of multiplayer general-sum Partially Observable Markov Games (POMGs), which is significantly larger than the standard model of Imperfect Information Extensive-Form Games (IIEFGs). We identify a rich subclass of POMGs---weakly revealing POMGs---in which sample-efficient learning is tractable. In the self-play setting, we prove that a simple algorithm combining optimism and Maximum Likelihood Estimation (MLE) is sufficient to find approximate Nash equilibria, correlated equilibria, as well as coarse correlated equilibria of weakly revealing POMGs, in a polynomial number of samples when the number of agents is small. In the setting of playing against adversarial opponents, we show that a variant of our optimistic MLE algorithm is capable of achieving sublinear regret when being compared against the optimal maximin policies. To our best knowledge, this work provides the first line of sample-efficient results for learning POMGs.

name change, observable markov game, sample-efficient reinforcement learning, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.63)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.60)

Add feedback

Sample-Efficient Reinforcement Learning Is Feasible for Linearly Realizable MDPs with Limited Revisiting

Neural Information Processing SystemsDec-24-2025, 10:38:13 GMT

Low-complexity models such as linear function representation play a pivotal role in enabling sample-efficient reinforcement learning (RL). The current paper pertains to a scenario with value-based linear representation, which postulates linear realizability of the optimal Q-function (also called the ``linear $Q^{\star}$ problem''). While linear realizability alone does not allow for sample-efficient solutions in general, the presence of a large sub-optimality gap is a potential game changer, depending on the sampling mechanism in use. Informally, sample efficiency is achievable with a large sub-optimality gap when a generative model is available, but is unfortunately infeasible when we turn to standard online RL settings. We make progress towards understanding this linear $Q^{\star}$ problem by investigating a new sampling protocol, which draws samples in an online/exploratory fashion but allows one to backtrack and revisit previous states. This protocol is more flexible than the standard online RL setting, while being practically relevant and far more restrictive than the generative model. We develop an algorithm tailored to this setting, achieving a sample complexity that scales polynomially with the feature dimension, the horizon, and the inverse sub-optimality gap, but not the size of the state/action space. Our findings underscore the fundamental interplay between sampling protocols and low-complexity function representation in RL.

linearly realizable mdp, name change, sample-efficient reinforcement learning, (10 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.59)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.42)

Add feedback

Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion

Neural Information Processing SystemsNov-20-2025, 23:12:41 GMT

There is growing interest in combining model-free and model-based approaches in reinforcement learning with the goal of achieving the high performance of model-free algorithms with low sample complexity. This is difficult because an imperfect dynamics model can degrade the performance of the learning algorithm, and in sufficiently complex environments, the dynamics model will always be imperfect. As a result, a key challenge is to combine model-based approaches with model-free learning in such a way that errors in the model do not degrade performance. We propose stochastic ensemble value expansion (STEVE), a novel model-based technique that addresses this issue. By dynamically interpolating between model rollouts of various horizon lengths, STEVE ensures that the model is only utilized when doing so does not introduce significant errors. Our approach outperforms model-free baselines on challenging continuous control benchmarks with an order-of-magnitude increase in sample efficiency.

name change, sample-efficient reinforcement learning, stochastic ensemble value expansion, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Model-Based Reasoning (0.85)
Information Technology > Artificial Intelligence > Machine Learning (0.82)

Add feedback

Sample-Efficient Reinforcement Learning of Undercomplete POMDPs

Neural Information Processing SystemsAug-16-2025, 16:31:14 GMT

In many sequential decision making settings, the agent lacks complete information about the underlying state of the system, a phenomenon known as partial observability .

algorithm, pomdp, probability, (16 more...)

Neural Information Processing Systems

Country:

North America > Canada (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Workflow (0.68)
Research Report > New Finding (0.46)

Industry: Health & Medicine (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.94)

Add feedback

Review for NeurIPS paper: Sample-Efficient Reinforcement Learning of Undercomplete POMDPs

Neural Information Processing SystemsFeb-6-2025, 20:14:55 GMT

Weaknesses: A few comments that are needed to be addressed: 1) The first comment is about the presentation of the derivations. There are steps in the appendix, and also in the main text that are skipped. Some of them took me a while to rederive, some I couldn't spend more time to rederive. Some steps are also taken as granted in the main text. It is useful to elaborate on them more.

derivation, sample-efficient reinforcement learning, undercomplete pomdp, (8 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

Add feedback

Sample-Efficient Reinforcement Learning for Linearly-Parameterized MDPs with a Generative Model

Neural Information Processing SystemsJan-19-2025, 01:03:08 GMT

The curse of dimensionality is a widely known issue in reinforcement learning (RL). In the tabular setting where the state space \mathcal{S} and the action space \mathcal{A} are both finite, to obtain a near optimal policy with sampling access to a generative model, the minimax optimal sample complexity scales linearly with \mathcal{S} \times \mathcal{A}, which can be prohibitively large when \mathcal{S} or \mathcal{A} is large. This paper considers a Markov decision process (MDP) that admits a set of state-action features, which can linearly express (or approximate) its probability transition kernel. We show that a model-based approach (resp. Q-function) with high probability as soon as the sample size exceeds the order of \frac{K}{(1-\gamma) {3}\varepsilon {2}} (resp.

linearly-parameterized mdp, mathcal, sample-efficient reinforcement learning, (8 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.91)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.64)

Add feedback

Sample-Efficient Reinforcement Learning Is Feasible for Linearly Realizable MDPs with Limited Revisiting

Neural Information Processing SystemsJan-15-2025, 05:29:04 GMT

Low-complexity models such as linear function representation play a pivotal role in enabling sample-efficient reinforcement learning (RL). The current paper pertains to a scenario with value-based linear representation, which postulates linear realizability of the optimal Q-function (also called the linear Q {\star} problem''). While linear realizability alone does not allow for sample-efficient solutions in general, the presence of a large sub-optimality gap is a potential game changer, depending on the sampling mechanism in use. Informally, sample efficiency is achievable with a large sub-optimality gap when a generative model is available, but is unfortunately infeasible when we turn to standard online RL settings. We make progress towards understanding this linear Q {\star} problem by investigating a new sampling protocol, which draws samples in an online/exploratory fashion but allows one to backtrack and revisit previous states.

linearly realizable mdp, sample-efficient reinforcement learning, sub-optimality gap, (7 more...)

Neural Information Processing Systems

Genre: Research Report (0.60)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.64)

Add feedback

Sample-Efficient Reinforcement Learning of Partially Observable Markov Games

Neural Information Processing SystemsOct-11-2024, 15:22:34 GMT

This paper considers the challenging tasks of Multi-Agent Reinforcement Learning (MARL) under partial observability, where each agent only sees her own individual observations and actions that reveal incomplete information about the underlying state of system. This paper studies these tasks under the general model of multiplayer general-sum Partially Observable Markov Games (POMGs), which is significantly larger than the standard model of Imperfect Information Extensive-Form Games (IIEFGs). We identify a rich subclass of POMGs---weakly revealing POMGs---in which sample-efficient learning is tractable. In the self-play setting, we prove that a simple algorithm combining optimism and Maximum Likelihood Estimation (MLE) is sufficient to find approximate Nash equilibria, correlated equilibria, as well as coarse correlated equilibria of weakly revealing POMGs, in a polynomial number of samples when the number of agents is small. In the setting of playing against adversarial opponents, we show that a variant of our optimistic MLE algorithm is capable of achieving sublinear regret when being compared against the optimal maximin policies.

observable markov game, pomg, sample-efficient reinforcement learning, (2 more...)

Neural Information Processing Systems

Genre: Play > Prospect > Charge (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.65)
(2 more...)

Add feedback

Filters

Collaborating Authors

sample-efficient reinforcement learning

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Sample-Efficient Reinforcement Learning for Linearly-Parameterized MDPs with a Generative Model

Sample-Efficient Reinforcement Learning of Undercomplete POMDPs

Sample-Efficient Reinforcement Learning of Partially Observable Markov Games

Sample-Efficient Reinforcement Learning Is Feasible for Linearly Realizable MDPs with Limited Revisiting

Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion

Sample-Efficient Reinforcement Learning of Undercomplete POMDPs

Review for NeurIPS paper: Sample-Efficient Reinforcement Learning of Undercomplete POMDPs

Sample-Efficient Reinforcement Learning for Linearly-Parameterized MDPs with a Generative Model

Sample-Efficient Reinforcement Learning Is Feasible for Linearly Realizable MDPs with Limited Revisiting

Sample-Efficient Reinforcement Learning of Partially Observable Markov Games