Near-Optimal Time and Sample Complexities for Solving Markov Decision Processes with a Generative Model
In this paper we consider the problem of computing an $\epsilon$-optimal policy of a discounted Markov Decision Process (DMDP), provided we can only access its transition function through a generative sampling model that, given any state-action pair, returns a sample from the transition function in $O(1)$ time.
Reviews: Near-Optimal Time and Sample Complexities for Solving Markov Decision Processes with a Generative Model
In this paper the authors present PAC-RL results for finite-space discounted MDPs with a generative sampling model (which can generate, in $O(1)$ time, a state transition for any given state-action pair). They introduce a value-iteration-type algorithm for finding an $\epsilon$-optimal policy, and prove that its running time and sample complexity are nearly the best possible and match, up to logarithmic factors, the lower bounds established previously in [AMK13]. These results are significant: they improve over a recent result in [SWWY18] by a factor of $(1-\gamma)^{-1}$ ($\gamma$ is the discount factor), and they also fill a known gap in the earlier results of [AMK13] regarding the sample complexity of finding a near-optimal policy. Both the algorithm and its analysis are very interesting. They build upon the prior work [AMK13] and [SWWY18], and combine the algorithmic ideas and proof techniques from both papers in a highly non-trivial and original way.
Sidford, Aaron; Wang, Mengdi; Wu, Xian; Yang, Lin; Ye, Yinyu
Published at the Neural Information Processing Systems Conference.
In this paper we consider the problem of computing an $\epsilon$-optimal policy of a discounted Markov Decision Process (DMDP), provided we can only access its transition function through a generative sampling model that, given any state-action pair, returns a sample from the transition function in $O(1)$ time. Given such a DMDP with states $\cS$, actions $\cA$, discount factor $\gamma \in (0,1)$, and rewards in the range $[0, 1]$, we provide an algorithm which computes an $\epsilon$-optimal policy with probability $1 - \delta$, where {\it both} the runtime and the number of samples taken are upper bounded by \[ O\left[\frac{|\cS||\cA|}{(1-\gamma)^3 \epsilon^2} \log \left(\frac{|\cS||\cA|}{(1-\gamma)\delta \epsilon} \right) \log\left(\frac{1}{(1-\gamma)\epsilon}\right)\right] ~. \] For fixed values of $\epsilon$, this improves upon the previous best known bounds by a factor of $(1 - \gamma)^{-1}$ and matches the sample complexity lower bounds proved in \cite{azar2013minimax} up to logarithmic factors. We also extend our method to computing $\epsilon$-optimal policies for finite-horizon MDPs with a generative model and provide a nearly matching sample complexity lower bound.
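As a rough illustration of the generative-model access pattern the abstract describes, the following is a minimal sketch of plain sampled Q-value iteration on a tiny random MDP. This is for intuition only, not the paper's algorithm: the paper's variance-reduction scheme is what achieves the near-optimal $(1-\gamma)^{-3}$ dependence, whereas the naive backup below simply draws fresh samples at every iteration. All sizes and constants here are hypothetical.

```python
# Sketch: sampled Q-value iteration with a generative model on a random MDP.
# NOT the paper's variance-reduced method -- a plain Monte Carlo backup.
import random

random.seed(0)

S, A = 4, 2          # hypothetical small state/action space sizes
gamma = 0.9          # discount factor in (0, 1)

# Random transition kernel P[s][a] = distribution over next states.
P = [[[random.random() for _ in range(S)] for _ in range(A)] for _ in range(S)]
for s in range(S):
    for a in range(A):
        z = sum(P[s][a])
        P[s][a] = [p / z for p in P[s][a]]
R = [[random.random() for _ in range(A)] for _ in range(S)]  # rewards in [0, 1]

def sample_next_state(s, a):
    """The generative model: an O(1) draw of s' ~ P(. | s, a)."""
    return random.choices(range(S), weights=P[s][a])[0]

def sampled_q_iteration(iters=100, m=100):
    """Approximate Q-iteration using only generative-model samples.

    Each backup replaces the expectation E[V(s')] with an average of
    m independent draws from the generative model.
    """
    Q = [[0.0] * A for _ in range(S)]
    for _ in range(iters):
        V = [max(Q[s]) for s in range(S)]
        newQ = [[0.0] * A for _ in range(S)]
        for s in range(S):
            for a in range(A):
                est = sum(V[sample_next_state(s, a)] for _ in range(m)) / m
                newQ[s][a] = R[s][a] + gamma * est
        Q = newQ
    return Q

Q = sampled_q_iteration()
# Greedy policy with respect to the estimated Q-values.
policy = [max(range(A), key=lambda a: Q[s][a]) for s in range(S)]
print(policy)
```

Because rewards lie in $[0,1]$, every estimated Q-value stays below $1/(1-\gamma)$; the interesting question, which the paper answers, is how few total generative-model calls suffice to make the greedy policy $\epsilon$-optimal with high probability.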