AITopics | approximate planning

Approximate Planning in Large POMDPs via Reusable Trajectories

Neural Information Processing SystemsApr-6-2023, 17:12:56 GMT

We consider the problem of reliably choosing a near-best strategy from a restricted class of strategies TI in a partially observable Markov deci(cid:173) sion process (POMDP). We assume we are given the ability to simulate the POMDP, and study what might be called the sample complexity - that is, the amount of data one must generate in the POMDP in order to choose a good strategy. We prove upper bounds on the sample com(cid:173) plexity showing that, even for infinitely large and arbitrarily complex POMDPs, the amount of data needed can be finite, and depends only linearly on the complexity of the restricted strategy class TI, and expo(cid:173) nentially on the horizon time. This latter dependence can be eased in a variety of ways, including the application of gradient and local search algorithms.

approximate planning, pomdp, reusable trajectory, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Approximate Planning for Factored POMDPs

Feng, Zhengzhu (University of Massachusetts, Amherst) | Hansen, Eric A. (Mississippi State University)

AAAI ConferencesJun-9-2014

We describe an approximate dynamic programming algorithm for partially observable Markov decision processes represented in factored form. Two complementary forms of approximation are used to simplify a piecewise linear and convex value function, where each linear facet of the function is represented compactly by an algebraic decision diagram. ln one form of approximation, the degree of state abstraction is increased by aggregating states with similar values. In the second form of approximation, the value function is simplified by removing linear facets that contribute marginally to value. We derive an error bound that applies to both forms of approximation. Experimental results show that this approach improves the performance of dynamic programming and extends the range of problems it can solve.

approximate planning, factored pomdp

AAAI Conferences

Sixth European Conference on Planning

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Approximate Planning in POMDPs with Macro-Actions

Theocharous, Georgios, Kaelbling, Leslie P.

Neural Information Processing SystemsDec-31-2004

Recent research has demonstrated that useful POMDP solutions do not require consideration of the entire belief space. We extend this idea with the notion of temporal abstraction. We present and explore a new reinforcement learning algorithm over grid-points in belief space, which uses macro-actions and Monte Carlo updates of the Q-values. We apply the algorithm to a large scale robot navigation task and demonstrate that with temporal abstraction we can consider an even smaller part of the belief space, we can learn POMDP policies faster, and we can do information gathering more efficiently.

algorithm, belief space, pomdp, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > Washington > King County > Seattle (0.04)

Genre: Research Report > New Finding (0.86)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Approximate Planning in POMDPs with Macro-Actions

Theocharous, Georgios, Kaelbling, Leslie P.

Neural Information Processing SystemsDec-31-2004

Recent research has demonstrated that useful POMDP solutions do not require consideration of the entire belief space. We extend this idea with the notion of temporal abstraction. We present and explore a new reinforcement learning algorithm over grid-points in belief space, which uses macro-actions and Monte Carlo updates of the Q-values. We apply the algorithm to a large scale robot navigation task and demonstrate that with temporal abstraction we can consider an even smaller part of the belief space, we can learn POMDP policies faster, and we can do information gathering more efficiently.

algorithm, belief space, pomdp, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > Washington > King County > Seattle (0.04)

Genre: Research Report > New Finding (0.86)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Approximate Planning in POMDPs with Macro-Actions

Theocharous, Georgios, Kaelbling, Leslie P.

Neural Information Processing SystemsDec-31-2004

Recent research has demonstrated that useful POMDP solutions do not require consideration of the entire belief space. We extend this idea with the notion of temporal abstraction. We present and explore a new reinforcement learningalgorithm over grid-points in belief space, which uses macro-actions and Monte Carlo updates of the Q-values. We apply the algorithm to a large scale robot navigation task and demonstrate that with temporal abstraction we can consider an even smaller part of the belief space, we can learn POMDP policies faster, and we can do information gathering more efficiently.

algorithm, artificial intelligence, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Approximate Planning in Large POMDPs via Reusable Trajectories

Kearns, Michael J., Mansour, Yishay, Ng, Andrew Y.

Neural Information Processing SystemsDec-31-2000

We consider the problem of reliably choosing a near-best strategy from a restricted class of strategies TI in a partially observable Markov decision process (POMDP). We assume we are given the ability to simulate the POMDP, and study what might be called the sample complexity - that is, the amount of data one must generate in the POMDP in order to choose a good strategy. We prove upper bounds on the sample complexity showing that, even for infinitely large and arbitrarily complex POMDPs, the amount of data needed can be finite, and depends only linearly on the complexity of the restricted strategy class TI, and exponentially on the horizon time. This latter dependence can be eased in a variety of ways, including the application of gradient and local search algorithms.

generative model, observable history, pomdp, (12 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Approximate Planning in Large POMDPs via Reusable Trajectories

Kearns, Michael J., Mansour, Yishay, Ng, Andrew Y.

Neural Information Processing SystemsDec-31-2000

We consider the problem of reliably choosing a near-best strategy from a restricted class of strategies TI in a partially observable Markov decision process (POMDP). We assume we are given the ability to simulate the POMDP, and study what might be called the sample complexity - that is, the amount of data one must generate in the POMDP in order to choose a good strategy. We prove upper bounds on the sample complexity showing that, even for infinitely large and arbitrarily complex POMDPs, the amount of data needed can be finite, and depends only linearly on the complexity of the restricted strategy class TI, and exponentially on the horizon time. This latter dependence can be eased in a variety of ways, including the application of gradient and local search algorithms.

generative model, observable history, pomdp, (12 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Approximate Planning in Large POMDPs via Reusable Trajectories

Kearns, Michael J., Mansour, Yishay, Ng, Andrew Y.

Neural Information Processing SystemsDec-31-2000

We consider the problem of reliably choosing a near-best strategy from a restricted class of strategies TI in a partially observable Markov decision process(POMDP). We assume we are given the ability to simulate the POMDP, and study what might be called the sample complexity - that is, the amount of data one must generate in the POMDP in order to choose a good strategy. We prove upper bounds on the sample complexity showingthat, even for infinitely large and arbitrarily complex POMDPs, the amount of data needed can be finite, and depends only linearly on the complexity of the restricted strategy class TI, and exponentially onthe horizon time. This latter dependence can be eased in a variety of ways, including the application of gradient and local search algorithms.

artificial intelligence, machine learning, pomdp, (14 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback