Planning & Scheduling
Trajectory-Based Short-Sighted Probabilistic Planning
Probabilistic planning captures the uncertainty of plan execution by probabilistically modeling the effects of actions in the environment, and therefore the probability of reaching different states from a given state and action. In order to compute a solution for a probabilistic planning problem, planners need to manage the uncertainty associated with the different paths from the initial state to a goal state. Several approaches to manage uncertainty were proposed, e.g., consider all paths at once, perform determinization of actions, and sampling. In this paper, we introduce trajectory-based short-sighted Stochastic Shortest Path Problems (SSPs), a novel approach to manage uncertainty for probabilistic planning problems in which states reachable with low probability are substituted by artificial goals that heuristically estimate their cost to reach a goal state. We also extend the theoretical results of Short-Sighted Probabilistic Planner (SSiPP) [1] by proving that SSiPP always finishes and is asymptotically optimal under sufficient conditions on the structure of short-sighted SSPs. We empirically compare SSiPP using trajectorybased short-sighted SSPs with the winners of the previous probabilistic planning competitions and other state-of-the-art planners in the triangle tireworld problems.
Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search
Bayesian model-based reinforcement learning is a formally elegant approach to learning optimal behaviour under model uncertainty, trading off exploration and exploitation in an ideal way. Unfortunately, finding the resulting Bayes-optimal policies is notoriously taxing, since the search space becomes enormous. In this paper we introduce a tractable, sample-based method for approximate Bayesoptimal planning which exploits Monte-Carlo tree search. Our approach outperformed prior Bayesian model-based RL algorithms by a significant margin on several well-known benchmark problems - because it avoids expensive applications of Bayes rule within the search tree by lazily sampling models from the current beliefs. We illustrate the advantages of our approach by showing it working in an infinite state space domain which is qualitatively out of reach of almost all previous work in Bayesian exploration.
fe709c654eac84d5239d1a12a4f71877-Reviews.html
The main idea is to sample several determinations of the system in the form of roll-out trees where each state/action pair has only one sampled successor. A combination of breadth-first and best-first search is used to explore the deterministic trees, and then they are recombined to create a stochastic model from which a policy can be calculated. The algorithm is proven to be consistent (as the number of trees and number of nodes in each tree both approach infinity, the value at the root can be arbitrarily approximated with high probability). The algorithm is empirically compared to an planning algorithm that requires a full transition model and performs well in comparison.
Aggregating Optimistic Planning Trees for Solving Markov Decision Processes
This paper addresses the problem of online planning in Markov decision processes using a randomized simulator, under a budget constraint. We propose a new algorithm which is based on the construction of a forest of planning trees, where each tree corresponds to a random realization of the stochastic environment. The trees are constructed using a "safe" optimistic planning strategy combining the optimistic principle (in order to explore the most promising part of the search space first) with a safety principle (which guarantees a certain amount of uniform exploration). In the decision-making step of the algorithm, the individual trees are aggregated and an immediate action is recommended. We provide a finite-sample analysis and discuss the trade-off between the principles of optimism and safety. We also report numerical results on a benchmark problem. Our algorithm performs as well as state-of-the-art optimistic planning algorithms, and better than a related algorithm which additionally assumes the knowledge of all transition distributions.
Development of control algorithms for mobile robotics focused on their potential use for FPGA-based robots
Suárez-Gómez, Andrés-David, Ortega, Andres A. Hernandez
This paper investigates the development and optimization of control algorithms for mobile robotics, with a keen focus on their implementation in Field-Programmable Gate Arrays (FPGAs). It delves into both classical control approaches such as PID and modern techniques including deep learning, addressing their application in sectors ranging from industrial automation to medical care. The study highlights the practical challenges and advancements in embedding these algorithms into FPGAs, which offer significant benefits for mobile robotics due to their high-speed processing and parallel computation capabilities. Through an analysis of various control strategies, the paper showcases the improvements in robot performance, particularly in navigation and obstacle avoidance. It emphasizes the critical role of FPGAs in enhancing the efficiency and adaptability of control algorithms in dynamic environments. Additionally, the research discusses the difficulties in benchmarking and evaluating the performance of these algorithms in real-world applications, suggesting a need for standardized evaluation criteria. The contribution of this work lies in its comprehensive examination of control algorithms' potential in FPGA-based mobile robotics, offering insights into future research directions for improving robotic autonomy and operational efficiency.
DESPOT: Online POMDP Planning with Regularization
POMDPs provide a principled framework for planning under uncertainty, but are computationally intractable, due to the "curse of dimensionality" and the "curse of history". This paper presents an online POMDP algorithm that alleviates these difficulties by focusing the search on a set of randomly sampled scenarios.
846c260d715e5b854ffad5f70a516c88-Reviews.html
The paper proposes a Bayesian inference in Monte-Carlo tree search (MCTS) with Thompson sampling based action-selection strategy, called Dirichlet-NormalGamma MCTS (DNG-MTCS) algorithm. The method approximates the accumulated reward of following the current policy from a state, X_{s,\pi(s)}, by the normal distribution with the NromalGamma distribution prior. The state transition probabilities are estimated via Dirichlet distributions. Action-selection strategy is based on Thompson sampling approach, where the expected cumulative reward for each action is computed with the parametric distribution with parameters drawn from the posterior distributions and then the action with the highest expectation is selected. The authors apply the proposed method to several benchmark tasks and showed that the method can converge (slightly) faster than the UCT algorithm. Theoretical properties about convergence are also provided.
Bayesian Mixture Modeling and Inference based Thompson Sampling in Monte-Carlo Tree Search
Monte-Carlo tree search (MCTS) has been drawing great interest in recent years for planning and learning under uncertainty. One of the key challenges is the trade-off between exploration and exploitation. To address this, we present a novel approach for MCTS using Bayesian mixture modeling and inference based Thompson sampling and apply it to the problem of online planning in MDPs. Our algorithm, named Dirichlet-NormalGamma MCTS (DNG-MCTS), models the uncertainty of the accumulated reward for actions in the search tree as a mixture of Normal distributions. We perform inferences on the mixture in Bayesian settings by choosing conjugate priors in the form of combinations of Dirichlet and NormalGamma distributions and select the best action at each decision node using Thompson sampling. Experimental results confirm that our algorithm advances the state-of-the-art UCT approach with better values on several benchmark problems.
Synthesizing Robust Plans under Incomplete Domain Models
Most current planners assume complete domain models and focus on generating correct plans. Unfortunately, domain modeling is a laborious and error-prone task, thus real world agents have to plan with incomplete domain models. While domain experts cannot guarantee completeness, often they are able to circumscribe the incompleteness of the model by providing annotations as to which parts of the domain model may be incomplete. In such cases, the goal should be to synthesize plans that are robust with respect to any known incompleteness of the domain. In this paper, we first introduce annotations expressing the knowledge of the domain incompleteness and formalize the notion of plan robustness with respect to an incomplete domain model. We then show an approach to compiling the problem of finding robust plans to the conformant probabilistic planning problem, and present experimental results with Probabilistic-FF planner.
Convergence of Monte Carlo Tree Search in Simultaneous Move Games Marc Lanctot
We study Monte Carlo tree search (MCTS) in zero-sum extensive-form games with perfect information and simultaneous moves. We present a general template of MCTS algorithms for these games, which can be instantiated by various selection methods. We formally prove that if a selection method is ɛ-Hannan consistent in a matrix game and satisfies additional requirements on exploration, then the MCTS algorithm eventually converges to an approximate Nash equilibrium (NE) of the extensive-form game. We empirically evaluate this claim using regret matching and Exp3 as the selection methods on randomly generated games and empirically selected worst case games. We confirm the formal result and show that additional MCTS variants also converge to approximate NE on the evaluated games.