Goto

Collaborating Authors

 exploitation phase




Multi-armed Bandits: Competing with Optimal Sequences

Neural Information Processing Systems

We consider sequential decision making problem in the adversarial setting, where regret is measured with respect to the optimal sequence of actions and the feedback adheres the bandit setting. It is well-known that obtaining sublinear regret in this setting is impossible in general, which arises the question of when can we do better than linear regret? Previous works show that when the environment is guaranteed to vary slowly and furthermore we are given prior knowledge regarding its variation (i.e., a limit on the amount of changes suffered by the environment), then this task is feasible. The caveat however is that such prior knowledge is not likely to be available in practice, which causes the obtained regret bounds to be somewhat irrelevant. Our main result is a regret guarantee that scales with the variation parameter of the environment, without requiring any prior knowledge about it whatsoever. By that, we also resolve an open problem posted by Gur, Zeevi and Besbes [8]. An important key component in our result is a statistical test for identifying non-stationarity in a sequence of independent random variables. This test either identifies nonstationarity or upper-bounds the absolute deviation of the corresponding sequence of mean values in terms of its total variation. This test is interesting on its own right and has the potential to be found useful in additional settings.





BBoE: Leveraging Bundle of Edges for Kinodynamic Bidirectional Motion Planning

arXiv.org Artificial Intelligence

Abstract-- In this work, we introduce BBoE, a bidirectional, kinodynamic, sampling-based motion planner that consistently and quickly finds low-cost solutions in environments with varying obstacle clutter . The algorithm combines exploration and exploitation while relying on precomputed robot state traversals, resulting in efficient convergence towards the goal. Our key contributions include: i) a strategy to navigate through obstacle-rich spaces by sorting and sequencing preprocessed forward propagations; and ii) BBoE, a robust bidirectional kinodynamic planner that utilizes this strategy to produce fast and feasible solutions. The proposed framework reduces planning time, diminishes solution cost and increases success rate in comparison to previous approaches. I. INTRODUCTION Motion planning in robotics involves identifying a series of valid configurations that a robot can assume to transition from an initial state to a desired goal state. Sampling-based planning is a popular graph-based approach used to generate robot motions by sampling discrete states and establishing connections between them via edges [23]. Their popularity is due to the inherent property of probabilistic completeness, which guarantees that a solution will be found, if one exists, as the number of sampled states reaches infinity [17], [10]. Traditionally, these techniques employ a unidirectional tree that grows from the start state and expands towards the goal region [17], [10], [6].