If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."
However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …
In this episode, we hear from Luca Colasanto, Senior Robotic Scientist at Realtime Robotics, about real-time robot motion planning in dynamic and complex environments with human-robot collaboration. Realtime Robotics focuses on accelerating conventional motion planning through optimization of algorithms and hardware to allow safe use of robotic tools in work areas with humans. Luca spoke to our interviewer Kate about Realtime Robotic's fast motion planning technology, including key aspects, such as perception, algorithms and custom hardware. Luca Colasanto is a Sr. Scientist at Realtime Robotics focusing on AI-based grasping and multi-robot optimization. Luca completed his PhD in Humanoid Robotics at Italian Institute of Technology, focusing on control systems for bipedal walking machines and compliant actuators.
The exploration-exploitation trade-off is at the heart of reinforcement learning (RL). However, most continuous control benchmarks used in recent RL research only require local exploration. This led to the development of algorithms that have basic exploration capabilities, and behave poorly in benchmarks that require more versatile exploration. For instance, as demonstrated in our empirical study, state-of-the-art RL algorithms such as DDPG and TD3 are unable to steer a point mass in even small 2D mazes. In this paper, we propose a new algorithm called "Plan, Backplay, Chain Skills" (PBCS) that combines motion planning and reinforcement learning to solve hard exploration environments. In a first phase, a motion planning algorithm is used to find a single good trajectory, then an RL algorithm is trained using a curriculum derived from the trajectory, by combining a variant of the Backplay algorithm and skill chaining. We show that this method outperforms state-of-the-art RL algorithms in 2D maze environments of various sizes, and is able to improve on the trajectory obtained by the motion planning phase.
Identifying algorithms that flexibly and efficiently discover temporally-extended multi-phase plans is an essential step for the advancement of robotics and model-based reinforcement learning. The core problem of long-range planning is finding an efficient way to search through the tree of possible action sequences. Existing non-learned planning solutions from the Task and Motion Planning (TAMP) literature rely on the existence of logical descriptions for the effects and preconditions for actions. This constraint allows TAMP methods to efficiently reduce the tree search problem but limits their ability to generalize to unseen and complex physical environments. In contrast, deep reinforcement learning (DRL) methods use flexible neural-network-based function approximators to discover policies that generalize naturally to unseen circumstances. However, DRL methods struggle to handle the very sparse reward landscapes inherent to long-range multi-step planning situations. Here, we propose the Curious Sample Planner (CSP), which fuses elements of TAMP and DRL by combining a curiosity-guided sampling strategy with imitation learning to accelerate planning. We show that CSP can efficiently discover interesting and complex temporally-extended plans for solving a wide range of physically realistic 3D tasks. In contrast, standard planning and learning methods often fail to solve these tasks at all or do so only with a huge and highly variable number of training samples. We explore the use of a variety of curiosity metrics with CSP and analyze the types of solutions that CSP discovers. Finally, we show that CSP supports task transfer so that the exploration policies learned during experience with one task can help improve efficiency on related tasks.
Manipulation and assembly tasks require non-trivial planning of actions depending on the environment and the final goal. Previous work in this domain often assembles particular instances of objects from known sets of primitives. In contrast, we here aim to handle varying sets of primitives and to construct different objects of the same shape category. Given a single object instance of a category, e.g. an arch, and a binary shape classifier, we learn a visual policy to assemble other instances of the same category. In particular, we propose a disassembly procedure and learn a state policy that discovers new object instances and their assembly plans in state space. We then render simulated states in the observation space and learn a heatmap representation to predict alternative actions from a given input image. To validate our approach, we first demonstrate its efficiency for building object categories in state space. We then show the success of our visual policies for building arches from different primitives. Moreover, we demonstrate (i) the reactive ability of our method to re-assemble objects using additional primitives and (ii) the robust performance of our policy for unseen primitives resembling building blocks used during training. Our visual assembly policies are trained with no real images and reach up to 95% success rate when evaluated on a real robot.
Heuristic search-based planning techniques are commonly used for motion planning on discretized spaces. The performance of these algorithms is heavily affected by the resolution at which the search space is discretized. Typically a fixed resolution is chosen for a given domain. While a finer resolution allows for better maneuverability, it significantly increases the size of the state space, and hence demands more search efforts. On the contrary, a coarser resolution gives a fast exploratory behavior but compromises on maneuverability and the completeness of the search. To effectively leverage the advantages of both high and low resolution discretizations, we propose Multi-Resolution A* (MRA*) algorithm, that runs multiple weighted-A*(WA*) searches having different resolution levels simultaneously and combines the strengths of all of them. In addition to these searches, MRA* uses one anchor search to control expansions from these searches. We show that MRA* is bounded suboptimal with respect to the anchor resolution search space and resolution complete. We performed experiments on several motion planning domains including 2D, 3D grid planning and 7 DOF manipulation planning and compared our approach with several search-based and sampling-based baselines.
Many real-world applications need path planning algorithms to solve tasks in different areas, such as social applications, autonomous cars, and tracking activities. And most importantly motion planning. Although the use of path planning is sufficient in most motion planning scenarios, they represent potential bottlenecks in large environments with dynamic changes. To tackle this problem, the number of possible routes could be reduced to make it easier for path planning algorithms to find the shortest path with less efforts. An traditional algorithm for path planning is the A*, it uses an heuristic to work faster than other solutions. In this work, we propose a CNN encoder capable of eliminating useless routes for motion planning problems, then we combine the proposed neural network output with A*. To measure the efficiency of our solution, we propose a database with different scenarios of motion planning problems. The evaluated metric is the number of the iterations to find the shortest path. The A* was compared with the CNN Encoder (proposal) with A*. In all evaluated scenarios, our solution reduced the number of iterations by more than 60\%.
In this paper, we propose a model-free reinforcement learning method to synthesize control policies for motion planning problems for continuous states and actions. The robot is modelled as a labeled Markov decision process (MDP) with continuous state and action spaces. Linear temporal logics (LTL) are used to specify high-level tasks. We then train deep neural networks to approximate the value function and policy using an actor-critic reinforcement learning method. The LTL specification is converted into an annotated limit-deterministic B\"uchi automaton (LDBA) for continuously shaping the reward so that dense reward is available during training. A naive way of solving a motion planning problem with LTL specifications using reinforcement learning is to sample a trajectory and, if the trajectory satisfies the entire LTL formula then we assign a high reward for training. However, the sampling complexity needed to find such a trajectory is too high when we have a complex LTL formula for continuous state and action spaces. As a result, it is very unlikely that we get enough reward for training if all sample trajectories start from the initial state in the automata. In this paper, we propose a method that samples not only an initial state from the state space, but also an arbitrary state in the automata at the beginning of each training episode. We test our algorithm in simulation using a car-like robot and find out that our method can learn policies for different working configurations and LTL specifications successfully.
Planning is an essential topic in the realm of automated driving. Besides planning algorithms that are widely covered in the literature, planning requires different software tools for its development, validation, and execution. This paper presents a survey of such tools including map representations, communication, traffic rules, open-source planning stacks and middleware, simulation, and visualization tools as well as benchmarks. We start by defining the planning task and different supporting tools. Next, we provide a comprehensive review of state-of-the-art developments and analysis of relations among them. Finally, we discuss the current gaps and suggest future research directions.
Planning smooth and energy-efficient motions for wheeled mobile robots is a central task for applications ranging from autonomous driving to service and intralogistic robotics. Over the past decades, a wide variety of motion planners, steer functions and path-improvement techniques have been proposed for such non-holonomic systems. With the objective of comparing this large assortment of state-of-the-art motion-planning techniques, we introduce a novel open-source motion-planning benchmark for wheeled mobile robots, whose scenarios resemble real-world applications (such as navigating warehouses, moving in cluttered cities or parking), and propose metrics for planning efficiency and path quality. Our benchmark is easy to use and extend, and thus allows practitioners and researchers to evaluate new motion-planning algorithms, scenarios and metrics easily. We use our benchmark to highlight the strengths and weaknesses of several common state-of-the-art motion planners and provide recommendations on when they should be used.
Many AI problems, in robotics and other domains, are goal-based, essentially seeking trajectories leading to various goal states. Reinforcement learning (RL), building on Bellman's optimality equation, naturally optimizes for a single goal, yet can be made multi-goal by augmenting the state with the goal. Instead, we propose a new RL framework, derived from a dynamic programming equation for the all pairs shortest path (APSP) problem, which naturally solves multi-goal queries. We show that this approach has computational benefits for both standard and approximate dynamic programming. Interestingly, our formulation prescribes a novel protocol for computing a trajectory: instead of predicting the next state given its predecessor, as in standard RL, a goal-conditioned trajectory is constructed by first predicting an intermediate state between start and goal, partitioning the trajectory into two. Then, recursively, predicting intermediate points on each sub-segment, until a complete trajectory is obtained. We call this trajectory structure a sub-goal tree. Building on it, we additionally extend the policy gradient methodology to recursively predict sub-goals, resulting in novel goal-based algorithms. Finally, we apply our method to neural motion planning, where we demonstrate significant improvements compared to standard RL on navigating a 7-DoF robot arm between obstacles.