Goto

Collaborating Authors

 Planning & Scheduling


Artificial Intelligence: Powering Human Exploration of the Moon and Mars

arXiv.org Artificial Intelligence

Over the past decade, the NASA Autonomous Systems and Operations (ASO) project has developed and demonstrated numerous autonomy enabling technologies employing AI techniques. Our work has employed AI in three distinct ways to enable autonomous mission operations capabilities. Crew Autonomy gives astronauts tools to assist in the performance of each of these mission operations functions. Vehicle System Management uses AI techniques to turn the astronaut's spacecraft into a robot, allowing it to operate when astronauts are not present, or to reduce astronaut workload. AI technology also enables Autonomous Robots as crew assistants or proxies when the crew are not present. We first describe human spaceflight mission operations capabilities. We then describe the ASO project, and the development and demonstration performed by ASO since 2011. We will describe the AI techniques behind each of these demonstrations, which include a variety of symbolic automated reasoning and machine learning based approaches. Finally, we conclude with an assessment of future development needs for AI to enable NASA's future Exploration missions.


The Choice Function Framework for Online Policy Improvement

arXiv.org Artificial Intelligence

There are notable examples of online search improving over hand-coded or learned policies (e.g. AlphaZero) for sequential decision making. It is not clear, however, whether or not policy improvement is guaranteed for many of these approaches, even when given a perfect evaluation function and transition model. Indeed, simple counter examples show that seemingly reasonable online search procedures can hurt performance compared to the original policy. To address this issue, we introduce the choice function framework for analyzing online search procedures for policy improvement. A choice function specifies the actions to be considered at every node of a search tree, with all other actions being pruned. Our main contribution is to give sufficient conditions for stationary and non-stationary choice functions to guarantee that the value achieved by online search is no worse than the original policy. In addition, we describe a general parametric class of choice functions that satisfy those conditions and present an illustrative use case of the framework's empirical utility.


Regression Planning Networks

arXiv.org Artificial Intelligence

Recent learning-to-plan methods have shown promising results on planning directly from observation space. Yet, their ability to plan for long-horizon tasks is limited by the accuracy of the prediction model. On the other hand, classical symbolic planners show remarkable capabilities in solving long-horizon tasks, but they require predefined symbolic rules and symbolic states, restricting their real-world applicability. In this work, we combine the benefits of these two paradigms and propose a learning-to-plan method that can directly generate a long-term symbolic plan conditioned on high-dimensional observations. We borrow the idea of regression (backward) planning from classical planning literature and introduce Regression Planning Networks (RPN), a neural network architecture that plans backward starting at a task goal and generates a sequence of intermediate goals that reaches the current observation. We show that our model not only inherits many favorable traits from symbolic planning, e.g., the ability to solve previously unseen tasks but also can learn from visual inputs in an end-to-end manner. We evaluate the capabilities of RPN in a grid world environment and a simulated 3D kitchen environment featuring complex visual scenes and long task horizons, and show that it achieves near-optimal performance in completely new task instances.


Action Selection for MDPs: Anytime AO* vs. UCT

arXiv.org Artificial Intelligence

In the presence of non-admissible heuristics, A* and other best-first algorithms can be converted into anytime optimal algorithms over OR graphs, by simply continuing the search after the first solution is found. The same trick, however, does not work for best-first algorithms over AND/OR graphs, that must be able to expand leaf nodes of the explicit graph that are not necessarily part of the best partial solution. Anytime optimal variants of AO* must thus address an exploration-exploitation tradeoff: they cannot just "exploit", they must keep exploring as well. In this work, we develop one such variant of AO* and apply it to finite-horizon MDPs. This Anytime AO* algorithm eventually delivers an optimal policy while using non-admissible random heuristics that can be sampled, as when the heuristic is the cost of a base policy that can be sampled with rollouts. We then test Anytime AO* for action selection over large infinite-horizon MDPs that cannot be solved with existing off-line heuristic search and dynamic programming algorithms, and compare it with UCT. Introduction One of the natural approaches for selecting actions in very large state spaces is by performing a limited amount of lookahead. In the contexts of discounted MDPs, Kearns, Mansour, and Ng have shown that near to optimal actions can be selected by considering a sampled lookahead tree that is sufficiently sparse, whose size depends on the discount factor and the suboptimality bound but not on the number of problem states (Kearns, Mansour, and Ng 1999).


Higher-Dimensional Potential Heuristics for Optimal Classical Planning

arXiv.org Artificial Intelligence

Potential heuristics for state-space search are defined as weighted sums over simple state features. Atomic features consider the value of a single state variable in a factored state representation, while binary features consider joint assignments to two state variables. Previous work showed that the set of all admissible and consistent potential heuristics using atomic features can be characterized by a compact set of linear constraints. We generalize this result to binary features and prove a hardness result for features of higher dimension. Furthermore, we prove a tractability result based on the treewidth of a new graphical structure we call the context-dependency graph . Finally, we study the relationship of potential heuristics to transition cost partitioning . Experimental results show that binary potential heuristics are significantly more informative than the previously considered atomic ones.


Temporal Planning with Intermediate Conditions and Effects

arXiv.org Artificial Intelligence

Automated temporal planning is the technology of choice when controlling systems that can execute more actions in parallel and when temporal constraints, such as deadlines, are needed in the model. One limitation of several action-based planning systems is that actions are modeled as intervals having conditions and effects only at the extremes and as invariants, but no conditions nor effects can be specified at arbitrary points or sub-intervals. In this paper, we address this limitation by providing an effective heuristic-search technique for temporal planning, allowing the definition of actions with conditions and effects at any arbitrary time within the action duration. We experimentally demonstrate that our approach is far better than standard encodings in PDDL 2.1 and is competitive with other approaches that can (directly or indirectly) represent intermediate action conditions or effects.


Active Goal Recognition

arXiv.org Artificial Intelligence

To coordinate with other systems, agents must be able to determine what the systems are currently doing and predict what they will be doing in the future---plan and goal recognition. There are many methods for plan and goal recognition, but they assume a passive observer that continually monitors the target system. Real-world domains, where information gathering has a cost (e.g., moving a camera or a robot, or time taken away from another task), will often require a more active observer. We propose to combine goal recognition with other observer tasks in order to obtain \emph{active goal recognition} (AGR). We discuss this problem and provide a model and preliminary experimental results for one form of this composite problem. As expected, the results show that optimal behavior in AGR problems balance information gathering with other actions (e.g., task completion) such as to achieve all tasks jointly and efficiently. We hope that our formulation opens the door for extensive further research on this interesting and realistic problem.


Reconnaissance and Planning algorithm for constrained MDP

arXiv.org Machine Learning

Practical reinforcement learning problems are often formulated as constrained Markov decision process (CMDP) problems, in which the agent has to maximize the expected return while satisfying a set of prescribed safety constraints. In this study, we propose a novel simulator-based method to approximately solve a CMDP problem without making any compromise on the safety constraints. We achieve this by decomposing the CMDP into a pair of MDPs; reconnaissance MDP and planning MDP. The purpose of reconnaissance MDP is to evaluate the set of actions that are safe, and the purpose of planning MDP is to maximize the return while using the actions authorized by reconnaissance MDP. RMDP can define a set of safe policies for any given set of safety constraint, and this set of safe policies can be used to solve another CMDP problem with different reward. Our method is not only computationally less demanding than the previous simulator-based approaches to CMDP, but also capable of finding a competitive reward-seeking policy in a high dimensional environment, including those involving multiple moving obstacles.


New chatbot provides smoother, unified travel planning - Springwise

#artificialintelligence

Spotted: Eddy Travels is an AI-enabled personal travel assistant that operates within popular chat applications, such as WhatsApp, Facebook Messenger, Viber, Slack and Telegram. Based on the user's chat conversations, Eddy uses a language processing system that makes tailored travel recommendations. This enables to unify all booking needs to one place, from flights to hotels. Eddy even recommends personalized activities. For example, if your friend mentions taking a trip to Tanzania, Eddy could recommend a safe area to stay in, accommodation, the best travel insurance, tours, etc.


New chatbot provides smoother, unified travel planning - Springwise

#artificialintelligence

Spotted: Eddy Travels is an AI-enabled personal travel assistant that operates within popular chat applications, such as WhatsApp, Facebook Messenger, Viber, Slack and Telegram. Based on the user's chat conversations, Eddy uses a language processing system that makes tailored travel recommendations. This enables to unify all booking needs to one place, from flights to hotels. Eddy even recommends personalized activities. For example, if your friend mentions taking a trip to Tanzania, Eddy could recommend a safe area to stay in, accommodation, the best travel insurance, tours, etc.