Goto

Collaborating Authors

 Technology


Conformant Planning via Symbolic Model Checking

Journal of Artificial Intelligence Research

We tackle the problem of planning in nondeterministic domains, by presenting a new approach to conformant planning. Conformant planning is the problem of finding a sequence of actions that is guaranteed to achieve the goal despite the nondeterminism of the domain. Our approach is based on the representation of the planning domain as a finite state automaton. We use Symbolic Model Checking techniques, in particular Binary Decision Diagrams, to compactly represent and efficiently search the automaton. In this paper we make the following contributions. First, we present a general planning algorithm for conformant planning, which applies to fully nondeterministic domains, with uncertainty in the initial condition and in action effects. The algorithm is based on a breadth-first, backward search, and returns conformant plans of minimal length, if a solution to the planning problem exists, otherwise it terminates concluding that the problem admits no conformant solution. Second, we provide a symbolic representation of the search space based on Binary Decision Diagrams (BDDs), which is the basis for search techniques derived from symbolic model checking. The symbolic representation makes it possible to analyze potentially large sets of states and transitions in a single computation step, thus providing for an efficient implementation. Third, we present CMBP (Conformant Model Based Planner), an efficient implementation of the data structures and algorithm described above, directly based on BDD manipulations, which allows for a compact representation of the search layers and an efficient implementation of the search steps. Finally, we present an experimental comparison of our approach with the state-of-the-art conformant planners CGP, QBFPLAN and GPT. Our analysis includes all the planning problems from the distribution packages of these systems, plus other problems defined to stress a number of specific factors. Our approach appears to be the most effective: CMBP is strictly more expressive than QBFPLAN and CGP and, in all the problems where a comparison is possible, CMBP outperforms its competitors, sometimes by orders of magnitude.


Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition

Journal of Artificial Intelligence Research

This paper presents a new approach to hierarchical reinforcement learning based on decomposing the target Markov decision process (MDP) into a hierarchy of smaller MDPs and decomposing the value function of the target MDP into an additive combination of the value functions of the smaller MDPs. The decomposition, known as the MAXQ decomposition, has both a procedural semantics---as a subroutine hierarchy---and a declarative semantics---as a representation of the value function of a hierarchical policy. MAXQ unifies and extends previous work on hierarchical reinforcement learning by Singh, Kaelbling, and Dayan and Hinton. It is based on the assumption that the programmer can identify useful subgoals and define subtasks that achieve these subgoals. By defining such subgoals, the programmer constrains the set of policies that need to be considered during reinforcement learning. The MAXQ value function decomposition can represent the value function of any policy that is consistent with the given hierarchy. The decomposition also creates opportunities to exploit state abstractions, so that individual MDPs within the hierarchy can ignore large parts of the state space. This is important for the practical application of the method. This paper defines the MAXQ hierarchy, proves formal results on its representational power, and establishes five conditions for the safe use of state abstractions. The paper presents an online model-free learning algorithm, MAXQ-Q, and proves that it converges with probability 1 to a kind of locally-optimal policy known as a recursively optimal policy, even in the presence of the five kinds of state abstraction. The paper evaluates the MAXQ representation and MAXQ-Q through a series of experiments in three domains and shows experimentally that MAXQ-Q (with state abstractions) converges to a recursively optimal policy much faster than flat Q learning. The fact that MAXQ learns a representation of the value function has an important benefit: it makes it possible to compute and execute an improved, non-hierarchical policy via a procedure similar to the policy improvement step of policy iteration. The paper demonstrates the effectiveness of this non-hierarchical execution experimentally. Finally, the paper concludes with a comparison to related work and a discussion of the design tradeoffs in hierarchical reinforcement learning.


OBDD-based Universal Planning for Synchronized Agents in Non-Deterministic Domains

Journal of Artificial Intelligence Research

Recently model checking representation and search techniques were shown to be efficiently applicable to planning, in particular to non-deterministic planning. Such planning approaches use Ordered Binary Decision Diagrams (OBDDs) to encode a planning domain as a non-deterministic finite automaton and then apply fast algorithms from model checking to search for a solution. OBDDs can effectively scale and can provide universal plans for complex planning domains. We are particularly interested in addressing the complexities arising in non-deterministic, multi-agent domains. In this article, we present UMOP, a new universal OBDD-based planning framework for non-deterministic, multi-agent domains. We introduce a new planning domain description language, NADL, to specify non-deterministic, multi-agent domains. The language contributes the explicit definition of controllable agents and uncontrollable environment agents. We describe the syntax and semantics of NADL and show how to build an efficient OBDD-based representation of an NADL description. The UMOP planning system uses NADL and different OBDD-based universal planning algorithms. It includes the previously developed strong and strong cyclic planning algorithms. In addition, we introduce our new optimistic planning algorithm that relaxes optimality guarantees and generates plausible universal plans in some domains where no strong nor strong cyclic solution exists. We present empirical results applying UMOP to domains ranging from deterministic and single-agent with no environment actions to non-deterministic and multi-agent with complex environment actions. UMOP is shown to be a rich and efficient planning system.


AIS-BN: An Adaptive Importance Sampling Algorithm for Evidential Reasoning in Large Bayesian Networks

Journal of Artificial Intelligence Research

Stochastic sampling algorithms, while an attractive alternative to exact algorithms in very large Bayesian network models, have been observed to perform poorly in evidential reasoning with extremely unlikely evidence. To address this problem, we propose an adaptive importance sampling algorithm, AIS-BN, that shows promising convergence rates even under extreme conditions and seems to outperform the existing sampling algorithms consistently. Three sources of this performance improvement are (1) two heuristics for initialization of the importance function that are based on the theoretical properties of importance sampling in finite-dimensional integrals and the structural advantages of Bayesian networks, (2) a smooth learning method for the importance function, and (3) a dynamic weighting function for combining samples from different stages of the algorithm. We tested the performance of the AIS-BN algorithm along with two state of the art general purpose sampling algorithms, likelihood weighting (Fung & Chang, 1989; Shachter & Peot, 1989) and self-importance sampling (Shachter & Peot, 1989). We used in our tests three large real Bayesian network models available to the scientific community: the CPCS network (Pradhan et al., 1994), the PathFinder network (Heckerman, Horvitz, & Nathwani, 1990), and the ANDES network (Conati, Gertner, VanLehn, & Druzdzel, 1997), with evidence as unlikely as 10^-41. While the AIS-BN algorithm always performed better than the other two algorithms, in the majority of the test cases it achieved orders of magnitude improvement in precision of the results. Improvement in speed given a desired precision is even more dramatic, although we are unable to report numerical results here, as the other algorithms almost never achieved the precision reached even by the first few iterations of the AIS-BN algorithm.


Using Reactive and Adaptive Behaviors to Play Soccer

AI Magazine

This work deals with designing simple behaviors to allow quadruped robots to play soccer. In addition to vision problems such as changing lighting conditions and color confusion, legged robots must cope with "bouncing images" because of successive legs hitting the ground. Because it is not always possible to simulate the problems encountered in real situations, the behavior strategy should anticipate them. Experiments were carried out at the 1999 RoboCup in Stockholm using the Sony quadruped robots (Fujita 2000).


Building Intelligent Learning Database Systems

AI Magazine

Induction extracts knowledge in the form of, say, rules or decision trees from existing data, and deduction applies induction results to interpret new data. It starts with existing database technology and performs both induction and deduction. The integration of database technology, induction (from machine learning), and deduction (from knowledge-based sys-tems) plays a key role in the construction of ILDB systems, as does the design of efficient induction and deduction algorithms. This article presents a system structure for ILDB systems and discusses practical issues for ILDB applications, such as instance selection and structured induction.


The AAAI 1999 Mobile Robot Competitions and Exhibitions

AI Magazine

The Eighth Annual Mobile Robot Competition and Exhibition was held as part of the Sixteenth National Conference on Artificial Intelligence in Orlando, Florida, 18 to 22 July. The goals of these robot events are to foster the sharing of research and technology, allow research groups to showcase their achievements, encourage students to enter robotics and AI fields at both the undergraduate and graduate level, and increase awareness of the field. The 1999 events included two robot contests; a new, long-term robot challenge; an exhibition; and a National Botball Championship for high school teams sponsored by the KISS Institute. Each of these events is described in detail in this article.


Agent Assistants for Team Analysis

AI Magazine

With the growing importance of multiagent team-work, tools that can help humans analyze, evaluate, and understand team behaviors are also becoming increasingly important. ISAAC'S novelty stems from a key design constraint that arises in team analysis: Multiple types of models of team behavior are necessary to analyze different granularities of team events, including agent actions, interactions, and global performance. Additionally, ISAAC uses multiple presentation techniques that can aid human understanding of the analyses. This article presents ISAAC'S general conceptual framework and its application in the RoboCup soccer domain, where ISAAC was awarded the RoboCup Scientific Challenge Award.



Arvand: A Soccer Player Robot

AI Magazine

This robot consists of a moving mechanism, motion-control hardware, software, and a wireless communication system. The motion mechanism consists of a drive unit, a steer unit, and a castor wheel. Motion control is carried out by a special control board that uses two microcontrollers to carry out the software system decisions and transfers them to the robot mechanics. The software system performs real-time object recognition at the rate of 16 frames a second.