Controlling Deliberation with the Success Probability in a Dynamic Environment

AAAI Conferences

Seiji Yamada, Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology, 4259 Nagatsuda, Midori-ku, Yokohama, Kanagawa 226, JAPAN

Abstract: This paper describes a novel method for interleaving planning with execution in a dynamic environment. Although it is very important in such planning to control deliberation, that is, to determine when to switch between planning and execution, little research has been done on this problem. To cope with it, we propose a method that determines the interleaving timing using the success probability, SP: the probability that a plan will be executed successfully in an environment. We also developed a method to compute SP efficiently with Bayesian networks and implemented it in a system. The system stops planning when the locally optimal plan's SP falls below an execution threshold, and then executes that plan. Since SP depends on the dynamics of the environment, the system behaves reactively in a very dynamic environment and deliberatively in a static one. We conducted experiments in Tileworld, varying the dynamics and observation costs. As a result, we found the optimal threshold between reactivity and deliberation for some problem classes. Furthermore, we found that the optimal threshold is robust against changes in dynamics and observation cost, and that one of the classes in which the method works well is that in which the dynamics itself changes.
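The control rule described above can be sketched as a short loop. Everything here is a hypothetical toy model, not the paper's implementation: plan quality, the SP decay formula, and all constants are assumptions chosen only to show how a dynamics-dependent SP makes the same threshold yield reactive behavior in a fast-changing world and deliberative behavior in a static one.

```python
import math

def success_probability(plan_quality, elapsed_steps, dynamics):
    # Toy SP model (an assumption, not the paper's Bayesian-network one):
    # a better plan succeeds more often, but every planning step gives a
    # dynamic world a chance to invalidate it.
    return plan_quality * math.exp(-dynamics * elapsed_steps)

def interleave(threshold, dynamics, max_steps=50):
    # Refine the plan until its SP drops below the execution threshold,
    # then commit to execution; returns the number of steps spent planning.
    quality = 0.5
    for step in range(1, max_steps + 1):
        quality = min(1.0, quality + 0.05)  # deliberation improves the plan
        sp = success_probability(quality, step, dynamics)
        if sp < threshold:
            return step  # stop planning and execute now
    return max_steps

# In a highly dynamic world the system commits early (reactive);
# in a near-static one it keeps deliberating.
assert interleave(threshold=0.4, dynamics=0.2) < interleave(threshold=0.4, dynamics=0.01)
```

The same fixed threshold produces both behaviors, which is the point of letting SP, rather than a timer, trigger execution.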

A Probabilistic Calculus of Actions

AAAI Conferences

In planning, however, they are less popular,[1] partly due to the unsettled, strange relationship between probability and actions. In principle, actions are not part of standard probability theory, and understandably so: probabilities capture normal relationships in the world, while actions represent interventions that perturb those relationships. It is no wonder, then, that actions are treated as foreign entities throughout the literature on probability and statistics; they serve neither as arguments of probability expressions nor as events for conditioning such expressions. Even in the decision-theoretic literature, where actions are the target of optimization...

[1] Works by Dean & Kanazawa [1989] and Kushmerick et al. [1993] notwithstanding.
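The distinction between observing an action's value and performing the action can be made concrete on a tiny network. The example below is an illustration under assumed numbers, not from the paper: a confounder Z influences both X and Y, so conditioning on X=1 shifts beliefs about Z, while intervening to force X=1 severs the Z -> X link and leaves Z at its prior (the truncated-product rule).

```python
# Hypothetical three-node binary network: Z -> X, Z -> Y, X -> Y.
# All parameters are made up for illustration.
P_Z1 = 0.5
def p_x1_given_z(z): return 0.8 if z else 0.2
def p_y1_given_xz(x, z): return 0.2 + 0.5 * x + 0.2 * z

def p_z(z): return P_Z1 if z else 1 - P_Z1

def p_y1_observing_x1():
    # Ordinary conditioning: seeing X=1 also tells us about Z.
    p_x1 = sum(p_x1_given_z(z) * p_z(z) for z in (0, 1))
    return sum(p_y1_given_xz(1, z) * p_x1_given_z(z) * p_z(z) / p_x1
               for z in (0, 1))

def p_y1_do_x1():
    # Intervention: forcing X=1 cuts the Z -> X edge, so Z keeps its prior.
    return sum(p_y1_given_xz(1, z) * p_z(z) for z in (0, 1))

assert abs(p_y1_observing_x1() - 0.86) < 1e-9
assert abs(p_y1_do_x1() - 0.80) < 1e-9  # acting and observing disagree
```

The gap between 0.86 and 0.80 is exactly the perturbation of "normal relationships" that makes actions foreign to plain conditioning.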

Execution Monitoring with Quantitative Temporal Bayesian Networks

AAAI Conferences

The goal of execution monitoring is to determine whether a system or person is following a plan appropriately. Monitoring information may be uncertain, and the plan being monitored may have complex temporal constraints. We develop a new framework for reasoning under uncertainty with quantitative temporal constraints - Quantitative Temporal Bayesian Networks (QTBNs) - and we discuss its application to plan-execution monitoring. QTBNs extend the major previous approaches to temporal reasoning under uncertainty: Time Nets (Kanazawa 1991), Dynamic Bayesian Networks (DBNs), and Dynamic Object-Oriented Bayesian Networks (DOOBNs) (Friedman, Koller, & Pfeffer 1998). We argue that Time Nets can model quantitative temporal relationships but cannot easily model the changing values of fluents, while DBNs and DOOBNs naturally model fluents, but not quantitative temporal relationships. Both capabilities are required for execution monitoring, and are supported by QTBNs.

Asymptotic Bayesian Generalization Error in a General Stochastic Matrix Factorization for Markov Chain and Bayesian Network Machine Learning

Stochastic matrix factorization (SMF) can be regarded as a restriction of non-negative matrix factorization (NMF). SMF is useful for inference of topic models, NMF for binary matrix data, Markov chains, and Bayesian networks. However, SMF needs strong assumptions to reach a unique factorization, and its theoretical prediction accuracy has not yet been clarified. In this paper, we study the maximum pole of the zeta function (the real log canonical threshold) of a general SMF and derive an upper bound on the generalization error in Bayesian inference. The results give a foundation for a widely applicable and rigorous factorization method based on SMF, and show that the generalization error of SMF becomes smaller than that of regular statistical models under Bayesian inference.
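The "restriction of NMF" can be illustrated directly: in SMF the nonnegative factors are additionally column-stochastic (each column sums to one), and that constraint is closed under matrix product, so the factorization stays a valid Markov-chain transition model. A minimal sketch, with random factors standing in for a learned factorization:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_stochastic(rows, cols):
    # Random column-stochastic matrix: nonnegative, columns sum to 1.
    m = rng.random((rows, cols))
    return m / m.sum(axis=0)

# SMF: represent a transition matrix T (n x n) as A @ B with BOTH factors
# column-stochastic -- a restriction of NMF, where factors need only
# be nonnegative.
n, k = 4, 2
A = random_stochastic(n, k)
B = random_stochastic(k, n)
T = A @ B

# The product of column-stochastic matrices is column-stochastic,
# so T is itself a valid Markov-chain transition matrix.
assert np.allclose(T.sum(axis=0), 1.0)
assert (T >= 0).all()
```

It is this extra simplex constraint that complicates uniqueness and motivates the singular-learning (real log canonical threshold) analysis of the generalization error.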

Max-norm Projections for Factored MDPs

AAAI Conferences

In the MDP framework, the system is modeled via a set of states which evolve stochastically. The key problem with this representation is that, in virtually any real-life domain, the state space is quite large. However, many large MDPs have significant internal structure, and can be modeled compactly if the structure is exploited in the representation. Factored MDPs [Boutilier et al. 1999] are one approach to representing large structured MDPs compactly. In this framework, a state is implicitly described by an assignment to some set of state variables. A dynamic Bayesian network (DBN) [Dean and Kanazawa 1989] can then allow a compact representation of the transition model, by exploiting the fact that the transition of a variable often depends only on a small number of other variables.
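The compactness argument can be made concrete with a back-of-the-envelope count. The parent structure below is hypothetical (a ring where each variable depends on itself and one neighbor), chosen only to contrast the explicit joint transition table with per-variable DBN conditional tables:

```python
# Hypothetical factored transition model over n binary state variables:
# each next-state variable depends on only a few current-state variables
# (its DBN parents), here itself and its ring neighbor.
n_vars = 20
parents = {i: [i, (i + 1) % n_vars] for i in range(n_vars)}

# Explicit joint transition matrix: one entry per (state, next-state) pair.
flat_table_entries = (2 ** n_vars) * (2 ** n_vars)

# Factored DBN model: one conditional table per variable, with one entry
# per assignment to that variable's parents.
factored_entries = sum(2 ** len(p) for p in parents.values())

assert factored_entries == 80           # 20 variables x 2^2 parent settings
assert flat_table_entries == 2 ** 40    # vs. ~10^12 entries for the flat model
```

Eighty parameters versus a trillion is the exploitation of structure the paragraph describes, and it is what makes the max-norm projection machinery worth building on top of the factored form.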