Goto

Collaborating Authors

 Country


Domain Filtering Consistencies

arXiv.org Artificial Intelligence

Enforcing local consistencies is one of the main features of constraint reasoning. Which level of local consistency should be used when searching for solutions in a constraint network is a basic question. Arc consistency and partial forms of arc consistency have been widely studied, and have been known for sometime through the forward checking or the MAC search algorithms. Until recently, stronger forms of local consistency remained limited to those that change the structure of the constraint graph, and thus, could not be used in practice, especially on large networks. This paper focuses on the local consistencies that are stronger than arc consistency, without changing the structure of the network, i.e., only removing inconsistent values from the domains. In the last five years, several such local consistencies have been proposed by us or by others. We make an overview of all of them, and highlight some relations between them. We compare them both theoretically and experimentally, considering their pruning efficiency and the time required to enforce them.


GIB: Imperfect Information in a Computationally Challenging Game

arXiv.org Artificial Intelligence

This paper investigates the problems arising in the construction of a program to play the game of contract bridge. These problems include both the difficulty of solving the game's perfect information variant, and techniques needed to address the fact that bridge is not, in fact, a perfect information game. GIB, the program being described, involves five separate technical advances: partition search, the practical application of Monte Carlo techniques to realistic problems, a focus on achievable sets to solve problems inherent in the Monte Carlo approach, an extension of alpha-beta pruning from total orders to arbitrary distributive lattices, and the use of squeaky wheel optimization to find approximately optimal solutions to cardplay problems. GIB is currently believed to be of approximately expert caliber, and is currently the strongest computer bridge program in the world.


Experiments with Infinite-Horizon, Policy-Gradient Estimation

arXiv.org Artificial Intelligence

In this paper, we present algorithms that perform gradient ascent of the average reward in a partially observable Markov decision process (POMDP). These algorithms are based on GPOMDP, an algorithm introduced in a companion paper (Baxter and Bartlett, this volume), which computes biased estimates of the performance gradient in POMDPs. The algorithm's chief advantages are that it uses only one free parameter beta, which has a natural interpretation in terms of bias-variance trade-off, it requires no knowledge of the underlying state, and it can be applied to infinite state, control and observation spaces. We show how the gradient estimates produced by GPOMDP can be used to perform gradient ascent, both with a traditional stochastic-gradient algorithm, and with an algorithm based on conjugate-gradients that utilizes gradient information to bracket maxima in line searches. Experimental results are presented illustrating both the theoretical results of (Baxter and Bartlett, this volume) on a toy problem, and practical aspects of the algorithms on a number of more realistic problems.


Infinite-Horizon Policy-Gradient Estimation

arXiv.org Artificial Intelligence

Gradient-based approaches to direct policy search in reinforcement learning have received much recent attention as a means to solve problems of partial observability and to avoid some of the problems associated with policy degradation in value-function methods. In this paper we introduce GPOMDP, a simulation-based algorithm for generating a biased estimate of the gradient of the average reward in Partially Observable Markov Decision Processes POMDPs controlled by parameterized stochastic policies. A similar algorithm was proposed by (Kimura et al. 1995). The algorithm's chief advantages are that it requires storage of only twice the number of policy parameters, uses one free beta (which has a natural interpretation in terms of bias-variance trade-off), and requires no knowledge of the underlying state. We prove convergence of GPOMDP, and show how the correct choice of the parameter beta is related to the mixing time of the controlled POMDP. We briefly describe extensions of GPOMDP to controlled Markov chains, continuous state, observation and control spaces, multiple-agents, higher-order derivatives, and a version for training stochastic policies with internal states. In a companion paper (Baxter et al., this volume) we show how the gradient estimates generated by GPOMDP can be used in both a traditional stochastic gradient algorithm and a conjugate-gradient procedure to find local optima of the average reward.


The Complexity of Reasoning about Spatial Congruence

arXiv.org Artificial Intelligence

In the recent literature of Artificial Intelligence, an intensive research effort has been spent, for various algebras of qualitative relations used in the representation of temporal and spatial knowledge, on the problem of classifying the computational complexity of reasoning problems for subsets of algebras. The main purpose of these researches is to describe a restricted set of maximal tractable subalgebras, ideally in an exhaustive fashion with respect to the hosting algebras. In this paper we introduce a novel algebra for reasoning about Spatial Congruence, show that the satisfiability problem in the spatial algebra MC-4 is NP-complete, and present a complete classification of tractability in the algebra, based on the individuation of three maximal tractable subclasses, one containing the basic relations. The three algebras are formed by 14, 10 and 9 relations out of 16 which form the full algebra.


The Impact of Mutation Rate on the Computation Time of Evolutionary Dynamic Optimization

arXiv.org Artificial Intelligence

Mutation has traditionally been regarded as an important operator in evolutionary algorithms. In particular, there have been many experimental studies which showed the effectiveness of adapting mutation rates for various static optimization problems. Given the perceived effectiveness of adaptive and self-adaptive mutation for static optimization problems, there have been speculations that adaptive and self-adaptive mutation can benefit dynamic optimization problems even more since adaptation and self-adaptation are capable of following a dynamic environment. However, few theoretical results are available in analyzing rigorously evolutionary algorithms for dynamic optimization problems. It is unclear when adaptive and self-adaptive mutation rates are likely to be useful for evolutionary algorithms in solving dynamic optimization problems. This paper provides the first rigorous analysis of adaptive mutation and its impact on the computation times of evolutionary algorithms in solving certain dynamic optimization problems. More specifically, for both individual-based and population-based EAs, we have shown that any time-variable mutation rate scheme will not significantly outperform a fixed mutation rate on some dynamic optimization problem instances. The proofs also offer some insights into conditions under which any time-variable mutation scheme is unlikely to be useful and into the relationships between the problem characteristics and algorithmic features (e.g., different mutation schemes).


Phase Transitions in Knowledge Compilation: an Experimental Study

arXiv.org Artificial Intelligence

Phase transitions in many complex combinational problems have been widely studied in the past decade. In this paper, we investigate phase transitions in the knowledge compilation empirically, where DFA, OBDD and d-DNNF are chosen as the target languages to compile random k-SAT instances. We perform intensive experiments to analyze the sizes of compilation results and draw the following conclusions: there exists an easy-hard-easy pattern in compilations; the peak point of sizes in the pattern is only related to the ratio of the number of clauses to that of variables when k is fixed, regardless of target languages; most sizes of compilation results increase exponentially with the number of variables growing, but there also exists a phase transition that separates a polynomial-increment region from the exponential-increment region; Moreover, we explain why the phase transition in compilations occurs by analyzing microstructures of DFAs, and conclude that a kind of solution interchangeability with more than 2 variables has a sharp transition near the peak point of the easy-hard-easy pattern, and thus it has a great impact on sizes of DFAs.


Learning unbelievable marginal probabilities

arXiv.org Artificial Intelligence

Loopy belief propagation performs approximate inference on graphical models with loops. One might hope to compensate for the approximation by adjusting model parameters. Learning algorithms for this purpose have been explored previously, and the claim has been made that every set of locally consistent marginals can arise from belief propagation run on a graphical model. On the contrary, here we show that many probability distributions have marginals that cannot be reached by belief propagation using any set of model parameters or any learning algorithm. We call such marginals `unbelievable.' This problem occurs whenever the Hessian of the Bethe free energy is not positive-definite at the target marginals. All learning algorithms for belief propagation necessarily fail in these cases, producing beliefs or sets of beliefs that may even be worse than the pre-learning approximation. We then show that averaging inaccurate beliefs, each obtained from belief propagation using model parameters perturbed about some learned mean values, can achieve the unbelievable marginals.


Analyzing Search Topology Without Running Any Search: On the Connection Between Causal Graphs and h+

Journal of Artificial Intelligence Research

The ignoring delete lists relaxation is of paramount importance for both satisficing and optimal planning. In earlier work, it was observed that the optimal relaxation heuristic h+ has amazing qualities in many classical planning benchmarks, in particular pertaining to the complete absence of local minima. The proofs of this are hand-made, raising the question whether such proofs can be lead automatically by domain analysis techniques. In contrast to earlier disappointing results -- the analysis method has exponential runtime and succeeds only in two extremely simple benchmark domains -- we herein answer this question in the affirmative. We establish connections between causal graph structure and h+ topology. This results in low-order polynomial time analysis methods, implemented in a tool we call TorchLight. Of the 12 domains where the absence of local minima has been proved, TorchLight gives strong success guarantees in 8 domains. Empirically, its analysis exhibits strong performance in a further 2 of these domains, plus in 4 more domains where local minima may exist but are rare. In this way, TorchLight can distinguish ``easy'' domains from ``hard'' ones. By summarizing structural reasons for analysis failure, TorchLight also provides diagnostic output indicating domain aspects that may cause local minima.


An Application of Reinforcement Learning to Dialogue Strategy Selection in a Spoken Dialogue System for Email

arXiv.org Artificial Intelligence

This paper describes a novel method by which a spoken dialogue system can learn to choose an optimal dialogue strategy from its experience interacting with human users. The method is based on a combination of reinforcement learning and performance modeling of spoken dialogue systems. The reinforcement learning component applies Q-learning (Watkins, 1989), while the performance modeling component applies the PARADISE evaluation framework (Walker et al., 1997) to learn the performance function (reward) used in reinforcement learning. We illustrate the method with a spoken dialogue system named ELVIS (EmaiL Voice Interactive System), that supports access to email over the phone. We conduct a set of experiments for training an optimal dialogue strategy on a corpus of 219 dialogues in which human users interact with ELVIS over the phone. We then test that strategy on a corpus of 18 dialogues. We show that ELVIS can learn to optimize its strategy selection for agent initiative, for reading messages, and for summarizing email folders.