TDLeaf(lambda): Combining Temporal Difference Learning with Game-Tree Search

arXiv.org Artificial Intelligence

In this paper we present TDLeaf(lambda), a variation on the TD(lambda) algorithm that enables it to be used in conjunction with minimax search. We present some experiments in both chess and backgammon which demonstrate its utility and provide comparisons with TD(lambda) and another less radical variant, TD-directed(lambda). In particular, our chess program, "KnightCap," used TDLeaf(lambda) to learn its evaluation function while playing on the Free Internet Chess Server (FICS, fics.onenet.net). It improved from a 1650 rating to a 2100 rating in just 308 games. We discuss some of the reasons for this success and the relationship between our results and Tesauro's results in backgammon.
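
Concretely, TDLeaf(lambda) replaces the root evaluation in the TD(lambda) update with the evaluation of the leaf of the principal variation found by the search. The following Python sketch illustrates this under the assumption of a linear evaluation function; the feature extractor, move generator, step size, and search depth are illustrative placeholders, not KnightCap's actual implementation.

```python
# Minimal sketch of the TDLeaf(lambda) weight update, assuming a linear
# evaluation eval(s, w) = w . features(s). The game representation is
# hypothetical: states are feature tuples and `children` enumerates moves.
import numpy as np

def features(state):
    # Hypothetical feature extractor; a real program would map a board
    # position to material, mobility, king safety, etc.
    return np.asarray(state, dtype=float)

def evaluate(state, w):
    return float(np.dot(w, features(state)))

def minimax_leaf(state, w, depth, maximizing, children):
    """Return (value, leaf) of the principal variation.

    TDLeaf differs from plain TD(lambda) precisely here: the update is
    driven by the PV *leaf* evaluation, not the root evaluation."""
    succs = list(children(state)) if depth > 0 else []
    if not succs:
        return evaluate(state, w), state
    best = None
    for s in succs:
        v, leaf = minimax_leaf(s, w, depth - 1, not maximizing, children)
        if best is None or (v > best[0]) == maximizing:
            best = (v, leaf)
    return best

def tdleaf_update(positions, w, children, depth=2, alpha=0.01, lam=0.7):
    """One TDLeaf(lambda) gradient step over the positions of one game."""
    vals, grads = [], []
    for i, s in enumerate(positions):
        v, leaf = minimax_leaf(s, w, depth, i % 2 == 0, children)
        vals.append(v)
        grads.append(features(leaf))  # gradient of a linear eval = leaf features
    # temporal differences between successive PV-leaf evaluations
    d = [vals[t + 1] - vals[t] for t in range(len(vals) - 1)]
    for t in range(len(d)):
        discounted = sum(lam ** (j - t) * d[j] for j in range(t, len(d)))
        w = w + alpha * grads[t] * discounted
    return w

# Toy usage: states are their own feature vectors; each has one successor.
children = lambda s: [tuple(x * 0.9 for x in s)]
w = tdleaf_update([(1.0, 0.0, 2.0), (0.9, 0.0, 1.8)], np.zeros(3), children)
```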


Variational Probabilistic Inference and the QMR-DT Network

Journal of Artificial Intelligence Research

We describe a variational approximation method for efficient inference in large-scale probabilistic models. Variational methods are deterministic procedures that provide approximations to marginal and conditional probabilities of interest. They provide alternatives to approximate inference methods based on stochastic sampling or search. We describe a variational approach to the problem of diagnostic inference in the "Quick Medical Reference" (QMR) network. The QMR network is a large-scale probabilistic graphical model built on statistical and expert knowledge. Exact probabilistic inference is infeasible in this model for all but a small set of cases. We evaluate our variational inference algorithm on a large set of diagnostic test cases, comparing the algorithm to a state-of-the-art stochastic sampling method.
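
For context, QMR-DT's findings are modeled with noisy-OR conditional distributions, and it is the positive findings that couple the diseases and make exact inference exponential. The sketch below illustrates that parameterization with a brute-force enumeration posterior on an invented three-disease model; it shows the cost the variational approximation is designed to avoid, and is not the paper's algorithm. All parameters are made up.

```python
# Illustrative noisy-OR "QMR-like" model with posterior inference by
# enumeration over all 2^n disease configurations -- the exponential
# cost that motivates the variational approximation. NOT the paper's
# method; priors and strengths below are invented for the example.
import itertools, math

priors = [0.05, 0.10, 0.02]              # hypothetical P(d_i = 1)
theta0 = [0.01, 0.01]                    # leak terms, one per finding
theta = [[2.0, 0.0, 0.5],                # theta[j][i]: strength of
         [0.0, 1.5, 1.0]]                # disease i for finding j

def p_finding(j, d, value):
    # Noisy-OR: P(f_j = 0 | d) = exp(-theta0_j - sum_i theta_ji * d_i)
    p0 = math.exp(-theta0[j] - sum(t * di for t, di in zip(theta[j], d)))
    return p0 if value == 0 else 1.0 - p0

def posterior(findings):
    """P(d_i = 1 | findings), summing over every disease configuration."""
    n = len(priors)
    marg, z = [0.0] * n, 0.0
    for d in itertools.product([0, 1], repeat=n):
        p = 1.0
        for i, di in enumerate(d):
            p *= priors[i] if di else 1.0 - priors[i]
        for j, fj in findings.items():
            p *= p_finding(j, d, fj)
        z += p
        for i in range(n):
            marg[i] += p * d[i]
    return [m / z for m in marg]

print(posterior({0: 1, 1: 1}))  # both findings observed positive
```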


Issues in Stacked Generalization

Journal of Artificial Intelligence Research

Stacked generalization is a general method of using a high-level model to combine lower-level models to achieve greater predictive accuracy. In this paper we address two crucial issues which have been considered to be a "black art" in classification tasks ever since the introduction of stacked generalization in 1992 by Wolpert: the type of generalizer that is suitable to derive the higher-level model, and the kind of attributes that should be used as its input. We find that the best results are obtained when the higher-level model combines the confidence (and not just the predictions) of the lower-level ones. We demonstrate the effectiveness of stacked generalization for combining three different types of learning algorithms for classification tasks. We also compare the performance of stacked generalization with majority vote and published results of arcing and bagging.
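
As a concrete illustration of stacking on confidences, the sketch below feeds out-of-fold class probabilities from three base learners into a higher-level logistic regression. scikit-learn, the dataset, and the particular models are assumptions chosen for brevity; they are not the paper's exact generalizers.

```python
# Minimal stacking sketch: the higher-level model's input attributes are
# the per-class confidences of the lower-level models, produced
# out-of-fold so the meta-model never sees a base model's predictions on
# that model's own training data. (scikit-learn is an assumption here.)
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

base = [DecisionTreeClassifier(random_state=0), GaussianNB(),
        KNeighborsClassifier()]

# Level-1 training attributes: stacked out-of-fold confidences.
meta_tr = np.hstack([cross_val_predict(m, X_tr, y_tr, cv=5,
                                       method="predict_proba")
                     for m in base])
for m in base:                     # refit each base model on all of X_tr
    m.fit(X_tr, y_tr)
meta_te = np.hstack([m.predict_proba(X_te) for m in base])

stacker = LogisticRegression(max_iter=1000).fit(meta_tr, y_tr)
print("stacked accuracy:", stacker.score(meta_te, y_te))
```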


Learning to Order Things

Journal of Artificial Intelligence Research

There are many applications in which it is desirable to order rather than classify instances. Here we consider the problem of learning how to order instances given feedback in the form of preference judgments, i.e., statements to the effect that one instance should be ranked ahead of another. We outline a two-stage approach in which one first learns by conventional means a binary preference function indicating whether it is advisable to rank one instance before another. Here we consider an online algorithm for learning preference functions that is based on Freund and Schapire's "Hedge" algorithm. In the second stage, new instances are ordered so as to maximize agreement with the learned preference function. We show that the problem of finding the ordering that agrees best with a learned preference function is NP-complete. Nevertheless, we describe simple greedy algorithms that are guaranteed to find a good approximation. Finally, we show how metasearch can be formulated as an ordering problem, and present experimental results on learning a combination of "search experts," each of which is a domain-specific query expansion strategy for a web search engine.

However, there are many applications in which it is desirable to order rather than classify instances. An example might be a personalized email filter that prioritizes unread mail. Here we will consider the problem of learning how to construct such orderings given feedback in the form of preference judgments, i.e., statements that one instance should be ranked ahead of another. Such orderings could be constructed based on a learned probabilistic classifier or regression model, and in fact often are. For instance, it is common practice in information retrieval to rank documents according to their probability of relevance to a query, as estimated by a learned classifier for the concept "relevant document." An advantage of learning orderings directly is that preference judgments can be much easier to obtain than the labels required for classification learning. For instance, in the email application mentioned above, one approach might be to rank messages according to their estimated probability of membership in the class of "urgent" messages, or by some numerical estimate of urgency obtained by regression. Suppose, however, that a user is presented with an ordered list of email messages, and elects to read the third message first. Given this election, it is not necessarily the case that message three is urgent, nor is there sufficient information to estimate any numerical urgency measures.
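
The second stage admits a compact illustration. The sketch below implements a greedy ordering of the kind the paper analyzes: repeatedly emit the item whose outgoing preference mass most exceeds its incoming mass among the items still unranked. The preference values here are invented; in the paper they would come from the learned (e.g., Hedge-trained) preference function.

```python
# Greedy approximation for ordering to maximize agreement with a learned
# preference function PREF(u, v) in [0, 1]. Finding the optimal ordering
# is NP-complete; a greedy of this form is the kind of approximation the
# paper shows achieves a constant fraction of the optimal agreement.
def greedy_order(items, pref):
    remaining = set(items)
    order = []
    while remaining:
        def potential(v):
            # outgoing preference mass minus incoming preference mass
            return (sum(pref.get((v, u), 0.0) for u in remaining if u != v)
                    - sum(pref.get((u, v), 0.0) for u in remaining if u != v))
        best = max(remaining, key=potential)
        order.append(best)
        remaining.remove(best)
    return order

# Hypothetical preferences: pref[(a, b)] ~ confidence that a precedes b.
pref = {("a", "b"): 0.9, ("b", "a"): 0.1,
        ("a", "c"): 0.8, ("c", "a"): 0.2,
        ("b", "c"): 0.6, ("c", "b"): 0.4}
print(greedy_order(["a", "b", "c"], pref))   # -> ['a', 'b', 'c']
```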


Probabilistic Deduction with Conditional Constraints over Basic Events

Journal of Artificial Intelligence Research

We study the problem of probabilistic deduction with conditional constraints over basic events. We show that globally complete probabilistic deduction with conditional constraints over basic events is NP-hard. We then concentrate on the special case of probabilistic deduction in conditional constraint trees. We elaborate very efficient techniques for globally complete probabilistic deduction. In detail, for conditional constraint trees with point probabilities, we present a local approach to globally complete probabilistic deduction, which runs in linear time in the size of the conditional constraint trees. For conditional constraint trees with interval probabilities, we show that globally complete probabilistic deduction can be done in a global approach by solving nonlinear programs. We show how these nonlinear programs can be transformed into equivalent linear programs, which are solvable in polynomial time in the size of the conditional constraint trees.
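
To make the linear-programming reading concrete, the sketch below encodes interval-valued conditional constraints over basic events as a generic probabilistic-satisfiability LP over the 2^n truth assignments and computes tight bounds on an unconditional query. This illustrates the general LP view only, not the paper's specialized linear-time tree algorithm; scipy and all numbers are assumptions, and bounding a conditional query would additionally require the kind of nonlinear-to-linear transformation the paper develops.

```python
# Generic LP sketch for deduction with interval conditional constraints.
# Variables: probabilities of the 2^n truth assignments ("atoms") over
# basic events. A constraint l <= P(B|A) <= u linearizes to
#   l * P(A) <= P(A and B) <= u * P(A).
# (scipy is an assumption; the knowledge base below is invented.)
import itertools
import numpy as np
from scipy.optimize import linprog

events = ["A", "B", "C"]
atoms = list(itertools.product([0, 1], repeat=len(events)))  # 8 atoms

def indicator(pred):
    return np.array([1.0 if pred(dict(zip(events, a))) else 0.0
                     for a in atoms])

A_ = indicator(lambda w: w["A"])
B_ = indicator(lambda w: w["B"])
C_ = indicator(lambda w: w["C"])
AB = indicator(lambda w: w["A"] and w["B"])
BC = indicator(lambda w: w["B"] and w["C"])

# Hypothetical knowledge base: P(A) = 0.6, P(B|A) in [0.7, 0.9],
# P(C|B) in [0.5, 0.8].
A_eq = [np.ones(len(atoms)), A_]
b_eq = [1.0, 0.6]
A_ub = [0.7 * A_ - AB, AB - 0.9 * A_,   # 0.7 P(A) <= P(AB) <= 0.9 P(A)
        0.5 * B_ - BC, BC - 0.8 * B_]   # 0.5 P(B) <= P(BC) <= 0.8 P(B)
b_ub = [0.0, 0.0, 0.0, 0.0]

lo = linprog(C_, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq).fun
hi = -linprog(-C_, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq).fun
print(f"P(C) is entailed to lie in [{lo:.3f}, {hi:.3f}]")
```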


Cooperation between Top-Down and Bottom-Up Theorem Provers

Journal of Artificial Intelligence Research

Top-down and bottom-up theorem proving approaches each have specific advantages and disadvantages. Bottom-up provers profit from strong redundancy control but suffer from a lack of goal-orientation, whereas top-down provers are goal-oriented but often have weak calculi when their proof lengths are considered. In order to integrate both approaches, we try to achieve cooperation between a top-down and a bottom-up prover in two different ways. The first technique aims at supporting a bottom-up prover with a top-down prover: the top-down prover generates subgoal clauses, which are then processed by the bottom-up prover. The second technique deals with the use of bottom-up generated lemmas in a top-down prover. We apply our concept to the areas of model elimination and superposition. We discuss the ability of our techniques to shorten proofs as well as to reorder the search space in an appropriate manner. Furthermore, in order to identify subgoal clauses and lemmas which are actually relevant for the proof task, we develop methods for relevancy-based filtering. Experiments with the provers SETHEO and SPASS on the problem library TPTP reveal the high potential of our cooperation approaches.


Modeling Belief in Dynamic Systems, Part II: Revision and Update

Journal of Artificial Intelligence Research

The study of belief change has been an active area in philosophy and AI. In recent years two special cases of belief change, belief revision and belief update, have been studied in detail. In a companion paper (Friedman & Halpern, 1997), we introduce a new framework to model belief change. This framework combines temporal and epistemic modalities with a notion of plausibility, allowing us to examine the change of beliefs over time. In this paper, we show how belief revision and belief update can be captured in our framework. This allows us to compare the assumptions made by each method, and to better understand the principles underlying them. In particular, it shows that Katsuno and Mendelzon's notion of belief update (Katsuno & Mendelzon, 1991a) depends on several strong assumptions that may limit its applicability in artificial intelligence. Finally, our analysis allows us to identify a notion of minimal change that underlies a broad range of belief change operations including revision and update.


Efficient Implementation of the Plan Graph in STAN

Journal of Artificial Intelligence Research

STAN is a Graphplan-based planner, so-called because it uses a variety of STate ANalysis techniques to enhance its performance. STAN competed in the AIPS-98 planning competition where it compared well with the other competitors in terms of speed, finding solutions fastest to many of the problems posed. Although the domain analysis techniques STAN exploits are an important factor in its overall performance, we believe that the speed at which STAN solved the competition problems is largely due to the implementation of its plan graph. The implementation is based on two insights: that many of the graph construction operations can be implemented as bit-level logical operations on bit vectors, and that the graph should not be explicitly constructed beyond the fixpoint. This paper describes the implementation of STAN's plan graph and provides experimental results which demonstrate the circumstances under which advantages can be obtained from using this implementation.
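
The bit-vector insight can be shown in a few lines. In the sketch below, a proposition layer is a single bit vector, each action's preconditions and add effects are bit masks, and construction stops at the fixpoint. Python integers stand in for the fixed-width machine words a real implementation would use, and mutex reasoning is omitted, so this illustrates the idea rather than STAN's actual implementation.

```python
# Plan-graph layer construction as bitwise operations on bit vectors.
# Applicability testing and layer extension each cost a few machine-word
# operations instead of per-proposition set manipulations, and the loop
# stops at the fixpoint rather than building further (identical) levels.
props = ["p", "q", "r", "s"]
bit = {name: 1 << i for i, name in enumerate(props)}

def mask(names):
    m = 0
    for n in names:
        m |= bit[n]
    return m

# (precondition mask, add-effect mask) for some hypothetical actions
actions = [(mask(["p"]), mask(["q"])),
           (mask(["q"]), mask(["r"])),
           (mask(["p", "r"]), mask(["s"]))]

layer = mask(["p"])              # initial state as one bit vector
while True:
    new = layer
    for pre, add in actions:
        if (layer & pre) == pre:  # applicable iff all precondition bits set
            new |= add            # no-ops are implicit: old bits persist
    if new == layer:              # fixpoint reached: stop constructing
        break
    layer = new

print([p for p in props if layer & bit[p]])  # -> ['p', 'q', 'r', 's']
```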


A Counterexample to Theorems of Cox and Fine

Journal of Artificial Intelligence Research

Cox's well-known theorem justifying the use of probability is shown not to hold in finite domains. The counterexample also suggests that Cox's assumptions are insufficient to prove the result even in infinite domains. The same counterexample is used to disprove a result of Fine on comparative conditional probability.


Solving Highly Constrained Search Problems with Quantum Computers

Journal of Artificial Intelligence Research

A previously developed quantum search algorithm for solving 1-SAT problems in a single step is generalized to apply to a range of highly constrained k-SAT problems. We identify a bound on the number of clauses in satisfiability problems for which the generalized algorithm can find a solution in a constant number of steps as the number of variables increases. This performance contrasts with the linear growth in the number of steps required by the best classical algorithms, and the exponential number required by classical and quantum methods that ignore the problem structure. In some cases, the algorithm can also guarantee that insoluble problems in fact have no solutions, unlike previously proposed quantum search algorithms.