Popular Ensemble Methods: An Empirical Study

Journal of Artificial Intelligence Research

An ensemble consists of a set of individually trained classifiers (such as neural networks or decision trees) whose predictions are combined when classifying novel instances. Previous research has shown that an ensemble is often more accurate than any of the single classifiers in the ensemble. Bagging (Breiman, 1996c) and Boosting (Freund & Shapire, 1996; Shapire, 1990) are two relatively new but popular methods for producing ensembles. In this paper we evaluate these methods on 23 data sets using both neural networks and decision trees as our classification algorithm. Our results clearly indicate a number of conclusions. First, while Bagging is almost always more accurate than a single classifier, it is sometimes much less accurate than Boosting. On the other hand, Boosting can create ensembles that are less accurate than a single classifier -- especially when using neural networks. Analysis indicates that the performance of the Boosting methods is dependent on the characteristics of the data set being examined. In fact, further results show that Boosting ensembles may overfit noisy data sets, thus decreasing its performance. Finally, consistent with previous studies, our work suggests that most of the gain in an ensemble's performance comes in the first few classifiers combined; however, relatively large gains can be seen up to 25 classifiers when Boosting decision trees.


Tractability of Theory Patching

Journal of Artificial Intelligence Research

In this paper we consider the problem of `theory patching', in which we are given a domain theory, some of whose components are indicated to be possibly flawed, and a set of labeled training examples for the domain concept. The theory patching problem is to revise only the indicated components of the theory, such that the resulting theory correctly classifies all the training examples. Theory patching is thus a type of theory revision in which revisions are made to individual components of the theory. Our concern in this paper is to determine for which classes of logical domain theories the theory patching problem is tractable. We consider both propositional and first-order domain theories, and show that the theory patching problem is equivalent to that of determining what information contained in a theory is `stable' regardless of what revisions might be performed to the theory. We show that determining stability is tractable if the input theory satisfies two conditions: that revisions to each theory component have monotonic effects on the classification of examples, and that theory components act independently in the classification of examples in the theory. We also show how the concepts introduced can be used to determine the soundness and completeness of particular theory patching algorithms.



Model-Based Diagnosis using Structured System Descriptions

Journal of Artificial Intelligence Research

This paper presents a comprehensive approach for model-based diagnosis which includes proposals for characterizing and computing preferred diagnoses, assuming that the system description is augmented with a system structure (a directed graph explicating the interconnections between system components). Specifically, we first introduce the notion of a consequence, which is a syntactically unconstrained propositional sentence that characterizes all consistency-based diagnoses and show that standard characterizations of diagnoses, such as minimal conflicts, correspond to syntactic variations on a consequence. Second, we propose a new syntactic variation on the consequence known as negation normal form (NNF) and discuss its merits compared to standard variations. Third, we introduce a basic algorithm for computing consequences in NNF given a structured system description. We show that if the system structure does not contain cycles, then there is always a linear-size consequence in NNF which can be computed in linear time. For arbitrary system structures, we show a precise connection between the complexity of computing consequences and the topology of the underlying system structure. Finally, we present an algorithm that enumerates the preferred diagnoses characterized by a consequence. The algorithm is shown to take linear time in the size of the consequence if the preference criterion satisfies some general conditions.


Linking Search Space Structure, Run-Time Dynamics, and Problem Difficulty: A Step Toward Demystifying Tabu Search

Journal of Artificial Intelligence Research

Tabu search is one of the most effective heuristics for locating high-quality solutions to a diverse array of NP-hard combinatorial optimization problems. Despite the widespread success of tabu search, researchers have a poor understanding of many key theoretical aspects of this algorithm, including models of the high-level run-time dynamics and identification of those search space features that influence problem difficulty. We consider these questions in the context of the job-shop scheduling problem (JSP), a domain where tabu search algorithms have been shown to be remarkably effective. Previously, we demonstrated that the mean distance between random local optima and the nearest optimal solution is highly correlated with problem difficulty for a well-known tabu search algorithm for the JSP introduced by Taillard. In this paper, we discuss various shortcomings of this measure and develop a new model of problem difficulty that corrects these deficiencies. We show that Taillard's algorithm can be modeled with high fidelity as a simple variant of a straightforward random walk. The random walk model accounts for nearly all of the variability in the cost required to locate both optimal and sub-optimal solutions to random JSPs, and provides an explanation for differences in the difficulty of random versus structured JSPs. Finally, we discuss and empirically substantiate two novel predictions regarding tabu search algorithm behavior. First, the method for constructing the initial solution is highly unlikely to impact the performance of tabu search. Second, tabu tenure should be selected to be as small as possible while simultaneously avoiding search stagnation; values larger than necessary lead to significant degradations in performance.