Goto

Collaborating Authors

 Country


Identifying Mislabeled Training Data

arXiv.org Artificial Intelligence

The goal of this approach is to improve classication accuracies produced by learning algorithms by improving the quality of the training data. Our approach uses a set of learning algorithms to create classiers that serve as noise lters for the training data. We evaluate single algorithm, majority vote and consensus lters on ve datasets that are prone to labeling errors. Our experiments illustrate that ltering signicantly improves classication accuracy for noise levels up to 30%. An analytical and empirical evaluation of the precision of our approach shows that consensus lters are conservative at throwing away good data at the expense of retaining bad data and that majority lters are better at detecting bad data at the expense of throwing away good data. This suggests that for situations in which there is a paucity of data, consensus lters are preferable, whereas majority vote lters are preferable for situations with an abundance of data. 1. Introducti The maximum accuracy achievable depends on the quality of the data and on the appropriateness of the chosen learning algorithm for the data. The work described here focuses on improving the quality of training data by identifying and eliminating mislabeled instances prior to applying the chosen learning algorithm, thereby increasing classication accuracy. Labeling error can occur for several reasons including subjectivity, data-entry error, or inadequacy of the information used to label each object. Subjectivity may arise when observations need to be ranked in some way such as disease severity or when the information used to label an object is dierent from the information to which the learning algorithm will have access. For example, when labeling pixels in image data, the analyst typically uses visual input rather than the numeric values of the feature vector corresponding to the observation. Domains in which experts disagree are natural places for subjective labeling errors (Smyth, 1996). A third cause of labeling error arises when the information used to label each observation is inadequate. For example, in the medical domain it may not be possible to perform the tests necessary to guarantee that a diagnosis is 100% accurate. For domains in which labeling errors occur, an automated method of eliminating or correcting mislabeled observations will improve the predictive accuracy of the classier formed from the training data. In this article we address the problem of identifying training instances that are mislabeled.


Automated Complexity Analysis Based on the Dependency Pair Method

arXiv.org Artificial Intelligence

This article is concerned with automated complexity analysis of term rewrite systems. Since these systems underlie much of declarative programming, time complexity of functions defined by rewrite systems is of particular interest. Among other results, we present a variant of the dependency pair method for analysing runtime complexities of term rewrite systems automatically. The established results significantly extent previously known techniques: we give examples of rewrite systems subject to our methods that could previously not been analysed automatically. Furthermore, the techniques have been implemented in the Tyrolean Complexity Tool. We provide ample numerical data for assessing the viability of the method.


Proposal of Pattern Recognition as a necessary and sufficient Principle to Cognitive Science

arXiv.org Artificial Intelligence

Despite the prevalence of the Computational Theory of Mind and the Connectionist Model, the establishing of the key principles of the Cognitive Science are still controversy and inconclusive. This paper proposes the concept of PATTERN RECOGNITION as NECESSARY AND SUFFICIENT PRINCIPLE for a general cognitive science modeling, in a very ambitious scientific proposal. A formal physical definition of the pattern recognition concept is also proposed to solve many key conceptual gaps on the field.


Activity-Based Search for Black-Box Contraint-Programming Solvers

arXiv.org Artificial Intelligence

Robust search procedures are a central component in the design of black-box constraint-programming solvers. This paper proposes activity-based search, the idea of using the activity of variables during propagation to guide the search. Activity-based search was compared experimentally to impact-based search and the wdeg heuristics. Experimental results on a variety of benchmarks show that activity-based search is more robust than other heuristics and may produce significant improvements in performance.


Reasoning on Interval and Point-based Disjunctive Metric Constraints in Temporal Contexts

arXiv.org Artificial Intelligence

We introduce a temporal model for reasoning on disjunctive metric constraints on intervals and time points in temporal contexts. This temporal model is composed of a labeled temporal algebra and its reasoning algorithms. The labeled temporal algebra defines labeled disjunctive metric point-based constraints, where each disjunct in each input disjunctive constraint is univocally associated to a label. Reasoning algorithms manage labeled constraints, associated label lists, and sets of mutually inconsistent disjuncts. These algorithms guarantee consistency and obtain a minimal network. Additionally, constraints can be organized in a hierarchy of alternative temporal contexts. Therefore, we can reason on context-dependent disjunctive metric constraints on intervals and points. Moreover, the model is able to represent non-binary constraints, such that logical dependencies on disjuncts in constraints can be handled. The computational cost of reasoning algorithms is exponential in accordance with the underlying problem complexity, although some improvements are proposed.


Redistribution Mechanisms for Assignment of Heterogeneous Objects

Journal of Artificial Intelligence Research

There are p heterogeneous objects to be assigned to n competing agents (n > p) each with unit demand. It is required to design a Groves mechanism for this assignment problem satisfying weak budget balance, individual rationality, and minimizing the budget imbalance. This calls for designing an appropriate rebate function. When the objects are identical, this problem has been solved which we refer as WCO mechanism. We measure the performance of such mechanisms by the redistribution index. We first prove an impossibility theorem which rules out linear rebate functions with non-zero redistribution index in heterogeneous object assignment. Motivated by this theorem, we explore two approaches to get around this impossibility. In the first approach, we show that linear rebate functions with non-zero redistribution index are possible when the valuations for the objects have a certain type of relationship and we design a mechanism with linear rebate function that is worst case optimal. In the second approach, we show that rebate functions with non-zero efficiency are possible if linearity is relaxed. We extend the rebate functions of the WCO mechanism to heterogeneous objects assignment and conjecture them to be worst case optimal.


Soft Constraints of Difference and Equality

Journal of Artificial Intelligence Research

In many combinatorial problems one may need to model the diversity or similarity of assignments in a solution. For example, one may wish to maximise or minimise the number of distinct values in a solution. To formulate problems of this type, we can use soft variants of the well known AllDifferent and AllEqual constraints. We present a taxonomy of six soft global constraints, generated by combining the two latter ones and the two standard cost functions, which are either maximised or minimised. We characterise the complexity of achieving arc and bounds consistency on these constraints, resolving those cases for which NP-hardness was neither proven nor disproven. In particular, we explore in depth the constraint ensuring that at least k pairs of variables have a common value. We show that achieving arc consistency is NP-hard, however achieving bounds consistency can be done in polynomial time through dynamic programming. Moreover, we show that the maximum number of pairs of equal variables can be approximated by a factor 1/2 with a linear time greedy algorithm. Finally, we provide a fixed parameter tractable algorithm with respect to the number of values appearing in more than two distinct domains. Interestingly, this taxonomy shows that enforcing equality is harder than enforcing difference.


Value of Information Lattice: Exploiting Probabilistic Independence for Effective Feature Subset Acquisition

Journal of Artificial Intelligence Research

We address the cost-sensitive feature acquisition problem, where misclassifying an instance is costly but the expected misclassification cost can be reduced by acquiring the values of the missing features. Because acquiring the features is costly as well, the objective is to acquire the right set of features so that the sum of the feature acquisition cost and misclassification cost is minimized. We describe the Value of Information Lattice (VOILA), an optimal and efficient feature subset acquisition framework. Unlike the common practice, which is to acquire features greedily, VOILA can reason with subsets of features. VOILA efficiently searches the space of possible feature subsets by discovering and exploiting conditional independence properties between the features and it reuses probabilistic inference computations to further speed up the process. Through empirical evaluation on five medical datasets, we show that the greedy strategy is often reluctant to acquire features, as it cannot forecast the benefit of acquiring multiple features in combination.


Complexity of and Algorithms for Borda Manipulation

arXiv.org Artificial Intelligence

We prove that it is NP-hard for a coalition of two manipulators to compute how to manipulate the Borda voting rule. This resolves one of the last open problems in the computational complexity of manipulating common voting rules. Because of this NP-hardness, we treat computing a manipulation as an approximation problem where we try to minimize the number of manipulators. Based on ideas from bin packing and multiprocessor scheduling, we propose two new approximation methods to compute manipulations of the Borda rule. Experiments show that these methods significantly outperform the previous best known %existing approximation method. We are able to find optimal manipulations in almost all the randomly generated elections tested. Our results suggest that, whilst computing a manipulation of the Borda rule by a coalition is NP-hard, computational complexity may provide only a weak barrier against manipulation in practice.


Squeaky Wheel Optimization

arXiv.org Artificial Intelligence

We describe a general approach to optimization which we term `Squeaky Wheel' Optimization (SWO). In SWO, a greedy algorithm is used to construct a solution which is then analyzed to find the trouble spots, i.e., those elements, that, if improved, are likely to improve the objective function score. The results of the analysis are used to generate new priorities that determine the order in which the greedy algorithm constructs the next solution. This Construct/Analyze/Prioritize cycle continues until some limit is reached, or an acceptable solution is found. SWO can be viewed as operating on two search spaces: solutions and prioritizations. Successive solutions are only indirectly related, via the re-prioritization that results from analyzing the prior solution. Similarly, successive prioritizations are generated by constructing and analyzing solutions. This `coupled search' has some interesting properties, which we discuss. We report encouraging experimental results on two domains, scheduling problems that arise in fiber-optic cable manufacturing, and graph coloring problems. The fact that these domains are very different supports our claim that SWO is a general technique for optimization.