AITopics | Country

Collaborating Authors

Country

Identifying Mislabeled Training Data

arXiv.org Artificial IntelligenceJun-1-2011

This paper presents a new approach to identifying and eliminating mislabeled training instances for supervised learning. The goal of this approach is to improve classification accuracies produced by learning algorithms by improving the quality of the training data. Our approach uses a set of learning algorithms to create classifiers that serve as noise filters for the training data. We evaluate single algorithm, majority vote and consensus filters on five datasets that are prone to labeling errors. Our experiments illustrate that filtering significantly improves classification accuracy for noise levels up to 30 percent. An analytical and empirical evaluation of the precision of our approach shows that consensus filters are conservative at throwing away good data at the expense of retaining bad data and that majority filters are better at detecting bad data at the expense of throwing away good data. This suggests that for situations in which there is a paucity of data, consensus filters are preferable, whereas majority vote filters are preferable for situations with an abundance of data.

classier, inductive learning, us government, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1613/jair.606

1106.0219

Country:

North America > United States > Massachusetts (0.46)
North America > United States > Indiana > Tippecanoe County (0.14)

Industry:

Energy (0.68)
Education (0.67)
Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Add feedback

The Good Old Davis-Putnam Procedure Helps Counting Models

Birnbaum, E., Lozinskii, E. L.

arXiv.org Artificial IntelligenceJun-1-2011

As was shown recently, many important AI problems require counting the number of models of propositional formulas. The problem of counting models of such formulas is, according to present knowledge, computationally intractable in a worst case. Based on the Davis-Putnam procedure, we present an algorithm, CDP, that computes the exact number of models of a propositional CNF or DNF formula F. Let m and n be the number of clauses and variables of F, respectively, and let p denote the probability that a literal l of F occurs in a clause C of F, then the average running time of CDP is shown to be O(nm^d), where d=-1/log(1-p). The practical performance of CDP has been estimated in a series of experiments on a wide variety of CNF formulas.

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1613/jair.601

1106.0218

Country: Asia > Middle East > Israel (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)

Add feedback

An Application of Reinforcement Learning to Dialogue Strategy Selection in a Spoken Dialogue System for Email

Walker, M. A.

arXiv.org Artificial IntelligenceJun-1-2011

This paper describes a novel method by which a spoken dialogue system can learn to choose an optimal dialogue strategy from its experience interacting with human users. The method is based on a combination of reinforcement learning and performance modeling of spoken dialogue systems. The reinforcement learning component applies Q-learning (Watkins, 1989), while the performance modeling component applies the PARADISE evaluation framework (Walker et al., 1997) to learn the performance function (reward) used in reinforcement learning. We illustrate the method with a spoken dialogue system named ELVIS (EmaiL Voice Interactive System), that supports access to email over the phone. We conduct a set of experiments for training an optimal dialogue strategy on a corpus of 219 dialogues in which human users interact with ELVIS over the phone. We then test that strategy on a corpus of 18 dialogues. We show that ELVIS can learn to optimize its strategy selection for agent initiative, for reading messages, and for summarizing email folders.

artificial intelligence, dialogue, télécommunications, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1613/jair.713

1106.0241

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > New York (0.14)
North America > United States > New Jersey (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Telecommunications (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

ProDiGe: PRioritization Of Disease Genes with multitask machine learning from positive and unlabeled examples

Mordelet, Fantine, Vert, Jean-Philippe

arXiv.org Machine LearningJun-1-2011

Elucidating the genetic basis of human diseases is a central goal of genetics and molecular biology. While traditional linkage analysis and modern high-throughput techniques often provide long lists of tens or hundreds of disease gene candidates, the identification of disease genes among the candidates remains time-consuming and expensive. Efficient computational methods are therefore needed to prioritize genes within the list of candidates, by exploiting the wealth of information available about the genes in various databases. Here we propose ProDiGe, a novel algorithm for Prioritization of Disease Genes. ProDiGe implements a novel machine learning strategy based on learning from positive and unlabeled examples, which allows to integrate various sources of information about the genes, to share information about known disease genes across diseases, and to perform genome-wide searches for new disease genes. Experiments on real data show that ProDiGe outperforms state-of-the-art methods for the prioritization of genes in human diseases.

epilepsy, health & medicine, information, (21 more...)

arXiv.org Machine Learning

1106.0134

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Ohio (0.14)

Genre: Research Report > Promising Solution (0.34)

Industry: Health & Medicine > Therapeutic Area > Genetic Disease (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.46)

Add feedback

Activity-Based Search for Black-Box Contraint-Programming Solvers

Michel, L., Van Hentenryck, P.

arXiv.org Artificial IntelligenceMay-31-2011

Robust search procedures are a central component in the design of black-box constraint-programming solvers. This paper proposes activity-based search, the idea of using the activity of variables during propagation to guide the search. Activity-based search was compared experimentally to impact-based search and the WDEG heuristics. Experimental results on a variety of benchmarks show that activity-based search is more robust than other heuristics and may produce significant improvements in performance.

abs, air transportation, constraint-based reasoning, (17 more...)

arXiv.org Artificial Intelligence

1105.6314

Country:

North America > United States > Connecticut > Tolland County > Storrs (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (1.00)

Industry:

Transportation > Air (0.61)
Health & Medicine > Health Care Providers & Services (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.94)

Add feedback

Reasoning on Interval and Point-based Disjunctive Metric Constraints in Temporal Contexts

Barber, F.

arXiv.org Artificial IntelligenceMay-30-2011

We introduce a temporal model for reasoning on disjunctive metric constraints on intervals and time points in temporal contexts. This temporal model is composed of a labeled temporal algebra and its reasoning algorithms. The labeled temporal algebra defines labeled disjunctive metric point-based constraints, where each disjunct in each input disjunctive constraint is univocally associated to a label. Reasoning algorithms manage labeled constraints, associated label lists, and sets of mutually inconsistent disjuncts. These algorithms guarantee consistency and obtain a minimal network. Additionally, constraints can be organized in a hierarchy of alternative temporal contexts. Therefore, we can reason on context-dependent disjunctive metric constraints on intervals and points. Moreover, the model is able to represent non-binary constraints, such that logical dependencies on disjuncts in constraints can be handled. The computational cost of reasoning algorithms is exponential in accordance with the underlying problem complexity, although some improvements are proposed.

constraint, constraint-based reasoning, planning & scheduling, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1613/jair.693

1105.6124

Country: Europe > Spain (0.27)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.92)

Add feedback

Context models on sequences of covers

Dimitrakakis, Christos

arXiv.org Machine LearningMay-30-2011

We present a class of models that, via a simple construction, enables exact, incremental, non-parametric, polynomial-time, Bayesian inference of conditional measures. The approach relies upon creating a sequence of covers on the conditioning variable and maintaining a different model for each set within a cover. Inference remains tractable by specifying the probabilistic model in terms of a random walk within the sequence of covers. We demonstrate the approach on problems of conditional density estimation, which, to our knowledge is the first closed-form, non-parametric Bayesian approach to this problem.

artificial intelligence, bayesian inference, sequence, (15 more...)

arXiv.org Machine Learning

1005.2263

Country: Europe > Italy (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Add feedback

Complexity of and Algorithms for Borda Manipulation

Davies, Jessica, Katsirelos, George, Narodytska, Nina, Walsh, Toby

arXiv.org Artificial IntelligenceMay-27-2011

We prove that it is NP-hard for a coalition of two manipulators to compute how to manipulate the Borda voting rule. This resolves one of the last open problems in the computational complexity of manipulating common voting rules. Because of this NP-hardness, we treat computing a manipulation as an approximation problem where we try to minimize the number of manipulators. Based on ideas from bin packing and multiprocessor scheduling, we propose two new approximation methods to compute manipulations of the Borda rule. Experiments show that these methods significantly outperform the previous best known %existing approximation method. We are able to find optimal manipulations in almost all the randomly generated elections tested. Our results suggest that, whilst computing a manipulation of the Borda rule by a coalition is NP-hard, computational complexity may provide only a weak barrier against manipulation in practice.

artificial intelligence, manipulation, manipulator, (16 more...)

arXiv.org Artificial Intelligence

1105.5667

Country:

Oceania > Australia (0.28)
North America > Canada > Ontario > Toronto (0.14)

Industry:

Government > Voting & Elections (0.49)
Leisure & Entertainment > Sports (0.46)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

Add feedback

AntNet: Distributed Stigmergetic Control for Communications Networks

Di Caro, G., Dorigo, M.

arXiv.org Artificial IntelligenceMay-26-2011

This paper introduces AntNet, a novel approach to the adaptive learning of routing tables in communications networks. AntNet is a distributed, mobile agents based Monte Carlo system that was inspired by recent work on the ant colony metaphor for solving optimization problems. AntNet's agents concurrently explore the network and exchange collected information. The communication among the agents is indirect and asynchronous, mediated by the network itself. This form of communication is typical of social insects and is called stigmergy. We compare our algorithm with six state-of-the-art routing algorithms coming from the telecommunications and machine learning fields. The algorithms' performance is evaluated over a set of realistic testbeds. We run many experiments over real and artificial IP datagram networks with increasing number of nodes and under several paradigmatic spatial and temporal traffic distributions. Results are very encouraging. AntNet showed superior performance under all the experimental conditions with respect to its competitors. We analyze the main characteristics of the algorithm and try to explain the reasons for its superiority.

algorithm, computer based training, educational technology, (23 more...)

arXiv.org Artificial Intelligence

doi: 10.1613/jair.530

1105.5449

Country:

North America > United States > Maryland > Prince George's County > College Park (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)

Industry:

Telecommunications > Networks (1.00)
Energy (1.00)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
(2 more...)

Add feedback

Learning to Order Things

Cohen, W. W., Schapire, R. E., Singer, Y.

arXiv.org Artificial IntelligenceMay-26-2011

There are many applications in which it is desirable to order rather than classify instances. Here we consider the problem of learning how to order instances given feedback in the form of preference judgments, i.e., statements to the effect that one instance should be ranked ahead of another. We outline a two-stage approach in which one first learns by conventional means a binary preference function indicating whether it is advisable to rank one instance before another. Here we consider an on-line algorithm for learning preference functions that is based on Freund and Schapire's 'Hedge' algorithm. In the second stage, new instances are ordered so as to maximize agreement with the learned preference function. We show that the problem of finding the ordering that agrees best with a learned preference function is NP-complete. Nevertheless, we describe simple greedy algorithms that are guaranteed to find a good approximation. Finally, we show how metasearch can be formulated as an ordering problem, and present experimental results on learning a combination of 'search experts', each of which is a domain-specific query expansion strategy for a web search engine.

algorithm, artificial intelligence, information management, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1613/jair.587

1105.5464

Country: North America > United States (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Information Management > Search (0.87)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.68)

Add feedback