AITopics

doi: 10.1613/jair.1199

1106.4557

Country: North America > United States > California (1.00)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)

arXiv.org Artificial Intelligence, Jun-9-2011

Automatically Training a Problematic Dialogue Predictor for a Spoken Dialogue System

Gorin, A., Langkilde-Geary, I., Walker, M. A., Wright, J., Hastie, H. Wright

Spoken dialogue systems promise efficient and natural access to a large variety of information sources and services from any phone. However, current spoken dialogue systems are deficient in their strategies for preventing, identifying and repairing problems that arise in the conversation. This paper reports results on automatically training a Problematic Dialogue Predictor to predict problematic human-computer dialogues using a corpus of 4692 dialogues collected with the 'How May I Help You' (SM) spoken dialogue system. The Problematic Dialogue Predictor can be immediately applied to the system's decision of whether to transfer the call to a human customer care agent, or be used as a cue to the system's dialogue manager to modify its behavior to repair problems, and even perhaps, to prevent them. We show that a Problematic Dialogue Predictor using automatically-obtainable features from the first two exchanges in the dialogue can predict problematic dialogues 13.2% more accurately than the baseline.

artificial intelligence, machine learning, natural language, (17 more...)

doi: 10.1613/jair.971

1106.1817

Country: North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Chawla, N. V., Bowyer, K. W., Hall, L. O., Kegelmeyer, W. P.

SMOTE: Synthetic Minority Over-sampling Technique

arXiv.org Artificial Intelligence, Jun-9-2011

An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often real-world data sets are predominately composed of "normal" examples with only a small percentage of "abnormal" or "interesting" examples. It is also the case that the cost of misclassifying an abnormal (interesting) example as a normal example is often much higher than the cost of the reverse error. Under-sampling of the majority (normal) class has been proposed as a good means of increasing the sensitivity of a classifier to the minority class. This paper shows that a combination of our method of over-sampling the minority (abnormal) class and under-sampling the majority (normal) class can achieve better classifier performance (in ROC space) than only under-sampling the majority class. This paper also shows that a combination of our method of over-sampling the minority class and under-sampling the majority class can achieve better classifier performance (in ROC space) than varying the loss ratios in Ripper or class priors in Naive Bayes. Our method of over-sampling the minority class involves creating synthetic minority class examples. Experiments are performed using C4.5, Ripper and a Naive Bayes classifier. The method is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
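The abstract only says that synthetic minority class examples are created. A minimal sketch of one common reading of that step follows: each synthetic point is placed at a random position on the segment between a minority sample and one of its nearest minority-class neighbours. The function name `smote`, its parameters, and the choice to draw the base sample at random (rather than looping over every minority sample) are assumptions made for illustration, not details taken from this listing.

```python
import random
from math import dist


def smote(minority, n_synthetic, k=5, seed=0):
    """Illustrative sketch: interpolate between minority samples and
    their k nearest minority-class neighbours (names are hypothetical)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_synthetic=6, k=2)
```

Because every synthetic point lies between two existing minority samples, the new points stay inside the region the minority class already occupies, unlike over-sampling by plain replication, which only repeats existing points.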

artificial intelligence, dataset, machine learning, (16 more...)

doi: 10.1613/jair.953

1106.1813

Country:

North America > United States > California (1.00)
Europe (0.93)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Therapeutic Area > Oncology (0.69)
Energy (0.68)
Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Bolstad, Andrew, Van Veen, Barry, Nowak, Robert

Causal Network Inference via Group Sparse Regularization

arXiv.org Machine Learning, Jun-3-2011

This paper addresses the problem of inferring sparse causal networks modeled by multivariate auto-regressive (MAR) processes. Conditions are derived under which the Group Lasso (gLasso) procedure consistently estimates sparse network structure. The key condition involves a "false connection score." In particular, we show that consistent recovery is possible even when the number of observations of the network is far less than the number of parameters describing the network, provided that the false connection score is less than one. The false connection score is also demonstrated to be a useful metric of recovery in non-asymptotic regimes. The conditions suggest a modified gLasso procedure which tends to improve the false connection score and reduce the chances of reversing the direction of causal influence. Computational experiments and a real network based electrocorticogram (ECoG) simulation study demonstrate the effectiveness of the approach.
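For orientation, a generic group-lasso objective for a MAR model can be written as below. This is a sketch under assumed notation: the symbols $x(t)$, $A_r$, $p$, $T$ and $\lambda$, the $1/2$ scaling, and the choice to penalize only off-diagonal groups are illustrative and may differ from the paper's own formulation.

```latex
% MAR model of order p on an N-node network (assumed notation)
x(t) = \sum_{r=1}^{p} A_r \, x(t-r) + e(t)

% Group-lasso estimate: the p coefficients linking node j to node i
% form one group, so a connection is kept or dropped as a whole
\hat{A} = \arg\min_{A_1,\dots,A_p}
  \frac{1}{2} \sum_{t=p+1}^{T}
  \Bigl\| x(t) - \sum_{r=1}^{p} A_r \, x(t-r) \Bigr\|_2^2
  + \lambda \sum_{i \neq j}
  \bigl\| \bigl( A_1(i,j), \dots, A_p(i,j) \bigr) \bigr\|_2
```

Under this reading, a nonzero group for the pair $(i, j)$ is interpreted as a directed connection from node $j$ to node $i$.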

artificial intelligence, machine learning, node, (16 more...)

doi: 10.1109/TSP.2011.2129515

1106.0762

Country: North America > United States (0.93)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Brodley, C. E., Friedl, M. A.

Identifying Mislabeled Training Data

arXiv.org Artificial Intelligence, Jun-1-2011

The goal of this approach is to improve classification accuracies produced by learning algorithms by improving the quality of the training data. Our approach uses a set of learning algorithms to create classifiers that serve as noise filters for the training data. We evaluate single algorithm, majority vote and consensus filters on five datasets that are prone to labeling errors. Our experiments illustrate that filtering significantly improves classification accuracy for noise levels up to 30%. An analytical and empirical evaluation of the precision of our approach shows that consensus filters are conservative at throwing away good data at the expense of retaining bad data and that majority filters are better at detecting bad data at the expense of throwing away good data. This suggests that for situations in which there is a paucity of data, consensus filters are preferable, whereas majority vote filters are preferable for situations with an abundance of data.

1. Introduction

The maximum accuracy achievable depends on the quality of the data and on the appropriateness of the chosen learning algorithm for the data. The work described here focuses on improving the quality of training data by identifying and eliminating mislabeled instances prior to applying the chosen learning algorithm, thereby increasing classification accuracy. Labeling error can occur for several reasons including subjectivity, data-entry error, or inadequacy of the information used to label each object. Subjectivity may arise when observations need to be ranked in some way such as disease severity or when the information used to label an object is different from the information to which the learning algorithm will have access. For example, when labeling pixels in image data, the analyst typically uses visual input rather than the numeric values of the feature vector corresponding to the observation. Domains in which experts disagree are natural places for subjective labeling errors (Smyth, 1996).
A third cause of labeling error arises when the information used to label each observation is inadequate. For example, in the medical domain it may not be possible to perform the tests necessary to guarantee that a diagnosis is 100% accurate. For domains in which labeling errors occur, an automated method of eliminating or correcting mislabeled observations will improve the predictive accuracy of the classifier formed from the training data. In this article we address the problem of identifying training instances that are mislabeled.
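The difference between the majority vote and consensus filters can be sketched as follows. The sketch assumes each filter classifier has already produced a prediction for every training instance while that instance was held out of its training set; the function name `flag_mislabeled` and the data layout are hypothetical, chosen only to illustrate the two voting rules.

```python
def flag_mislabeled(labels, predictions, rule="majority"):
    """labels: the given class label of each training instance.
    predictions: one list per filter classifier, holding its held-out
    prediction for each instance (assumed to be computed beforehand).
    Returns the indices of instances flagged as mislabeled."""
    n_classifiers = len(predictions)
    flagged = []
    for i, label in enumerate(labels):
        errors = sum(1 for preds in predictions if preds[i] != label)
        if rule == "consensus":
            # flag only when every filter disagrees with the given label
            is_noise = errors == n_classifiers
        else:
            # flag when more than half of the filters disagree
            is_noise = errors > n_classifiers / 2
        if is_noise:
            flagged.append(i)
    return flagged


labels = ["a", "b", "a", "b"]
predictions = [
    ["a", "a", "b", "b"],
    ["a", "a", "b", "b"],
    ["a", "b", "b", "b"],
]
majority_flags = flag_mislabeled(labels, predictions, "majority")
consensus_flags = flag_mislabeled(labels, predictions, "consensus")
```

In this toy input the consensus rule flags a subset of what the majority rule flags, which matches the abstract's description of consensus filters as the more conservative of the two.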

artificial intelligence, classifier, machine learning, (18 more...)

doi: 10.1613/jair.606

1106.0219

Country: North America > United States > Massachusetts (0.46)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.68)

Industry:

Energy (0.68)
Education (0.67)
Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Ting, K. M., Witten, I. H.

Issues in Stacked Generalization

arXiv.org Artificial Intelligence, May-26-2011

Stacked generalization is a general method of using a high-level model to combine lower-level models to achieve greater predictive accuracy. In this paper we address two crucial issues which have been considered to be a 'black art' in classification tasks ever since the introduction of stacked generalization in 1992 by Wolpert: the type of generalizer that is suitable to derive the higher-level model, and the kind of attributes that should be used as its input. We find that best results are obtained when the higher-level model combines the confidence (and not just the predictions) of the lower-level ones. We demonstrate the effectiveness of stacked generalization for combining three different types of learning algorithms for classification tasks. We also compare the performance of stacked generalization with majority vote and published results of arcing and bagging.
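The distinction between feeding the higher-level model confidences versus bare predictions can be illustrated with a small sketch. It assumes each lower-level model has already produced a class-probability vector for every instance on held-out data; the function name `level1_attributes` and the nested-list layout are hypothetical, and the higher-level learner itself is not shown.

```python
def level1_attributes(prob_outputs, use_confidence=True):
    """prob_outputs: one list per lower-level model; each entry is that
    model's class-probability vector for one instance (assumed to come
    from held-out predictions). Returns one attribute row per instance."""
    n_instances = len(prob_outputs[0])
    rows = []
    for i in range(n_instances):
        row = []
        for model_probs in prob_outputs:
            probs = model_probs[i]
            if use_confidence:
                row.extend(probs)  # keep the full confidence vector
            else:
                row.append(probs.index(max(probs)))  # keep only the predicted class
        rows.append(row)
    return rows


prob_outputs = [
    [[0.9, 0.1], [0.4, 0.6]],    # model 1, two instances
    [[0.55, 0.45], [0.2, 0.8]],  # model 2, two instances
]
with_confidence = level1_attributes(prob_outputs, use_confidence=True)
predictions_only = level1_attributes(prob_outputs, use_confidence=False)
```

With predictions only, a 0.9 vote and a 0.55 vote for the same class look identical to the higher-level model; the confidence rows preserve that difference.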

artificial intelligence, decision tree learning, machine learning, (18 more...)

doi: 10.1613/jair.594

1105.5466

Country: North America > United States > California (0.46)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Velez, Javier (Massachusetts Institute of Technology) | Hemann, Garrett (Massachusetts Institute of Technology) | Huang, Albert S. (Massachusetts Institute of Technology) | Posner, Ingmar (Department of Engineering Science, University of Oxford) | Roy, Nicholas (Massachusetts Institute of Technology)

Planning to Perceive: Exploiting Mobility for Robust Object Detection

Consider the task of a mobile robot autonomously navigating through an environment while detecting and mapping objects of interest using a noisy object detector. The robot must reach its destination in a timely manner, but is rewarded for correctly detecting recognizable objects to be added to the map, and penalized for false alarms. However, detector performance typically varies with vantage point, so the robot benefits from planning trajectories which maximize the efficacy of the recognition system. This work describes an online, any-time planning framework enabling the active exploration of possible detections provided by an off-the-shelf object detector. We present a probabilistic approach where vantage points are identified which provide a more informative view of a potential object. The agent then weighs the benefit of increasing its confidence against the cost of taking a detour to reach each identified vantage point. The system is demonstrated to significantly improve detection and trajectory length in both simulated and real robot experiments.

detection, robot, trajectory, (17 more...)

Twenty-First International Conference on Automated Planning and Scheduling

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Robustness of Filter-Based Feature Ranking: A Case Study

Altidor, Wilker (Florida Atlantic University) | Khoshgoftaar, Taghi M. (Florida Atlantic University) | Hulse, Jason Van (Florida Atlantic University)

The filter model of feature selection has been well studied. In previous studies, classification performance has traditionally been proposed as a way to evaluate filter solutions. In this study, a new method of comparing feature ranking techniques is presented providing a straightforward approach for quantifying individual filters’ robustness to class noise. Six commonly-used filters, plus one which is rarely used, are investigated regarding their ability to retain, in the presence of class noise, strong classification performance. Three classifiers and one classification performance metric are considered. The experimental results of this study show that Gain Ratio, one of the well known and widely used filters, is very sensitive to class noise. ReliefF offers the best results with both the NB and kNN learners while Signal-to-noise, though not as widely used in the literature as the others, outperforms all the filters with the SVM learner.

class noise, classification performance, noise, (14 more...)

Twenty-Fourth International FLAIRS Conference

Country:

South America > Paraguay > Asunción > Asunción (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area (0.30)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)

Mining Chat Conversations: The Next Frontier

Analyzing chat traffic has important applications for both the military and the civilian world. This poster will report on an effort to automatically separate chat messages into topic threads.

chat message, classification, mining chat conversation, (16 more...)

Twenty-Fourth International FLAIRS Conference

Country:

North America > United States > New York (0.05)
Europe > Netherlands > North Holland > Amsterdam (0.05)
Europe > France > Occitanie > Haute-Garonne > Toulouse (0.05)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.53)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.33)

Villena-Román, Julio (Universidad Carlos III de Madrid) | Collada-Pérez, Sonia (Daedalus - Data, Decisions and Language, S.A.) | Lana-Serrano, Sara (Universidad Politécnica de Madrid) | González-Cristóbal, José Carlos (Universidad Politécnica de Madrid)

Hybrid Approach Combining Machine Learning and a Rule-Based Expert System for Text Categorization

This paper discusses a novel hybrid approach for text categorization that combines a machine learning algorithm, which provides a base model trained with a labeled corpus, with a rule-based expert system, which is used to improve the results provided by the previous classifier, by filtering false positives and dealing with false negatives. The main advantage is that the system can be easily fine-tuned by adding specific rules for those noisy or conflicting categories that have not been successfully trained. We also describe an implementation based on k-Nearest Neighbor and a simple rule language to express lists of positive, negative and relevant (multiword) terms appearing in the input text. The system is evaluated in several scenarios, including the popular Reuters-21578 news corpus for comparison to other approaches, and categorization using IPTC metadata, EUROVOC thesaurus and others. Results show that this approach achieves a precision that is comparable to top ranked methods, with the added value that it does not require a demanding human expert workload to train.
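The idea of a rule layer that corrects a base classifier's output can be sketched as below. This is an illustration only: the rule format, the function name `apply_rules`, and the substring matching are assumptions, the abstract's "relevant" terms are left out because their semantics are not described here, and the base k-Nearest Neighbor classifier is represented simply by its set of predicted categories.

```python
def apply_rules(text, predicted, rules):
    """predicted: set of categories returned by the base classifier.
    rules: {category: {"positive": [...], "negative": [...]}}
    (a hypothetical format, not the paper's rule language)."""
    lowered = text.lower()
    result = set(predicted)
    for category, terms in rules.items():
        if any(term in lowered for term in terms.get("negative", [])):
            result.discard(category)  # filter a likely false positive
        elif any(term in lowered for term in terms.get("positive", [])):
            result.add(category)      # recover a likely false negative
    return result


rules = {
    "sports": {"positive": ["world cup"], "negative": ["stock market"]},
    "finance": {"positive": ["stock market"]},
}
final = apply_rules(
    "Stock market falls after World Cup final", {"sports"}, rules
)
```

In this sketch a negative term takes precedence over a positive one for the same category, which is one possible design choice rather than something the abstract specifies.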

categorization, category, classifier, (15 more...)

Twenty-Fourth International FLAIRS Conference

Country:

North America > United States > Missouri > Jackson County > Kansas City (0.14)
Europe > Spain > Galicia > Madrid (0.05)
Oceania > Australia > Victoria > Melbourne (0.04)
(4 more...)

Genre: Research Report > New Finding (0.34)

Industry:

Media > News (0.68)
Leisure & Entertainment > Sports (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
(3 more...)