AITopics

Country: North America > United States (0.28)

Industry: Education (0.34)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)

Cortes, Corinna, Haffner, Patrick, Mohri, Mehryar

Rational Kernels

Neural Information Processing SystemsDec-31-2003

We introduce a general family of kernels based on weighted transducers orrational relations, rational kernels, that can be used for analysis of variable-length sequences or more generally weighted automata, in applications suchas computational biology or speech recognition. We show that rational kernels can be computed efficiently using a general algorithm ofcomposition of weighted transducers and a general single-source shortest-distance algorithm. We also describe several general families of positive definite symmetric rational kernels. These general kernels can be combined with Support Vector Machines to form efficient and powerful techniquesfor spoken-dialog classification: highly complex kernels become easy to design and implement and lead to substantial improvements inthe classification accuracy. We also show that the string kernels considered in applications to computational biology are all specific instances ofrational kernels.

artificial intelligence, machine learning, natural language, (20 more...)

Country:

Europe (0.94)
North America > United States > Massachusetts (0.28)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Weiss, G. M., Provost, F.

Learning When Training Data are Costly: The Effect of Class Distribution on Tree Induction

Journal of Artificial Intelligence ResearchOct-1-2003

For large, real-world inductive learning problems, the number of training examples often must be limited due to the costs associated with procuring, preparing, and storing the training examples and/or the computational costs associated with learning from them. In such circumstances, one question of practical importance is: if only n training examples can be selected, in what proportion should the classes be represented? In this article we help to answer this question by analyzing, for a fixed training-set size, the relationship between the class distribution of the training data and the performance of classification trees induced from these data. We study twenty-six data sets and, for each, determine the best class distribution for learning. The naturally occurring class distribution is shown to generally perform well when classifier performance is evaluated using undifferentiated error rate (0/1 loss). However, when the area under the ROC curve is used to evaluate classifier performance, a balanced distribution is shown to perform well. Since neither of these choices for class distribution always generates the best-performing classifier, we introduce a "budget-sensitive" progressive sampling algorithm for selecting training examples based on the class associated with each example. An empirical analysis of this algorithm shows that the class distribution of the resulting training set yields classifiers with good (nearly-optimal) classification performance.

class distribution, error rate, minority-class example, (15 more...)

doi: 10.1613/jair.1199

AI Access Foundation

10346

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > San Mateo County > Menlo Park (0.04)
North America > United States > New York > New York County > New York City (0.04)
(7 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.67)

Industry: Education (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)

Lerman, K., Minton, S. N., Knoblock, C. A.

Wrapper Maintenance: A Machine Learning Approach

Journal of Artificial Intelligence ResearchFeb-1-2003

The proliferation of online information sources has led to an increased use of wrappers for extracting data from Web sources. While most of the previous research has focused on quick and efficient generation of wrappers, the development of tools for wrapper maintenance has received less attention. This is an important research problem because Web sources often change in ways that prevent the wrappers from extracting data correctly. We present an efficient algorithm that learns structural information about data from positive examples alone. We describe how this information can be used for two wrapper maintenance applications: wrapper verification and reinduction. The wrapper verification system detects when a wrapper is not extracting correct data, usually because the Web source has changed its format. The reinduction algorithm automatically recovers from changes in the Web source by identifying data on Web pages so that a new wrapper may be generated for this source. To validate our approach, we monitored 27 wrappers over a period of a year. The verification algorithm correctly discovered 35 of the 37 wrapper changes, and made 16 mistakes, resulting in precision of 0.73 and recall of 0.95. We validated the reinduction algorithm on ten Web sources. We were able to successfully reinduce the wrappers, obtaining precision and recall values of 0.90 and 0.80 on the data extraction task.

algorithm, information, wrapper, (16 more...)

doi: 10.1613/jair.1145

AI Access Foundation

10325

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > New York > Kings County > New York City (0.04)
(14 more...)

Genre:

Research Report (0.67)
Workflow (0.46)

Industry:

Consumer Products & Services > Restaurants (0.46)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Communications > Web (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.95)

Reducing multiclass to binary by coupling probability estimates

Zadrozny, B.

This paper presents a method for obtaining class membership probability estimates for multiclass classification problems by coupling the probability estimates produced by binary classifiers. This is an extension for arbitrary code matrices of a method due to Hastie and Tibshirani for pairwise coupling of probability estimates. Experimental results with Boosted Naive Bayes show that our method produces calibrated class membership probability estimates, while having similar classification accuracy as loss-based decoding, a method for obtaining the most likely class that does not generate probability estimates.

code matrix, matrix, probability estimate, (15 more...)

Country:

North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > California > San Diego County > La Jolla (0.04)
North America > United States > California > Orange County > Irvine (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.49)

Mozer, Michael C., Dodier, Robert, Colagrosso, Michael D., Guerra-Salcedo, Cesar, Wolniewicz, Richard

Prodding the ROC Curve: Constrained Optimization of Classifier Performance

When designing a two-alternative classifier, one ordinarily aims to maximize the classifier's ability to discriminate between members of the two classes. We describe a situation in a real-world business application of machine-learning prediction in which an additional constraint is placed on the nature of the solution: that the classifier achieve a specified correct acceptance or correct rejection rate (i.e., that it achieve a fixed accuracy on members of one class or the other). Our domain is predicting churn in the telecommunications industry. Churn refers to customers who switch from one service provider to another. We propose four algorithms for training a classifier subject to this domain constraint, and present results showing that each algorithm yields a reliable improvement in performance.

algorithm, classifier, cr rate, (16 more...)

Country:

North America > United States > Colorado > Boulder County > Boulder (0.04)
North America > United States > California > San Mateo County > San Mateo (0.04)
North America > United States > New York (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Telecommunications (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Viola, Paul, Jones, Michael

Fast and Robust Classification using Asymmetric AdaBoost and a Detector Cascade

This paper develops a new approach for extremely fast detection in domains where the distribution of positive and negative examples is highly skewed (e.g.

adaboost, classifier, detection rate, (13 more...)

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.68)
Information Technology > Artificial Intelligence > Vision > Face Recognition (0.51)

Reducing multiclass to binary by coupling probability estimates

Zadrozny, B.

This paper presents a method for obtaining class membership probability estimates for multiclass classification problems by coupling the probability estimates produced by binary classifiers. This is an extension for arbitrary code matrices of a method due to Hastie and Tibshirani for pairwise coupling of probability estimates. Experimental results with Boosted Naive Bayes show that our method produces calibrated class membership probability estimates, while having similar classification accuracy as loss-based decoding, a method for obtaining the most likely class that does not generate probability estimates.

code matrix, matrix, probability estimate, (15 more...)

Country:

North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > California > San Diego County > La Jolla (0.04)
North America > United States > California > Orange County > Irvine (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.49)

Dasgupta, Sanjoy, Littman, Michael L., McAllester, David A.

PAC Generalization Bounds for Co-training

In this paper, we study bootstrapping algorithms for learning from unlabeled data. The general idea in bootstrapping is to use some initial labeled data to build a (possibly partial) predictive labeling procedure; then use the labeling procedure to label more data; then use the newly labeled data to build a new predictive procedure and so on. This process can be iterated until a fixed point is reached or some other stopping criterion is met. Here we give P AC style bounds on generalization error which can be used to formally justify certain boostrapping algorithms. One well-known form of bootstrapping is the EM algorithm (Dempster, Laird and Rubin, 1977).

nullnullnull, nullnullnull and nullnullnull, nullnullnullnullnull, (13 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Mozer, Michael C., Dodier, Robert, Colagrosso, Michael D., Guerra-Salcedo, Cesar, Wolniewicz, Richard

Prodding the ROC Curve: Constrained Optimization of Classifier Performance

When designing a two-alternative classifier, one ordinarily aims to maximize the classifier's ability to discriminate between members of the two classes. We describe a situation in a real-world business application of machine-learning prediction in which an additional constraint is placed on the nature of the solution: that the classifier achieve a specified correct acceptance or correct rejection rate (i.e., that it achieve a fixed accuracy on members of one class or the other). Our domain is predicting churn in the telecommunications industry. Churn refers to customers who switch from one service provider to another. We propose four algorithms for training a classifier subject to this domain constraint, and present results showing that each algorithm yields a reliable improvement in performance.

algorithm, classifier, cr rate, (16 more...)