AITopics

The goal of this work is to accurately detect and localize boundaries in natural scenes using local image measurements. We formulate features that respond to characteristic changes in brightness and texture associated with natural boundaries. In order to combine the information from these features in an optimal way, a classifier is trained using human labeled images as ground truth. We present precision-recall curves showing that the resulting detector outperforms existing approaches.

artificial intelligence, boundary, machine learning, (16 more...)

Country: Asia (0.28)

Genre: Research Report (0.31)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

Li, Stan Z., Zhang, Zhenqiu, Shum, Heung-yeung, Zhang, Hongjiang

FloatBoost Learning for Classification

AdaBoost [3] minimizes an upper error bound which is an exponential function of the margin on the training set [14]. However, the ultimate goal in applications of pattern classification is always minimum error rate. On the other hand, AdaBoost needs an effective procedure for learning weak classifiers, which by itself is difficult especially for high dimensional data. In this paper, we present a novel procedure, called FloatBoost, for learning a better boosted classifier. FloatBoost uses a backtrack mechanism after each iteration of AdaBoost to remove weak classifiers which cause higher error rates. The resulting float-boosted classifier consists of fewer weak classifiers yet achieves lower error rates than AdaBoost in both training and test. We also propose a statistical model for learning weak classifiers, based on a stagewise approximation of the posterior using an overcomplete set of scalar features. Experimental comparisonsof FloatBoost and AdaBoost are provided through a difficult classification problem, face detection, where the goal is to learn from training examples a highly nonlinear classifier to differentiate between faceand nonface patterns in a high dimensional space. The results clearly demonstrate the promises made by FloatBoost over AdaBoost.

artificial intelligence, classifier, machine learning, (17 more...)

Country:

Asia > China (0.16)
North America > United States (0.15)
North America > Canada (0.14)

Genre: Research Report (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.79)
Information Technology > Artificial Intelligence > Vision > Face Recognition (0.69)

Ghaoui, Laurent E., Jordan, Michael I., Lanckriet, Gert R.

Robust Novelty Detection with Single-Class MPM

This algorithm-the "single-class minimax probability machine(MPM)"- is built on a distribution-free methodology that minimizes the worst-case probability of a data point falling outside of a convex set, given only the mean and covariance matrix of the distribution and making no further distributional assumptions. Wepresent a robust approach to estimating the mean and covariance matrix within the general two-class MPM setting, and show how this approach specializes to the single-class problem. We provide empirical results comparing the single-class MPM to the single-class SVM and a two-class SVM method. 1 Introduction Novelty detection is an important unsupervised learning problem in which test data are to be judged as having been generated from the same or a different process as that which generated the training data.

covariance matrix, data mining, machine learning, (16 more...)

Country: North America > United States (0.28)

Industry: Education (0.34)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)

Cortes, Corinna, Haffner, Patrick, Mohri, Mehryar

Rational Kernels

We introduce a general family of kernels based on weighted transducers orrational relations, rational kernels, that can be used for analysis of variable-length sequences or more generally weighted automata, in applications suchas computational biology or speech recognition. We show that rational kernels can be computed efficiently using a general algorithm ofcomposition of weighted transducers and a general single-source shortest-distance algorithm. We also describe several general families of positive definite symmetric rational kernels. These general kernels can be combined with Support Vector Machines to form efficient and powerful techniquesfor spoken-dialog classification: highly complex kernels become easy to design and implement and lead to substantial improvements inthe classification accuracy. We also show that the string kernels considered in applications to computational biology are all specific instances ofrational kernels.

artificial intelligence, machine learning, natural language, (20 more...)

Country:

Europe (0.94)
North America > United States > Massachusetts (0.28)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Weiss, G. M., Provost, F.

Learning When Training Data are Costly: The Effect of Class Distribution on Tree Induction

Journal of Artificial Intelligence ResearchOct-1-2003

For large, real-world inductive learning problems, the number of training examples often must be limited due to the costs associated with procuring, preparing, and storing the training examples and/or the computational costs associated with learning from them. In such circumstances, one question of practical importance is: if only n training examples can be selected, in what proportion should the classes be represented? In this article we help to answer this question by analyzing, for a fixed training-set size, the relationship between the class distribution of the training data and the performance of classification trees induced from these data. We study twenty-six data sets and, for each, determine the best class distribution for learning. The naturally occurring class distribution is shown to generally perform well when classifier performance is evaluated using undifferentiated error rate (0/1 loss). However, when the area under the ROC curve is used to evaluate classifier performance, a balanced distribution is shown to perform well. Since neither of these choices for class distribution always generates the best-performing classifier, we introduce a "budget-sensitive" progressive sampling algorithm for selecting training examples based on the class associated with each example. An empirical analysis of this algorithm shows that the class distribution of the resulting training set yields classifiers with good (nearly-optimal) classification performance.

class distribution, error rate, minority-class example, (15 more...)

doi: 10.1613/jair.1199

AI Access Foundation

10346

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > San Mateo County > Menlo Park (0.04)
North America > United States > New York > New York County > New York City (0.04)
(7 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.67)

Industry: Education (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)

Lerman, K., Minton, S. N., Knoblock, C. A.

Wrapper Maintenance: A Machine Learning Approach

Journal of Artificial Intelligence ResearchFeb-1-2003

The proliferation of online information sources has led to an increased use of wrappers for extracting data from Web sources. While most of the previous research has focused on quick and efficient generation of wrappers, the development of tools for wrapper maintenance has received less attention. This is an important research problem because Web sources often change in ways that prevent the wrappers from extracting data correctly. We present an efficient algorithm that learns structural information about data from positive examples alone. We describe how this information can be used for two wrapper maintenance applications: wrapper verification and reinduction. The wrapper verification system detects when a wrapper is not extracting correct data, usually because the Web source has changed its format. The reinduction algorithm automatically recovers from changes in the Web source by identifying data on Web pages so that a new wrapper may be generated for this source. To validate our approach, we monitored 27 wrappers over a period of a year. The verification algorithm correctly discovered 35 of the 37 wrapper changes, and made 16 mistakes, resulting in precision of 0.73 and recall of 0.95. We validated the reinduction algorithm on ten Web sources. We were able to successfully reinduce the wrappers, obtaining precision and recall values of 0.90 and 0.80 on the data extraction task.

algorithm, information, wrapper, (16 more...)

doi: 10.1613/jair.1145

AI Access Foundation

10325

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > New York > Kings County > New York City (0.04)
(14 more...)

Genre:

Research Report (0.67)
Workflow (0.46)

Industry:

Consumer Products & Services > Restaurants (0.46)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Communications > Web (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.95)

Neural Information Processing SystemsDec-31-2002

Reducing multiclass to binary by coupling probability estimates

Zadrozny, B.

This paper presents a method for obtaining class membership probability estimates for multiclass classification problems by coupling the probability estimates produced by binary classifiers. This is an extension for arbitrary code matrices of a method due to Hastie and Tibshirani for pairwise coupling of probability estimates. Experimental results with Boosted Naive Bayes show that our method produces calibrated class membership probability estimates, while having similar classification accuracy as loss-based decoding, a method for obtaining the most likely class that does not generate probability estimates.

code matrix, matrix, probability estimate, (15 more...)

Country:

North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > California > San Diego County > La Jolla (0.04)
North America > United States > California > Orange County > Irvine (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.49)

Mozer, Michael C., Dodier, Robert, Colagrosso, Michael D., Guerra-Salcedo, Cesar, Wolniewicz, Richard

Prodding the ROC Curve: Constrained Optimization of Classifier Performance

Neural Information Processing SystemsDec-31-2002

When designing a two-alternative classifier, one ordinarily aims to maximize the classifier's ability to discriminate between members of the two classes. We describe a situation in a real-world business application of machine-learning prediction in which an additional constraint is placed on the nature of the solution: that the classifier achieve a specified correct acceptance or correct rejection rate (i.e., that it achieve a fixed accuracy on members of one class or the other). Our domain is predicting churn in the telecommunications industry. Churn refers to customers who switch from one service provider to another. We propose four algorithms for training a classifier subject to this domain constraint, and present results showing that each algorithm yields a reliable improvement in performance.

algorithm, classifier, cr rate, (16 more...)

Country:

North America > United States > Colorado > Boulder County > Boulder (0.04)
North America > United States > California > San Mateo County > San Mateo (0.04)
North America > United States > New York (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Telecommunications (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Viola, Paul, Jones, Michael

Fast and Robust Classification using Asymmetric AdaBoost and a Detector Cascade

Neural Information Processing SystemsDec-31-2002

This paper develops a new approach for extremely fast detection in domains where the distribution of positive and negative examples is highly skewed (e.g.

adaboost, classifier, detection rate, (13 more...)