AITopics

Online algorithms for classification often require vast amounts of memory and computation time when employed in conjunction with kernel functions. In this paper we describe and analyze a simple approach for an on-the-fly reduction of the number of past examples used for prediction. Experiments performed with real datasets show that using the proposed algorithmic approach with a single epoch is competitive with the support vector machine (SVM) although the latter, being a batch algorithm, accesses each training example multiple times.

algorithm, online algorithm, support pattern, (15 more...)

Country:

Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)

Industry:

Education > Educational Setting > Online (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.51)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.54)

Bottou, Léon, Cun, Yann L.

Large Scale Online Learning

We consider situations where training data is abundant and computing resources are comparatively scarce. We argue that suitably designed online learning algorithms asymptotically outperform any batch learning algorithm. Both theoretical and experimental evidences are presented.

algorithm, convergence, iteration, (14 more...)

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > New Jersey > Mercer County > Princeton (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Industry: Education > Educational Setting > Online (0.63)

Technology:

Information Technology > Enterprise Applications > Human Resources > Learning Management (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Wu, Jianxin, Rehg, James M., Mullin, Matthew D.

Learning a Rare Event Detection Cascade by Direct Feature Selection

Face detection is a canonical example of a rare event detection problem, in which target patterns occur with much lower frequency than nontargets. Out of millions of face-sized windows in an input image, for example, only a few will typically contain a face. Viola and Jones recently proposed a cascade architecture for face detection which successfully addresses the rare event nature of the task. A central part of their method is a feature selection algorithm based on AdaBoost. We present a novel cascade learning algorithm based on forward feature selection which is two orders of magnitude faster than the Viola-Jones approach and yields classifiers of equivalent quality. This faster method could be used for more demanding classification tasks, such as online learning.

algorithm, cascade, classifier, (13 more...)

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.69)

Industry:

Information Technology > Security & Privacy (0.34)
Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.84)

Shalev-shwartz, Shai, Crammer, Koby, Dekel, Ofer, Singer, Yoram

Online Passive-Aggressive Algorithms

We present a unified view for online classification, regression, and uniclass problems. This view leads to a single algorithmic framework for the three problems. We prove worst case loss bounds for various algorithms for both the realizable case and the non-realizable case. A conversion of our main online algorithm to the setting of batch learning is also discussed. The end result is new algorithms and accompanying loss bounds for the hinge-loss.

algorithm, classification, online algorithm, (15 more...)

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)

Industry: Education > Educational Setting > Online (0.87)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Werfel, Justin, Xie, Xiaohui, Seung, H. S.

Learning Curves for Stochastic Gradient Descent in Linear Feedforward Networks

Gradient-following learning methods can encounter problems of implementation in many applications, and stochastic variants are frequently used to overcome these difficulties. We derive quantitative learning curves for three online training methods used with a linear perceptron: direct gradient descent, node perturbation, and weight perturbation. The maximum learning rate for the stochastic methods scales inversely with the first power of the dimensionality of the noise injected into the system; with sufficiently small learning rate, all three methods give identical learning curves. These results suggest guidelines for when these stochastic methods will be limited in their utility, and considerations for architectures in which they will be effective.

algorithm, noise, perturbation, (16 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > California > San Mateo County > San Mateo (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Education > Educational Setting > Online (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Monteleoni, Claire, Jaakkola, Tommi S.

Online Learning of Non-stationary Sequences

We consider an online learning scenario in which the learner can make predictions on the basis of a fixed set of experts. We derive upper and lower relative loss bounds for a class of universal learning algorithms involving a switching dynamics over the choice of the experts. On the basis of the performance bounds we provide the optimal a priori discretization for learning the parameter that governs the switching dynamics. We demonstrate the new algorithm in the context of wireless networks.

algorithm, discretization, sequence, (17 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)

Industry: Education > Educational Setting > Online (0.61)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Communications > Networks (0.89)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.61)

Unsupervised Context Sensitive Language Acquisition from a Large Corpus

Solan, Zach, Horn, David, Ruppin, Eytan, Edelman, Shimon

We describe a pattern acquisition algorithm that learns, in an unsupervised fashion, a streamlined representation of linguistic structures from a plain natural-language corpus. This paper addresses the issues of learning structured knowledge from a large-scale natural language data set, and of generalization to unseen text. The implemented algorithm represents sentences as paths on a graph whose vertices are words (or parts of words). Significant patterns, determined by recursive context-sensitive statistical inference, form new vertices. Linguistic constructions are represented by trees composed of significant patterns and their associated equivalence classes. An input module allows the algorithm to be subjected to a standard test of English as a Second Language (ESL) proficiency. The results are encouraging: the model attains a level of performance considered to be "intermediate" for 9th-grade students, despite having been trained on a corpus (CHILDES) containing transcribed speech of parents directed to small children.

algorithm, corpus, equivalence class, (14 more...)

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
North America > United States > Illinois > Cook County > Chicago (0.04)
(7 more...)

Genre: Research Report (0.46)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.68)

Link Prediction in Relational Data

Taskar, Ben, Wong, Ming-fai, Abbeel, Pieter, Koller, Daphne

Many real-world domains are relational in nature, consisting of a set of objects related to each other in complex ways. This paper focuses on predicting the existence and the type of links between entities in such domains. We apply the relational Markov network framework of Taskar et al. to define a joint probabilistic model over the entire link graph -- entity attributes and links. The application of the RMN algorithm to this task requires the definition of probabilistic patterns over subgraph structures. We apply this method to two new relational datasets, one involving university webpages, and the other a social network. We show that the collective classification approach of RMNs, and the introduction of subgraph patterns over link labels, provide significant improvements in accuracy over flat classification, which attempts to predict each link in isolation.

clique, relation, student, (15 more...)

Country: North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.68)

Industry: Education (0.93)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

Carreras, Xavier, Màrquez, Lluís

Online Learning via Global Feedback for Phrase Recognition

This work presents an architecture based on perceptrons to recognize phrase structures, and an online learning algorithm to train the perceptrons together and dependently. The recognition strategy applies learning in two layers: a filtering layer, which reduces the search space by identifying plausible phrase candidates, and a ranking layer, which recursively builds the optimal phrase structure. We provide a recognition-based feedback rule which reflects to each local function its committed errors from a global point of view, and allows to train them together online as perceptrons. Experimentation on a syntactic parsing problem, the recognition of clause hierarchies, improves state-of-the-art results and evinces the advantages of our global training method over optimizing each function locally and independently.

classifier, phrase candidate, score function, (15 more...)