Education
Online Passive-Aggressive Algorithms
Shalev-shwartz, Shai, Crammer, Koby, Dekel, Ofer, Singer, Yoram
We present a unified view for online classification, regression, and uniclass problems.This view leads to a single algorithmic framework for the three problems. We prove worst case loss bounds for various algorithms for both the realizable case and the non-realizable case. A conversion of our main online algorithm to the setting of batch learning is also discussed. Theend result is new algorithms and accompanying loss bounds for the hinge-loss.
Unsupervised Context Sensitive Language Acquisition from a Large Corpus
Solan, Zach, Horn, David, Ruppin, Eytan, Edelman, Shimon
We describe a pattern acquisition algorithm that learns, in an unsupervised fashion,a streamlined representation of linguistic structures from a plain natural-language corpus. This paper addresses the issues of learning structuredknowledge from a large-scale natural language data set, and of generalization to unseen text. The implemented algorithm represents sentencesas paths on a graph whose vertices are words (or parts of words). Significant patterns, determined by recursive context-sensitive statistical inference, form new vertices. Linguistic constructions are represented bytrees composed of significant patterns and their associated equivalence classes. An input module allows the algorithm to be subjected toa standard test of English as a Second Language (ESL) proficiency. Theresults are encouraging: the model attains a level of performance consideredto be "intermediate" for 9th-grade students, despite having been trained on a corpus (CHILDES) containing transcribed speech of parents directed to small children.
Online Classification on a Budget
Crammer, Koby, Kandola, Jaz, Singer, Yoram
Online algorithms for classification often require vast amounts of memory andcomputation time when employed in conjunction with kernel functions. In this paper we describe and analyze a simple approach for an on-the-fly reduction of the number of past examples used for prediction. Experiments performed with real datasets show that using the proposed algorithmic approach with a single epoch is competitive with the support vectormachine (SVM) although the latter, being a batch algorithm, accesses each training example multiple times.
Learning Curves for Stochastic Gradient Descent in Linear Feedforward Networks
Werfel, Justin, Xie, Xiaohui, Seung, H. S.
Gradient-following learning methods can encounter problems of implementation inmany applications, and stochastic variants are frequently used to overcome these difficulties. We derive quantitative learning curves for three online training methods used with a linear perceptron: direct gradient descent, node perturbation, and weight perturbation. The maximum learning rate for the stochastic methods scales inversely with the first power of the dimensionality of the noise injected into the system; withsufficiently small learning rate, all three methods give identical learning curves. These results suggest guidelines for when these stochastic methods will be limited in their utility, and considerations for architectures in which they will be effective.
Learning a Rare Event Detection Cascade by Direct Feature Selection
Wu, Jianxin, Rehg, James M., Mullin, Matthew D.
Face detection is a canonical example of a rare event detection problem, inwhich target patterns occur with much lower frequency than nontargets. Outof millions of face-sized windows in an input image, for example, only a few will typically contain a face. Viola and Jones recently proposed a cascade architecture for face detection which successfully addresses therare event nature of the task. A central part of their method is a feature selection algorithm based on AdaBoost. We present a novel cascade learning algorithm based on forward feature selection which is two orders of magnitude faster than the Viola-Jones approach and yields classifiers of equivalent quality. This faster method could be used for more demanding classification tasks, such as online learning.
Online Learning of Non-stationary Sequences
Monteleoni, Claire, Jaakkola, Tommi S.
We consider an online learning scenario in which the learner can make predictions on the basis of a fixed set of experts. We derive upper and lower relative loss bounds for a class of universal learning algorithms involving aswitching dynamics over the choice of the experts. On the basis of the performance bounds we provide the optimal a priori discretization forlearning the parameter that governs the switching dynamics. We demonstrate the new algorithm in the context of wireless networks.
Link Prediction in Relational Data
Taskar, Ben, Wong, Ming-fai, Abbeel, Pieter, Koller, Daphne
Many real-world domains are relational in nature, consisting of a set of objects related to each other in complex ways. This paper focuses on predicting the existence and the type of links between entities in such domains. We apply the relational Markov network framework of Taskar et al. to define a joint probabilistic modelover the entire link graph -- entity attributes and links. The application of the RMN algorithm to this task requires the definition of probabilistic patterns over subgraph structures. We apply this method to two new relational datasets, one involving university webpages, and the other a social network. We show that the collective classification approach of RMNs, and the introduction of subgraph patterns over link labels, provide significant improvements in accuracy over flat classification, which attempts to predict each link in isolation.