AITopics | Education

There has been increased interest in devising learning techniques that combine unlabeled data with labeled data, i.e., semi-supervised learning. However, to the best of our knowledge, no study has been performed across various techniques and different types and amounts of labeled and unlabeled data. Moreover, most of the published work on semi-supervised learning techniques assumes that the labeled and unlabeled data come from the same distribution. It is possible for the labeling process to be associated with a selection bias such that the distributions of data points in the labeled and unlabeled sets are different. Not correcting for such bias can result in biased function approximation with potentially poor performance. In this paper, we present an empirical study of various semi-supervised learning techniques on a variety of datasets. We attempt to answer various questions such as the effect of independence or relevance amongst features, the effect of the size of the labeled and unlabeled sets and the effect of noise. We also investigate the impact of sample-selection bias on the semi-supervised learning techniques under study and implement a bivariate probit technique particularly designed to correct for such bias.

dataset, learning, unlabeled data, (13 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.1509

AI Access Foundation

10404

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > Canada > Ontario > Toronto (0.14)
North America > United States > District of Columbia > Washington (0.14)
(12 more...)

Genre: Research Report > Experimental Study (0.92)

Industry:

Banking & Finance (0.46)
Education (0.46)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.95)

Online Classification on a Budget

Crammer, Koby, Kandola, Jaz, Singer, Yoram

Neural Information Processing Systems, Dec-31-2004

Online algorithms for classification often require vast amounts of memory and computation time when employed in conjunction with kernel functions. In this paper we describe and analyze a simple approach for an on-the-fly reduction of the number of past examples used for prediction. Experiments performed with real datasets show that using the proposed algorithmic approach with a single epoch is competitive with the support vector machine (SVM) although the latter, being a batch algorithm, accesses each training example multiple times.
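
To make the memory-budget idea concrete, here is a minimal sketch of a kernel perceptron that stores at most a fixed number of past examples. The abstract does not say which example the authors discard, so the oldest-first rule below is a simple stand-in chosen for illustration only. The class name `BudgetKernelPerceptron`, the RBF kernel, and the toy stream are likewise assumptions, not taken from the paper.

```python
import math
from collections import deque


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


class BudgetKernelPerceptron:
    """Kernel perceptron that stores at most `budget` past examples."""

    def __init__(self, budget, kernel=rbf_kernel):
        # deque(maxlen=...) drops the oldest stored example when full.
        # Oldest-first removal is an assumption made for this sketch.
        self.support = deque(maxlen=budget)
        self.kernel = kernel

    def score(self, x):
        return sum(y_i * self.kernel(x_i, x) for x_i, y_i in self.support)

    def predict(self, x):
        return 1 if self.score(x) >= 0 else -1

    def update(self, x, y):
        """Store (x, y) only when the current prediction is wrong."""
        if self.predict(x) != y:
            self.support.append((x, y))
            return True
        return False


# Hypothetical toy stream, seen once in order (a single epoch).
stream = [
    ([0.0, 0.0], -1),
    ([3.0, 3.0], 1),
    ([0.1, 0.0], -1),
    ([6.0, 6.0], -1),
]
model = BudgetKernelPerceptron(budget=2)
mistakes = sum(model.update(x, y) for x, y in stream)
```

Prediction cost grows with the number of stored examples, so capping that number bounds both memory and time per prediction. The price is that evicted examples no longer influence the classifier.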

algorithm, online algorithm, support pattern, (15 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)

Industry:

Education > Educational Setting > Online (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.51)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.54)

Learning a Rare Event Detection Cascade by Direct Feature Selection

Wu, Jianxin, Rehg, James M., Mullin, Matthew D.

Neural Information Processing Systems, Dec-31-2004

Face detection is a canonical example of a rare event detection problem, in which target patterns occur with much lower frequency than nontargets. Out of millions of face-sized windows in an input image, for example, only a few will typically contain a face. Viola and Jones recently proposed a cascade architecture for face detection which successfully addresses the rare event nature of the task. A central part of their method is a feature selection algorithm based on AdaBoost. We present a novel cascade learning algorithm based on forward feature selection which is two orders of magnitude faster than the Viola-Jones approach and yields classifiers of equivalent quality. This faster method could be used for more demanding classification tasks, such as online learning.

algorithm, cascade, classifier, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.69)

Industry:

Information Technology > Security & Privacy (0.34)
Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.84)

Online Passive-Aggressive Algorithms

Shalev-Shwartz, Shai, Crammer, Koby, Dekel, Ofer, Singer, Yoram

Neural Information Processing Systems, Dec-31-2004

We present a unified view for online classification, regression, and uniclass problems. This view leads to a single algorithmic framework for the three problems. We prove worst case loss bounds for various algorithms for both the realizable case and the non-realizable case. A conversion of our main online algorithm to the setting of batch learning is also discussed. The end result is new algorithms and accompanying loss bounds for the hinge-loss.
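
For readers unfamiliar with the passive-aggressive idea, here is a minimal sketch of one commonly described update for binary classification with the hinge loss. The abstract does not spell out the update rule, so the closed-form step size `tau = loss / ||x||^2` and the names `hinge_loss` and `pa_update` are assumptions made for illustration. This is not a transcription of the authors' algorithms.

```python
def hinge_loss(w, x, y):
    """Hinge loss max(0, 1 - y * <w, x>) for a label y in {-1, +1}."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    return max(0.0, 1.0 - margin)


def pa_update(w, x, y):
    """One passive-aggressive step (illustrative form, see the text above).

    Passive: if the loss is zero, w is returned unchanged.
    Aggressive: otherwise w moves just far enough along y * x
    to bring the hinge loss on (x, y) down to zero.
    """
    loss = hinge_loss(w, x, y)
    norm_sq = sum(xi * xi for xi in x)
    if loss == 0.0 or norm_sq == 0.0:
        return list(w)
    tau = loss / norm_sq  # assumed step size for this sketch
    return [wi + tau * y * xi for wi, xi in zip(w, x)]


# Hypothetical toy stream of (features, label) pairs, processed once.
stream = [([1.0, 0.0], 1), ([0.0, 1.0], -1), ([1.0, 1.0], 1)]
w = [0.0, 0.0]
for x, y in stream:
    w = pa_update(w, x, y)
```

After each update the example just seen has zero hinge loss, which is the "aggressive" half of the name. Examples that already satisfy the margin leave the weights untouched.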

algorithm, classification, online algorithm, (15 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)

Industry: Education > Educational Setting > Online (0.87)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Learning Curves for Stochastic Gradient Descent in Linear Feedforward Networks

Werfel, Justin, Xie, Xiaohui, Seung, H. S.

Neural Information Processing Systems, Dec-31-2004

Gradient-following learning methods can encounter problems of implementation in many applications, and stochastic variants are frequently used to overcome these difficulties. We derive quantitative learning curves for three online training methods used with a linear perceptron: direct gradient descent, node perturbation, and weight perturbation. The maximum learning rate for the stochastic methods scales inversely with the first power of the dimensionality of the noise injected into the system; with sufficiently small learning rate, all three methods give identical learning curves. These results suggest guidelines for when these stochastic methods will be limited in their utility, and considerations for architectures in which they will be effective.
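
As a rough illustration of one of the three methods named above, the sketch below applies weight perturbation to a single linear unit. The weights are jittered with Gaussian noise, the resulting change in squared error is measured, and the weights move against the noise direction in proportion to that change. The teacher weights, learning rate `eta`, noise scale `sigma`, and all function names are assumptions for this example, not values from the paper.

```python
import random


def error(w, x, y):
    """Squared error of a linear unit on one example."""
    out = sum(wi * xi for wi, xi in zip(w, x))
    return 0.5 * (out - y) ** 2


def weight_perturbation_step(w, x, y, eta, sigma, rng):
    """Estimate the gradient from one random perturbation of the weights."""
    noise = [rng.gauss(0.0, sigma) for _ in w]
    perturbed = [wi + ni for wi, ni in zip(w, noise)]
    delta = error(perturbed, x, y) - error(w, x, y)
    # On average, delta * noise / sigma**2 approximates the gradient.
    return [wi - eta * delta * ni / sigma ** 2 for wi, ni in zip(w, noise)]


rng = random.Random(0)
teacher = [1.0, -2.0, 0.5]  # hypothetical target weights
w = [0.0, 0.0, 0.0]
initial_dist = sum((a - b) ** 2 for a, b in zip(w, teacher))
for _ in range(2000):
    x = [rng.gauss(0.0, 1.0) for _ in teacher]
    y = sum(t * xi for t, xi in zip(teacher, x))
    w = weight_perturbation_step(w, x, y, eta=0.02, sigma=0.01, rng=rng)
final_dist = sum((a - b) ** 2 for a, b in zip(w, teacher))
```

The update needs only two error evaluations per step and no explicit gradient, which is why such methods are attractive when the gradient is hard to compute. Its variance grows with the number of perturbed weights, so the learning rate here is kept small.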

algorithm, noise, perturbation, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > California > San Mateo County > San Mateo (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Education > Educational Setting > Online (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Online Learning of Non-stationary Sequences

Monteleoni, Claire, Jaakkola, Tommi S.

Neural Information Processing Systems, Dec-31-2004

We consider an online learning scenario in which the learner can make predictions on the basis of a fixed set of experts. We derive upper and lower relative loss bounds for a class of universal learning algorithms involving a switching dynamics over the choice of the experts. On the basis of the performance bounds we provide the optimal a priori discretization for learning the parameter that governs the switching dynamics. We demonstrate the new algorithm in the context of wireless networks.

algorithm, discretization, sequence, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)

Industry: Education > Educational Setting > Online (0.61)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Communications > Networks (0.89)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.61)

Unsupervised Context Sensitive Language Acquisition from a Large Corpus

Solan, Zach, Horn, David, Ruppin, Eytan, Edelman, Shimon

Neural Information Processing Systems, Dec-31-2004

We describe a pattern acquisition algorithm that learns, in an unsupervised fashion, a streamlined representation of linguistic structures from a plain natural-language corpus. This paper addresses the issues of learning structured knowledge from a large-scale natural language data set, and of generalization to unseen text. The implemented algorithm represents sentences as paths on a graph whose vertices are words (or parts of words). Significant patterns, determined by recursive context-sensitive statistical inference, form new vertices. Linguistic constructions are represented by trees composed of significant patterns and their associated equivalence classes. An input module allows the algorithm to be subjected to a standard test of English as a Second Language (ESL) proficiency. The results are encouraging: the model attains a level of performance considered to be "intermediate" for 9th-grade students, despite having been trained on a corpus (CHILDES) containing transcribed speech of parents directed to small children.

algorithm, corpus, equivalence class, (14 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
North America > United States > Illinois > Cook County > Chicago (0.04)
(7 more...)

Genre: Research Report (0.46)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.68)

Link Prediction in Relational Data

Taskar, Ben, Wong, Ming-fai, Abbeel, Pieter, Koller, Daphne

Neural Information Processing Systems, Dec-31-2004

Many real-world domains are relational in nature, consisting of a set of objects related to each other in complex ways. This paper focuses on predicting the existence and the type of links between entities in such domains. We apply the relational Markov network framework of Taskar et al. to define a joint probabilistic model over the entire link graph -- entity attributes and links. The application of the RMN algorithm to this task requires the definition of probabilistic patterns over subgraph structures. We apply this method to two new relational datasets, one involving university webpages, and the other a social network. We show that the collective classification approach of RMNs, and the introduction of subgraph patterns over link labels, provide significant improvements in accuracy over flat classification, which attempts to predict each link in isolation.

clique, relation, student, (15 more...)

Neural Information Processing Systems

Country: North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.68)

Industry: Education (0.93)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

Online Learning via Global Feedback for Phrase Recognition

Carreras, Xavier, Màrquez, Lluís

Neural Information Processing Systems, Dec-31-2004

This work presents an architecture based on perceptrons to recognize phrase structures, and an online learning algorithm to train the perceptrons together and dependently. The recognition strategy applies learning in two layers: a filtering layer, which reduces the search space by identifying plausible phrase candidates, and a ranking layer, which recursively builds the optimal phrase structure. We provide a recognition-based feedback rule which reflects to each local function its committed errors from a global point of view, and allows them to be trained together online as perceptrons. Experimentation on a syntactic parsing problem, the recognition of clause hierarchies, improves state-of-the-art results and evinces the advantages of our global training method over optimizing each function locally and independently.

classifier, phrase candidate, score function, (15 more...)

Neural Information Processing Systems

Country:

Europe > Spain > Catalonia (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)

Industry: Education > Educational Setting > Online (0.61)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.95)
