AITopics | Unsupervised or Indirectly Supervised Learning

Collaborating Authors

Unsupervised or Indirectly Supervised Learning

Unsupervised learning is a branch of machine learning that learns from test data that has not been labeled, classified or categorized. Instead of responding to feedback, unsupervised learning identifies commonalities in the data and reacts based on the presence or absence of such commonalities in each new piece of data. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Exploiting Unlabeled Data to Enhance Ensemble Diversity

Zhang, Min-Ling, Zhou, Zhi-Hua

arXiv.org Artificial IntelligenceSep-24-2010

Ensemble learning aims to improve generalization ability by using multiple base learners. It is well-known that to construct a good ensemble, the base learners should be accurate as well as diverse. In this paper, unlabeled data is exploited to facilitate ensemble learning by helping augment the diversity among the base learners. Specifically, a semi-supervised ensemble method named UDEED is proposed. Unlike existing semi-supervised ensemble methods where error-prone pseudo-labels are estimated for unlabeled data to enlarge the labeled data to improve accuracy, UDEED works by maximizing accuracies of base learners on labeled data while maximizing diversity among them on unlabeled data. Experiments show that UDEED can effectively utilize unlabeled data for ensemble learning and is highly competitive to well-established semi-supervised ensemble methods.

artificial intelligence, diversity, machine learning, (14 more...)

arXiv.org Artificial Intelligence

0909.3593

Country:

North America > United States > California > Orange County > Irvine (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
Asia > China > Jiangsu Province > Nanjing (0.04)
(7 more...)

Genre: Research Report (0.84)

Industry: Health & Medicine (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)

Add feedback

Learning from Concept Drifting Data Streams with Unlabeled Data

Li, Peipei (Hefei University of Technology) | Wu, Xindong (University of Vermont) | Hu, Xuegang (Hefei University of Technology)

AAAI ConferencesJul-15-2010

Contrary to the previous beliefs that all arrived streaming data are labeled and the class labels are immediately availa- ble, we propose a Semi-supervised classification algorithm for data streams with concept drifts and UNlabeled data, called SUN. SUN is based on an evolved decision tree. In terms of deviation between history concept clusters and new ones generated by a developed clustering algorithm of k-Modes, concept drifts are distinguished from noise at leaves. Extensive studies on both synthetic and real data demonstrate that SUN performs well compared to several known online algorithms on unlabeled data. A conclusion is hence drawn that a feasible reference framework is provided for tackling concept drifting data streams with unlabeled data.

algorithm, artificial intelligence, machine learning, (13 more...)

AAAI Conferences

Twenty-Fourth AAAI Conference on Artificial Intelligence

Country:

North America > United States > Vermont > Chittenden County > Burlington (0.15)
Asia > China > Anhui Province > Hefei (0.05)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.37)

Add feedback

Transductive Learning on Adaptive Graphs

Zhang, Yan-Ming (Chinese Academy of Sciences) | Zhang, Yu (Hong Kong University of Science and Technology) | Yeung, Dit-Yan (Hong Kong University of Science and Technology) | Liu, Cheng-Lin (Chinese Academy of Sciences) | Hou, Xinwen (Chinese Academy of Sciences)

AAAI ConferencesJul-15-2010

Graph-based semi-supervised learning methods are based on some smoothness assumption about the data. As a discrete approximation of the data manifold, the graph plays a crucial role in the success of such graph-based methods. In most existing methods, graph construction makes use of a predefined weighting function without utilizing label information even when it is available. In this work, by incorporating label information, we seek to enhance the performance of graph-based semi-supervised learning by learning the graph and label inference simultaneously. In particular, we consider a particular setting of semi-supervised learning called transductive learning. Using the LogDet divergence to define the objective function, we propose an iterative algorithm to solve the optimization problem which has closed-form solution in each step. We perform experiments on both synthetic and real data to demonstrate improvement in the graph and in terms of classification accuracy.

artificial intelligence, inductive learning, machine learning, (16 more...)

AAAI Conferences

Twenty-Fourth AAAI Conference on Artificial Intelligence

Country:

South America > Paraguay > Asunción > Asunción (0.04)
Asia > China > Hong Kong (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
(4 more...)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.78)

Add feedback

Look Ma, No Hands: Analyzing the Monotonic Feature Abstraction for Text Classification

Downey, Doug, Etzioni, Oren

Neural Information Processing SystemsDec-31-2009

Is accurate classification possible in the absence of hand-labeled data? This paper introduces the Monotonic Feature (MF) abstraction--where the probability of class membership increases monotonically with the MF's value. The paper proves that when an MF is given, PAC learning is possible with no hand-labeled data under certain assumptions. We argue that MFs arise naturally in a broad range of textual classification applications. On the classic "20 Newsgroups" data set, a learner given an MF and unlabeled data achieves classification accuracy equal to that of a state-of-the-art semi-supervised learner relying on 160 hand-labeled examples. Even when MFs are not given as input, their presence or absence can be determined from a small amount of hand-labeled data, which yields a new semi-supervised learning method that reduces error by 15% on the 20 Newsgroups data.

assumption, classifier, monotonic feature, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Washington > King County > Seattle (0.14)
North America > United States > New York (0.05)
North America > United States > Illinois > Cook County > Evanston (0.04)
Europe > France > Pays de la Loire > Loire-Atlantique > Nantes (0.04)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.70)
(4 more...)

Add feedback

Look Ma, No Hands: Analyzing the Monotonic Feature Abstraction for Text Classification

Downey, Doug, Etzioni, Oren

Neural Information Processing SystemsDec-31-2009

assumption, classifier, monotonic feature, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Washington > King County > Seattle (0.14)
North America > United States > New York (0.05)
North America > United States > Illinois > Cook County > Evanston (0.04)
Europe > France > Pays de la Loire > Loire-Atlantique > Nantes (0.04)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.70)
(4 more...)

Add feedback

Statistical Analysis of Semi-Supervised Learning: The Limit of Infinite Unlabelled Data

Nadler, Boaz, Srebro, Nathan, Zhou, Xueyuan

Neural Information Processing SystemsDec-31-2009

We study the behavior of the popular Laplacian Regularization method for Semi-Supervised Learning at the regime of a fixed number of labeled points but a large number of unlabeled points. We show that in $\R^d$, $d \geq 2$, the method is actually not well-posed, and as the number of unlabeled points increases the solution degenerates to a noninformative function. We also contrast the method with the Laplacian Eigenvector method, and discuss the ``smoothness assumptions associated with this alternate method.

semi-supervised learning, threshold, unlabeled point, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois > Cook County > Chicago (0.05)
Asia > Middle East > Israel (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.61)

Add feedback

Localized Sliced Inverse Regression

Wu, Qiang, Mukherjee, Sayan, Liang, Feng

Neural Information Processing SystemsDec-31-2009

We developed localized sliced inverse regression for supervised dimension reduction. It has the advantages of preventing degeneracy, increasing estimation accuracy, and automatic subclass discovery in classification problems. A semisupervised version is proposed for the use of unlabeled data. The utility is illustrated on simulated as well as real data sets.

artificial intelligence, dimension reduction, machine learning, (15 more...)

Neural Information Processing Systems

Country: North America > United States (0.95)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.69)

Add feedback

Unlabeled data: Now it helps, now it doesn't

Singh, Aarti, Nowak, Robert, Zhu, Jerry

Neural Information Processing SystemsDec-31-2009

Empirical evidence shows that in favorable situations semi-supervised learning (SSL) algorithms can capitalize on the abundancy of unlabeled training data to improve the performance of a learning task, in the sense that fewer labeled training data are needed to achieve a target error bound. However, in other situations unlabeled data do not seem to help. Recent attempts at theoretically characterizing the situations in which unlabeled data can help have met with little success, and sometimes appear to conflict with each other and intuition. In this paper, we attempt to bridge the gap between practice and theory of semi-supervised learning. We develop a rigorous framework for analyzing the situations in which unlabeled data can help and quantify the improvement possible using finite sample error bounds. We show that there are large classes of problems for which SSL can significantly outperform supervised learning, in finite sample regimes and sometimes also in terms of error convergence rates.

artificial intelligence, machine learning, unlabeled data, (15 more...)

Neural Information Processing Systems

Country: North America > United States > Wisconsin > Dane County > Madison (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)

Add feedback

Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text

Zhang, Yi, Dubrawski, Artur, Schneider, Jeff G.

Neural Information Processing SystemsDec-31-2009

In this paper, we address the question of what kind of knowledge is generally transferable from unlabeled text. We suggest and analyze the semantic correlation of words as a generally transferable structure of the language and propose a new method to learn this structure using an appropriately chosen latent variable model. This semantic correlation contains structural information of the language space and can be used to control the joint shrinkage of model parameters for any specific task in the same space through regularization. In an empirical study, we construct 190 different text classification tasks from a real-world benchmark, and the unlabeled documents are a mixture from all these tasks. We test the ability of various algorithms to use the mixed unlabeled text to enhance all classification tasks. Empirical results show that the proposed approach is a reliable and scalable method for semi-supervised learning, regardless of the source of unlabeled data, the specific task to be enhanced, and the prediction model used.

correlation, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country: North America > United States (0.47)

Genre: Research Report > New Finding (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.61)
(3 more...)

Add feedback

Semi-supervised Learning with Weakly-Related Unlabeled Data : Towards Better Text Categorization

Yang, Liu, Jin, Rong, Sukthankar, Rahul

Neural Information Processing SystemsDec-31-2009

The cluster assumption is exploited by most semi-supervised learning (SSL) methods. However, if the unlabeled data is merely weakly related to the target classes, it becomes questionable whether driving the decision boundary to the low density regions of the unlabeled data will help the classification. In such case, the cluster assumption may not be valid; and consequently how to leverage this type of unlabeled data to enhance the classification accuracy becomes a challenge. We introduce Semi-supervised Learning with Weakly-Related Unlabeled Data" (SSLW), an inductive method that builds upon the maximum-margin approach, towards a better usage of weakly-related unlabeled information. Although the SSLW could improve a wide range of classification tasks, in this paper, we focus on text categorization with a small training pool. The key assumption behind this work is that, even with different topics, the word usage patterns across different corpora tends to be consistent. To this end, SSLW estimates the optimal word-correlation matrix that is consistent with both the co-occurrence information derived from the weakly-related unlabeled documents and the labeled documents. For empirical evaluation, we present a direct comparison with a number of state-of-the-art methods for inductive semi-supervised learning and text categorization; and we show that SSLW results in a significant improvement in categorization accuracy, equipped with a small training set and an unlabeled resource that is weakly related to the test beds."

artificial intelligence, machine learning, unlabeled data, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Michigan (0.28)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.31)

Add feedback