 Unsupervised or Indirectly Supervised Learning


5 Main Types of Machine Learning Systems

#artificialintelligence

Supervised learning is the most common type of machine learning. Most ML problems that we encounter fall into this category. As the name implies, a supervised learning algorithm is trained with input data along with some form of guidance that we can call labels. Labels are also known as targets, and they act as a description of the input data. That said, there are more advanced tasks that do not appear to fall under supervised learning but in fact do.


Supervised Learning vs Unsupervised Learning

#artificialintelligence

Supervised learning involves learning a function that maps an input to an output based on example input-output pairs. In classification models, the output is discrete. Unlike supervised learning, unsupervised learning is used to draw inferences and find patterns from input data without reference to labeled outcomes. Clustering is an unsupervised technique that involves the grouping, or clustering, of data points.
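The contrast between the two settings can be illustrated with a minimal Python sketch. It is not taken from the article: the nearest-centroid classifier, the naive k-means initialization, the toy points, and the label names "low" and "high" are all assumptions chosen for illustration. The classifier is fitted on input-output pairs and returns a discrete label, while the clustering routine sees only the inputs.

```python
import math

def fit_nearest_centroid(points, labels):
    """Supervised: learn one centroid per label from (input, label) pairs."""
    grouped = {}
    for point, label in zip(points, labels):
        grouped.setdefault(label, []).append(point)
    return {
        label: tuple(sum(axis) / len(members) for axis in zip(*members))
        for label, members in grouped.items()
    }

def predict(centroids, point):
    """Return the discrete label whose centroid is closest to the point."""
    return min(centroids, key=lambda label: math.dist(centroids[label], point))

def kmeans(points, k, iterations=10):
    """Unsupervised: group points into k clusters without using any labels."""
    centroids = list(points[:k])  # naive initialization: the first k points
    assignment = []
    for _ in range(iterations):
        assignment = [
            min(range(k), key=lambda c: math.dist(centroids[c], p))
            for p in points
        ]
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assignment

# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.9), (0.9, 1.1), (4.8, 5.1)]
labels = ["low", "high", "low", "high", "low", "high"]

centroids = fit_nearest_centroid(points, labels)  # uses the labels
clusters = kmeans(points, k=2)                    # ignores the labels
```

On this toy data the clusters found without labels coincide with the labeled groups, but the cluster indices themselves carry no meaning; only the supervised model can name a group.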


International Workshop on Continual Semi-Supervised Learning: Introduction, Benchmarks and Baselines

arXiv.org Artificial Intelligence

The aim of this paper is to formalize a new continual semi-supervised learning (CSSL) paradigm, brought to the attention of the machine learning community via the IJCAI 2021 International Workshop on Continual Semi-Supervised Learning (CSSL-IJCAI), with the goal of raising the field's awareness of this problem and mobilizing effort in this direction. After a formal definition of continual semi-supervised learning and the appropriate training and testing protocols, the paper introduces two new benchmarks specifically designed to assess CSSL on two important computer vision tasks: activity recognition and crowd counting. We describe the Continual Activity Recognition (CAR) and Continual Crowd Counting (CCC) challenges built upon those benchmarks, the baseline models proposed for the challenges, and a simple CSSL baseline which consists of applying batch self-training in temporal sessions, for a limited number of rounds. The results show that learning from unlabelled data streams is extremely challenging, and stimulate the search for methods that can encode the dynamics of the data stream.
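The idea of batch self-training in temporal sessions can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the paper's baseline: the 1-D nearest-centroid model, the margin-based confidence rule, the `min_margin` threshold, and the labels "still" and "moving" are all hypothetical. The stream is split into sessions, and within each session the model pseudo-labels the points it is confident about and refits, for a limited number of rounds.

```python
def fit_centroids(samples):
    """Fit a 1-D nearest-centroid model from (value, label) pairs."""
    groups = {}
    for value, label in samples:
        groups.setdefault(label, []).append(value)
    return {label: sum(vals) / len(vals) for label, vals in groups.items()}

def predict_with_margin(centroids, value):
    """Return (label, margin); assumes at least two classes.

    The margin is how much closer the best centroid is than the runner-up.
    """
    ranked = sorted(centroids, key=lambda label: abs(centroids[label] - value))
    best, runner_up = ranked[0], ranked[1]
    margin = abs(centroids[runner_up] - value) - abs(centroids[best] - value)
    return best, margin

def self_train_in_sessions(labelled, sessions, rounds=2, min_margin=1.0):
    """Self-train session by session, with a limited number of rounds each."""
    training_set = list(labelled)
    centroids = fit_centroids(training_set)
    for session in sessions:              # sessions arrive in temporal order
        remaining = list(session)
        for _ in range(rounds):
            confident, uncertain = [], []
            for value in remaining:
                label, margin = predict_with_margin(centroids, value)
                target = confident if margin >= min_margin else uncertain
                target.append((value, label))
            if not confident:
                break                     # nothing new to learn from this round
            training_set.extend(confident)  # pseudo-labels join the training set
            centroids = fit_centroids(training_set)
            remaining = [value for value, _ in uncertain]
    return centroids

# Hypothetical stream: two labelled seeds, then two unlabelled sessions.
labelled = [(0.0, "still"), (10.0, "moving")]
sessions = [[1.0, 9.0, 5.2], [2.0, 8.0, 4.0]]
final_centroids = self_train_in_sessions(labelled, sessions)
```

In this run the ambiguous value 5.2 never clears the margin threshold and is discarded, which hints at why unlabelled streams are hard: points near the decision boundary are exactly the ones self-training cannot safely use.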


Robust Semi-Supervised Classification using GANs with Self-Organizing Maps

arXiv.org Artificial Intelligence

Generative adversarial networks (GANs) have shown tremendous promise in learning to generate data and are effective at aiding semi-supervised classification. However, to this point, semi-supervised GAN (SS-GAN) methods make the assumption that the unlabeled data set contains only samples of the joint distribution of the classes of interest, referred to as inliers. Consequently, when presented with a sample from other distributions, referred to as outliers, GANs perform poorly at determining that they are not qualified to make a decision on the sample. The problem of discriminating outliers from inliers while maintaining classification accuracy is referred to here as the DOIC problem. In this work, we describe an architecture that combines self-organizing maps (SOMs) with SS-GANs with the goal of mitigating the DOIC problem, and we present experimental results indicating that the architecture achieves the goal. Multiple experiments were conducted on hyperspectral image data sets. The SS-GANs performed slightly better than supervised GANs on classification problems with and without the SOM. Incorporating the SOMs into the SS-GANs and the supervised GANs led to substantial mitigation of the DOIC problem when compared to SS-GANs and GANs without the SOMs. Furthermore, the SS-GANs performed much better than GANs on the DOIC problem, even without the SOMs.


HyperSeed: Unsupervised Learning with Vector Symbolic Architectures

arXiv.org Artificial Intelligence

... machine learning and robotics context is currently gaining great momentum [1]-[6]. In classification tasks, the use of VSA leads to order of magnitude increase in energy efficiency of computations on the one hand and natively enables one-shot and multitask learning on the other [7]. It is prospected that VSA will play a key role in the development of novel neuromorphic computer architectures [8] as an algorithmic abstraction [9], [10]. The main contribution of this paper is a novel algorithm for unsupervised learning called Hyperseed, ... Across all experiments, Hyperseed convincingly demonstrates its key novelties of learning from a few input vectors and single vector operation learning rule, both of which contribute towards reduced time and computation complexity. The paper is structured as follows. Section II describes the related work relevant to Hyperseed operations. The used methods including the fundamentals of VSA are presented in Section III. Section IV presents the main contribution - the method for unsupervised learning Hyperseed. Section V reports the results of the performance evaluation the experiments.


Model-Change Active Learning in Graph-Based Semi-Supervised Learning

arXiv.org Machine Learning

Active learning in semi-supervised classification involves introducing additional labels for unlabelled data to improve the accuracy of the underlying classifier. A challenge is to identify which points to label to best improve performance while limiting the number of new labels. "Model-change" active learning quantifies the resulting change incurred in the classifier by introducing the additional label(s). We pair this idea with graph-based semi-supervised learning methods that use the spectrum of the graph Laplacian matrix, which can be truncated to avoid prohibitively large computational and storage costs. We consider a family of convex loss functions for which the acquisition function can be efficiently approximated using the Laplace approximation of the posterior distribution. We show a variety of multiclass examples that illustrate improved performance over the prior state of the art.


Life is not black and white -- Combining Semi-Supervised Learning with fuzzy labels

arXiv.org Artificial Intelligence

The required amount of labeled data is one of the biggest issues in deep learning. Semi-Supervised Learning can potentially solve this issue by using additional unlabeled data. However, many datasets suffer from variability in the annotations. The aggregated labels from these annotations are not consistent between different annotators and thus are considered fuzzy. These fuzzy labels are often not considered by Semi-Supervised Learning. This leads either to an inferior performance or to higher initial annotation costs in the complete machine learning development cycle. We envision the incorporation of fuzzy labels into Semi-Supervised Learning and give a proof-of-concept of the potential lower costs and higher consistency in the complete development cycle. As part of our concept, we discuss current limitations, future research opportunities and potential broad impacts.


The Rich Get Richer: Disparate Impact of Semi-Supervised Learning

arXiv.org Machine Learning

Semi-supervised learning (SSL) has demonstrated its potential to improve model accuracy for a variety of learning tasks when high-quality supervised data is severely limited. Although it is often established that the average accuracy for the entire population of data is improved, it is unclear how SSL fares with different sub-populations. Understanding the above question has substantial fairness implications when these different sub-populations are defined by the demographic groups we aim to treat fairly. In this paper, we reveal the disparate impacts of deploying SSL: the sub-population that has a higher baseline accuracy without using SSL (the "rich" sub-population) tends to benefit more from SSL, while the sub-population that suffers from a low baseline accuracy (the "poor" sub-population) might even observe a performance drop after adding the SSL module. We theoretically and empirically establish the above observation for a broad family of SSL algorithms, which either explicitly or implicitly use an auxiliary "pseudo-label". Our experiments on a set of image and text classification tasks confirm our claims. We discuss how this disparate impact can be mitigated and hope that our paper will raise awareness of the potential pitfalls of using SSL and encourage a multifaceted evaluation of future SSL algorithms. Code is available at github.com/UCSC-REAL/Disparate-SSL.


Field Extraction from Forms with Unlabeled Data

arXiv.org Artificial Intelligence

We propose a novel framework to conduct field extraction from forms with unlabeled data. To bootstrap the training process, we develop a rule-based method for mining noisy pseudo-labels from unlabeled forms. Using the supervisory signal from the pseudo-labels, we extract a discriminative token representation from a transformer-based model by modeling the interaction between text in the form. To prevent the model from overfitting to label noise, we introduce a refinement module based on a progressive pseudo-label ensemble. Experimental results demonstrate the effectiveness of our framework.


Generative Adversarial Networks in Machine Learning

#artificialintelligence

GANs are one of the machine learning techniques that are helpful for photo editing. A Generative Adversarial Network (GAN) is a class of machine learning models. It was designed by Ian Goodfellow and his colleagues in 2014. GANs were initially put forward as a generative model for unsupervised learning, but they have also proven extremely useful for semi-supervised learning, supervised learning, and reinforcement learning. A GAN consists of two neural networks that compete with each other and can create new output by analyzing, capturing, and copying the variation in a given dataset.