 semi-supervised learning algorithm


Data driven semi-supervised learning

Neural Information Processing Systems

We consider a novel data driven approach for designing semi-supervised learning algorithms that can effectively learn with only a small number of labeled examples. We focus on graph-based techniques, where the unlabeled examples are connected in a graph under the implicit assumption that similar nodes likely have similar labels. Over the past two decades, several elegant graph-based semi-supervised learning algorithms have been proposed for inferring the labels of the unlabeled examples given the graph and a few labeled examples. However, the problem of how to create the graph (which impacts the practical usefulness of these methods significantly) has been relegated to heuristics and domain-specific art, and no general principles have been proposed. In this work we present a novel data driven approach for learning the graph and provide strong formal guarantees in both the distributional and online learning formalizations. We show how to leverage problem instances coming from an underlying problem domain to learn the graph hyperparameters for commonly used parametric families of graphs that provably perform well on new instances from the same domain. We obtain low regret and efficient algorithms in the online setting, and generalization guarantees in the distributional setting. We also show how to combine several very different similarity metrics and learn multiple hyperparameters; our results hold for large classes of problems. We expect some of the tools and techniques we develop along the way to be of independent interest for data driven algorithms more generally.
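To make the role of the graph hyperparameter concrete, here is a minimal sketch of graph-based label inference. It is not the paper's algorithm: it assumes a dense Gaussian-kernel graph with a single bandwidth `sigma` and uses the standard harmonic solution for binary labels. The function names `gaussian_graph` and `harmonic_labels` and the toy data are illustrative assumptions.

```python
import numpy as np

def gaussian_graph(X, sigma):
    """Dense weights W[i, j] = exp(-||x_i - x_j||^2 / sigma^2), zero diagonal."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / sigma ** 2)
    np.fill_diagonal(W, 0.0)
    return W

def harmonic_labels(W, labeled_idx, y_labeled):
    """Soft labels for the unlabeled nodes: solve L_uu f_u = W_ul f_l."""
    n = W.shape[0]
    labeled_idx = np.asarray(labeled_idx)
    unlabeled_idx = np.setdiff1d(np.arange(n), labeled_idx)
    laplacian = np.diag(W.sum(axis=1)) - W
    L_uu = laplacian[np.ix_(unlabeled_idx, unlabeled_idx)]
    W_ul = W[np.ix_(unlabeled_idx, labeled_idx)]
    f_u = np.linalg.solve(L_uu, W_ul @ np.asarray(y_labeled, dtype=float))
    return unlabeled_idx, f_u

# Toy data (assumed): two well-separated 1-D clusters, one labeled point each.
X = np.array([[0.0], [0.2], [0.4], [3.0], [3.2], [3.4]])
labeled_idx = [0, 3]
y_labeled = [0, 1]

W = gaussian_graph(X, sigma=1.0)  # sigma is the graph hyperparameter
unlabeled_idx, f_u = harmonic_labels(W, labeled_idx, y_labeled)
predictions = (f_u > 0.5).astype(int)  # threshold the soft labels
```

Changing `sigma` changes every edge weight and therefore the inferred labels, which is why the choice of graph family and its hyperparameters matters so much in practice.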



Enhancing Deep Learning Model Robustness through Metamorphic Re-Training

arXiv.org Artificial Intelligence

This paper evaluates the use of metamorphic relations to enhance the robustness and real-world performance of machine learning models. We propose a Metamorphic Retraining Framework, which applies metamorphic relations to data and utilizes semi-supervised learning algorithms in an iterative and adaptive multi-cycle process. The framework integrates multiple semi-supervised retraining algorithms, including FixMatch, FlexMatch, MixMatch, and FullMatch, to automate the retraining, evaluation, and testing of models with specified configurations. To assess the effectiveness of this approach, we conducted experiments on the CIFAR-10, CIFAR-100, and MNIST datasets using a variety of image processing models, both pretrained and non-pretrained. Our results demonstrate the potential of metamorphic retraining to significantly improve model robustness: on average, each model gained an additional flat 17 percent in our robustness metric.


Hyperparameter Learning for Graph Based Semi-supervised Learning Algorithms

Neural Information Processing Systems

Semi-supervised learning algorithms have been successfully applied in many applications with scarce labeled data by utilizing the unlabeled data. One important category is graph based semi-supervised learning algorithms, whose performance depends considerably on the quality of the graph and its hyperparameters. In this paper, we deal with the less explored problem of learning the graphs. We propose a graph learning method for the harmonic energy minimization method; this is done by minimizing the leave-one-out prediction error on labeled data points. We use a gradient based method and design an efficient algorithm which significantly accelerates the calculation of the gradient by applying the matrix inversion lemma and using careful pre-computation.
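The leave-one-out criterion described above can be sketched as follows. This is a simplified illustration, not the paper's method: it replaces the gradient based optimization with a plain grid search over one Gaussian bandwidth, and it re-solves the linear system for every held-out point instead of using the matrix inversion lemma. All function names, the grid, and the toy data are assumptions.

```python
import numpy as np

def gaussian_weights(X, sigma):
    """Dense weights W[i, j] = exp(-||x_i - x_j||^2 / sigma^2), zero diagonal."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / sigma ** 2)
    np.fill_diagonal(W, 0.0)
    return W

def harmonic_solution(W, labeled_idx, y_labeled):
    """Harmonic energy minimizer: labeled values are clamped, the rest solved for."""
    n = W.shape[0]
    labeled_idx = np.asarray(labeled_idx)
    unlabeled_idx = np.setdiff1d(np.arange(n), labeled_idx)
    laplacian = np.diag(W.sum(axis=1)) - W
    f = np.zeros(n)
    f[labeled_idx] = y_labeled
    f[unlabeled_idx] = np.linalg.solve(
        laplacian[np.ix_(unlabeled_idx, unlabeled_idx)],
        W[np.ix_(unlabeled_idx, labeled_idx)] @ f[labeled_idx],
    )
    return f

def loo_error(X, labeled_idx, y_labeled, sigma):
    """Mean squared error when each labeled point is hidden and predicted in turn."""
    W = gaussian_weights(X, sigma)
    errors = []
    for k, held_out in enumerate(labeled_idx):
        keep_idx = labeled_idx[:k] + labeled_idx[k + 1:]
        keep_y = y_labeled[:k] + y_labeled[k + 1:]
        f = harmonic_solution(W, keep_idx, keep_y)
        errors.append((f[held_out] - y_labeled[k]) ** 2)
    return float(np.mean(errors))

# Toy data (assumed): two 1-D clusters with two labeled points per class.
X = np.array([[0.0], [0.1], [0.2], [0.3], [2.0], [2.1], [2.2], [2.3]])
labeled_idx = [0, 3, 4, 7]
y_labeled = [0.0, 0.0, 1.0, 1.0]

# Grid search stands in for the gradient based optimizer in this sketch.
errors = {s: loo_error(X, labeled_idx, y_labeled, s) for s in [0.25, 0.5, 1.0, 5.0]}
best_sigma = min(errors, key=errors.get)
```

A very large bandwidth makes the graph nearly complete with uniform weights, so a held-out point is predicted close to the average of the remaining labels and the leave-one-out error grows. Smaller bandwidths keep the two clusters separate.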


Semi-Supervised Machine Learning: a Homological Approach

arXiv.org Artificial Intelligence

Using techniques of Symbolic Computation and Computer Algebra, we apply the concept of persistent homology to obtain a new semi-supervised learning method. Machine Learning and Deep Learning methods have become the state-of-the-art approach for solving data classification tasks. In order to use those methods, it is necessary to acquire and label a considerable amount of data; however, this is not straightforward in some fields, since data annotation is time consuming and may require expert knowledge. This challenge can be tackled by means of semi-supervised learning methods that take advantage of both labelled and unlabelled data. In our team we have applied this Machine Learning paradigm in various applied projects (e.g.


Why the pseudo label based semi-supervised learning algorithm is effective?

arXiv.org Artificial Intelligence

Recently, pseudo label based semi-supervised learning has achieved great success in many fields. The core idea of the pseudo label based semi-supervised learning algorithm is to use the model trained on the labeled data to generate pseudo labels on the unlabeled data, and then train a model to fit the previously generated pseudo labels. In this paper we give a theoretical analysis of why pseudo label based semi-supervised learning is effective. We mainly compare the generalization error of models trained under two settings: (1) there are N labeled data points; (2) there are N unlabeled data points and a suitable initial model. Our analysis shows, first, that when the amount of unlabeled data tends to infinity, the pseudo label based semi-supervised learning algorithm can obtain a model with the same generalization error upper bound as a model obtained by standard training when the amount of labeled data tends to infinity. More importantly, we prove that when the amount of unlabeled data is large enough, the generalization error upper bound of the model obtained by the pseudo label based semi-supervised learning algorithm converges to the optimal upper bound at a linear convergence rate. We also give a lower bound on the sampling complexity required to achieve the linear convergence rate. Our analysis contributes to understanding the empirical successes of pseudo label based semi-supervised learning.
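The two-step procedure in this abstract (label the unlabeled data with an initial model, then fit a new model to those pseudo labels) can be illustrated with a small sketch. The nearest-centroid classifier, the function names, and the toy data are assumptions chosen for brevity; the paper's analysis is not tied to this model.

```python
import numpy as np

def fit_centroids(X, y, n_classes):
    """Nearest-centroid 'model': one mean vector per class.
    Assumes every class has at least one example in y."""
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])

def predict(centroids, X):
    """Assign each row of X to the class of its closest centroid."""
    sq_dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return sq_dists.argmin(axis=1)

def pseudo_label_train(X_lab, y_lab, X_unlab, n_classes):
    # Step 1: train the initial model on the labeled data only.
    initial_model = fit_centroids(X_lab, y_lab, n_classes)
    # Step 2: use it to generate pseudo labels for the unlabeled data.
    pseudo_labels = predict(initial_model, X_unlab)
    # Step 3: train a new model to fit the pseudo labels.
    final_model = fit_centroids(X_unlab, pseudo_labels, n_classes)
    return pseudo_labels, final_model

# Toy data (assumed): one labeled point per class, four unlabeled points.
X_lab = np.array([[0.0, 0.0], [4.0, 4.0]])
y_lab = np.array([0, 1])
X_unlab = np.array([[0.5, 0.2], [-0.3, 0.4], [3.6, 4.2], [4.4, 3.9]])

pseudo_labels, final_model = pseudo_label_train(X_lab, y_lab, X_unlab, n_classes=2)
```

Many practical variants instead train the final model on the union of labeled and pseudo-labeled data, or keep only high-confidence pseudo labels. This sketch follows the plain description given in the abstract.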


Semi-Supervised Learning

#artificialintelligence

Machine learning is one of the fastest-growing fields in the current technological landscape. It is a branch of artificial intelligence that deals with predicting outcomes by imitating the way humans learn and perceive things. When classifying machine learning methods, two categories primarily come to mind: Supervised Learning and Unsupervised Learning. In the following article, I'll be discussing a middle path between the two called Semi-Supervised Learning. The primary difference between supervised and unsupervised learning is the type of data used by the algorithms. Supervised learning works on labelled data, while unsupervised learning groups or classifies the data based on similarities or differences rather than labels.


Multinomial Naïve Bayes' For Documents Classification and Natural Language Processing (NLP)

#artificialintelligence

It's formulated as several methods, widely used as an alternative to distance-based K-Means clustering and decision tree forests, and deals with probability as the "likelihood" that data belongs to a specific class. Both Gaussian and multinomial models of naïve Bayes exist. The multinomial model provides the ability to classify data that cannot be represented numerically. Its main advantage is significantly reduced complexity. It can perform classification using small training sets, without needing to be continuously re-trained.
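As a rough illustration of the multinomial model on tokenized documents, the sketch below estimates per-class word likelihoods with add-one (Laplace) smoothing and picks the class with the highest log-posterior score. The function names, the smoothing constant `alpha`, and the tiny corpus are assumptions, not taken from the article.

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log priors and smoothed log word likelihoods per class."""
    vocab = sorted({word for doc in docs for word in doc})
    class_doc_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)
    log_prior = {c: math.log(n / len(docs)) for c, n in class_doc_counts.items()}
    log_likelihood = {}
    for c in class_doc_counts:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return log_prior, log_likelihood

def classify(doc, log_prior, log_likelihood):
    """Return the class with the largest log prior + summed log likelihoods.
    Words outside the training vocabulary are ignored."""
    scores = {
        c: log_prior[c] + sum(log_likelihood[c][w] for w in doc if w in log_likelihood[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)

# Tiny training corpus (assumed), already tokenized.
docs = [
    ["cheap", "pills", "buy"],
    ["buy", "cheap", "now"],
    ["meeting", "schedule", "today"],
    ["project", "meeting", "notes"],
]
labels = ["spam", "spam", "ham", "ham"]

log_prior, log_likelihood = train_multinomial_nb(docs, labels)
first = classify(["cheap", "buy", "today"], log_prior, log_likelihood)
second = classify(["meeting", "notes"], log_prior, log_likelihood)
```

Training only counts words per class, so the cost is linear in the corpus size, and adding new documents only requires updating the counts.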