Unsupervised or Indirectly Supervised Learning


Semi-supervised Learning based on Distributionally Robust Optimization

arXiv.org Machine Learning

We propose a novel method for semi-supervised learning (SSL) based on data-driven distributionally robust optimization (DRO) using optimal transport metrics. Our proposed method reduces generalization error by using the unlabeled data to restrict the support of the worst-case distribution in our DRO formulation. We make the formulation practical by proposing a stochastic gradient descent algorithm that allows the training procedure to be implemented easily. We demonstrate that our semi-supervised DRO method improves generalization error over natural supervised procedures and state-of-the-art SSL estimators. Finally, we include a discussion of the large-sample behavior of the optimal uncertainty region in the DRO formulation. Our discussion exposes important aspects such as the role of dimension reduction in SSL.
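The abstract does not spell out the training loop, but the general shape of a DRO-style SSL procedure can be sketched. The toy sketch below is not the authors' algorithm: it substitutes a KL-divergence ball (via exponential tilting, the dual form of a KL worst case) for the paper's optimal-transport metric, and uses self-training pseudo-labels to place the adversary's support on the unlabeled points. All names and constants are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a few labeled points and many unlabeled ones from the same mixture.
X_l = rng.normal(size=(20, 2))
y_l = (X_l[:, 0] > 0).astype(float)
X_u = rng.normal(size=(200, 2))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(2)
eta, tau = 0.1, 5.0   # step size; tau controls how adversarial the reweighting is

for _ in range(500):
    # Restrict the adversary's support to labeled + pseudo-labeled unlabeled data.
    y_u = (sigmoid(X_u @ w) > 0.5).astype(float)
    X = np.vstack([X_l, X_u])
    y = np.concatenate([y_l, y_u])
    p = sigmoid(X @ w)
    losses = -(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    # Inner maximization: exponential tilting upweights high-loss points
    # (a KL-ball worst case standing in for the paper's optimal transport).
    q = np.exp(losses / tau)
    q /= q.sum()
    # One gradient step on the adversarially reweighted logistic loss.
    w -= eta * (X.T @ (q * (p - y)))
```

The inner maximization here has a closed form (the softmax weights `q`), so each outer step is an ordinary weighted-gradient update, which is what makes an SGD-style implementation straightforward.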


Learning Robust Representations for Computer Vision

arXiv.org Machine Learning

Unsupervised learning techniques in computer vision often require learning latent representations, such as low-dimensional linear and non-linear subspaces. Noise and outliers in the data can frustrate these approaches by obscuring the latent spaces. Our main goal is a deeper understanding, and new development, of robust approaches for representation learning. We provide a new interpretation of existing robust approaches and present two specific contributions: a new robust PCA approach, which can separate foreground features from a dynamic background, and a novel robust spectral clustering method that can cluster facial images with high accuracy. Both contributions show superior performance to standard methods on real-world test sets.
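The foreground/background separation mentioned above is the classic use case of robust PCA via principal component pursuit, which decomposes a matrix into a low-rank part plus a sparse part. The sketch below is a generic ADMM-style iteration on synthetic data, not the paper's method; the penalty `mu` and iteration count are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic scene: rank-2 "background" plus sparse, large "foreground" spikes.
L_true = rng.normal(size=(40, 2)) @ rng.normal(size=(2, 30))
S_true = np.zeros_like(L_true)
mask = rng.random(L_true.shape) < 0.05
S_true[mask] = rng.normal(scale=10.0, size=mask.sum())
M = L_true + S_true

def svd_shrink(X, t):
    """Singular-value soft-thresholding (prox of the nuclear norm)."""
    u, s, vt = np.linalg.svd(X, full_matrices=False)
    return (u * np.maximum(s - t, 0.0)) @ vt

def soft(X, t):
    """Elementwise soft-thresholding (prox of the l1 norm)."""
    return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)

lam = 1.0 / np.sqrt(max(M.shape))   # standard PCP weight on the sparse term
mu = 1.0                            # fixed ADMM penalty, kept simple here
L = np.zeros_like(M)
S = np.zeros_like(M)
Y = np.zeros_like(M)                # dual variable
for _ in range(300):
    L = svd_shrink(M - S + Y / mu, 1.0 / mu)   # low-rank update
    S = soft(M - L + Y / mu, lam / mu)         # sparse update
    Y += mu * (M - L - S)                      # dual ascent on the constraint
```

In a video setting the columns of `M` would be vectorized frames; `L` then captures the static background and `S` the moving foreground.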


Machine Learning Algorithms - Giuseppe Bonaccorso

#artificialintelligence

My latest machine learning book has been published and will be available during the last week of July. In this book you will learn all the important machine learning algorithms that are commonly used in the field of data science. These algorithms can be used for supervised as well as unsupervised learning, reinforcement learning, and semi-supervised learning. A few well-known topics covered in this book are linear regression, logistic regression, SVM, Naïve Bayes, k-means, random forests, and feature engineering. In this book you will also learn how these algorithms work and how to implement them in practice to solve your problems.


Machine Learning: An Introduction to Supervised and Unsupervised Learning Algorithms

#artificialintelligence

The phrase "machine learning" refers to the automatic detection of meaningful patterns in data by computing systems. In the last few decades, it has become a common tool in almost any task that needs to understand large data sets. One of the biggest applications of machine learning technology is the search engine: search engines learn how to provide the best results based on historical, trending, and relevant data. Anti-spam software, likewise, learns how to filter email messages.



On Measuring and Quantifying Performance: Error Rates, Surrogate Loss, and an Example in SSL

arXiv.org Machine Learning

In various approaches to learning, notably in domain adaptation, active learning, learning under covariate shift, semi-supervised learning, learning with concept drift, and the like, one often wants to compare a baseline classifier to one or more advanced (or at least different) strategies. In this chapter, we argue that if such classifiers, in their respective training phases, optimize a so-called surrogate loss, it may also be valuable to compare the behavior of this loss on the test set, next to the regular classification error rate. It can provide us with an additional view of the classifiers' relative performance that error rates cannot capture. As an example, limited but convincing empirical results demonstrate that we may be able to find semi-supervised learning strategies that can guarantee performance improvements, in terms of log-likelihood, with increasing amounts of unlabeled data. In contrast, such improvements may be impossible to guarantee for the classification error rate.
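The chapter's point can be made concrete with a toy calculation (an invented example, not taken from the text): two probabilistic classifiers can have identical test error rates while their surrogate (log) loss on the same test set differs substantially, so the surrogate loss provides a view the error rate misses.

```python
import numpy as np

y = np.array([1, 1, 1, 0, 0, 0])   # test-set labels
# Two hypothetical classifiers whose hard 0/1 predictions coincide:
p_a = np.array([0.6, 0.6, 0.6, 0.4, 0.4, 0.4])        # barely confident
p_b = np.array([0.99, 0.99, 0.99, 0.01, 0.01, 0.01])  # highly confident

def error_rate(p, y):
    """Fraction of thresholded predictions that disagree with the labels."""
    return float(np.mean((p > 0.5).astype(int) != y))

def log_loss(p, y):
    """Negative mean log-likelihood, the surrogate loss for logistic models."""
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

# Both classifiers have error rate 0.0, yet log_loss(p_a, y) ≈ 0.51
# while log_loss(p_b, y) ≈ 0.01: the surrogate loss separates them.
```

This is exactly the situation where monitoring the surrogate loss on the test set, next to the error rate, exposes a difference in behavior that the error rate alone cannot.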


Unsupervised Learning via Total Correlation Explanation

arXiv.org Machine Learning

Learning by children and animals occurs effortlessly and largely without obvious supervision. Successes in automating supervised learning have not translated to the more ambiguous realm of unsupervised learning where goals and labels are not provided. Barlow (1961) suggested that the signal that brains leverage for unsupervised learning is dependence, or redundancy, in the sensory environment. Dependence can be characterized using the information-theoretic multivariate mutual information measure called total correlation. The principle of Total Correlation Explanation (CorEx) is to learn representations of data that "explain" as much dependence in the data as possible. We review some manifestations of this principle along with successes in unsupervised learning problems across diverse domains including human behavior, biology, and language.
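Total correlation itself is easy to state concretely: TC(X1, ..., Xn) = sum_i H(Xi) - H(X1, ..., Xn), the redundancy a representation could "explain". The snippet below (a generic illustration of the measure, not CorEx itself) computes it for two perfectly redundant binary variables.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits of a discrete distribution given as probabilities."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# Joint distribution of two perfectly redundant binary variables:
# P(X1 = X2 = 0) = P(X1 = X2 = 1) = 0.5.
joint = np.array([[0.5, 0.0],
                  [0.0, 0.5]])
marginal_1 = joint.sum(axis=1)
marginal_2 = joint.sum(axis=0)

# TC(X1, X2) = H(X1) + H(X2) - H(X1, X2)
tc = entropy_bits(marginal_1) + entropy_bits(marginal_2) - entropy_bits(joint.ravel())
# Here TC = 1 + 1 - 1 = 1 bit: one full bit of redundancy to "explain".
```

For independent variables the joint entropy equals the sum of the marginals and TC is zero; CorEx searches for latent factors that drive TC among the observed variables toward zero once the factors are conditioned on.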


Effects of Additional Data on Bayesian Clustering

arXiv.org Machine Learning

Hierarchical probabilistic models, such as mixture models, are used for cluster analysis. These models have two types of variables: observable and latent. In cluster analysis, the latent variable is estimated, and it is expected that additional information will improve the accuracy of the estimation of the latent variable. Many proposed learning methods are able to use additional data; these include semi-supervised learning and transfer learning. However, from a statistical point of view, a complex probabilistic model that encompasses both the initial and additional data might be less accurate due to having a higher-dimensional parameter. The present paper presents a theoretical analysis of the accuracy of such a model and clarifies which factor has the greatest effect on its accuracy, the advantages of obtaining additional data, and the disadvantages of increasing the complexity.


Redefining Basketball Positions with Unsupervised Learning

#artificialintelligence

The NBA Finals are over. The last of the champagne bottles have been emptied and the confetti has begun to settle. Now that the Golden State Warriors have finished unleashing their otherworldly dominance on the basketball world, I thought it would be a good time to wrap up a hardwood-focused machine learning project. The Warriors are prime exhibitors of a new trend in the sport of basketball, a trend that advocates pass-first, ballet-beautiful movement over dominance through individual greatness. As such, traditional positions like 'point guard' and 'center' really don't seem to apply to their players anymore.


Which machine learning algorithm should I use? 7wData

#artificialintelligence

This resource is designed primarily for beginning data scientists or analysts who are interested in identifying and applying machine learning algorithms to the problems that interest them. A typical question asked by a beginner facing a wide variety of machine learning algorithms is "which algorithm should I use?" Even an experienced data scientist cannot tell which algorithm will perform best before trying several. We are not advocating a one-and-done approach, but we do hope to provide some guidance on which algorithms to try first, depending on some clear factors. The machine learning algorithm cheat sheet helps you choose from a variety of machine learning algorithms to find the appropriate one for your specific problem.