Goto

Collaborating Authors

 Bayesian Learning


A Tale of Three Probabilistic Families: Discriminative, Descriptive and Generative Models

arXiv.org Machine Learning

The pattern theory of Grenander is a mathematical framework where the patterns are represented by probability models on random variables of algebraic structures. In this paper, we review three families of probability models, namely, the discriminative models, the descriptive models, and the generative models. A discriminative model is in the form of a classifier. It specifies the conditional probability of the class label given the input signal. The descriptive model specifies the probability distribution of the signal, based on an energy function defined on the signal. A generative model assumes that the signal is generated by some latent variables via a transformation. We shall review these models within a common framework and explore their connections. We shall also review the recent developments that take advantage of the high approximation capacities of deep neural networks.


Deep clustering: On the link between discriminative models and K-means

arXiv.org Machine Learning

In the context of recent deep clustering studies, discriminative models dominate the literature and report the most competitive performances. These models learn a deep discriminative neural network classifier in which the labels are latent. Typically, they use multinomial logistic regression posteriors and parameter regularization, as is very common in supervised learning. It is generally acknowledged that discriminative objective functions (e.g., those based on the mutual information or the KL divergence) are more flexible than generative approaches (e.g., K-means) in the sense that they make fewer assumptions about the data distributions and, typically, yield much better unsupervised deep learning results. On the surface, several recent discriminative models may seem unrelated to K-means. This study shows that these models are, in fact, equivalent to K-means under mild conditions and common posterior models and parameter regularization. We prove that, for the commonly used logistic regression posteriors, maximizing the $L_2$ regularized mutual information via an approximate alternating direction method (ADM) is equivalent to a soft and regularized K-means loss. Our theoretical analysis not only connects directly several recent state-of-the-art discriminative models to K-means, but also leads to a new soft and regularized deep K-means algorithm, which yields competitive performance on several image clustering benchmarks.


Unifying the Dropout Family Through Structured Shrinkage Priors

arXiv.org Machine Learning

Dropout regularization of deep neural networks has been a mysterious yet effective tool to prevent overfitting. Explanations for its success range from the prevention of "co-adapted" weights to it being a form of cheap Bayesian inference. We propose a novel framework for understanding multiplicative noise in neural networks, considering continuous distributions as well as Bernoulli (i.e. dropout). We show that multiplicative noise induces structured shrinkage priors on a network's weights. We derive the equivalence through reparametrization properties of scale mixtures and not via any approximation. Given the equivalence, we then show that dropout's usual Monte Carlo training objective approximates marginal MAP estimation. We analyze this MAP objective under strong shrinkage, showing the expanded parametrization (i.e. likelihood noise) is more stable than a hierarchical representation. Lastly, we derive analogous priors for ResNets, RNNs, and CNNs and reveal their equivalent implementation as noise.


A unified theory of adaptive stochastic gradient descent as Bayesian filtering

arXiv.org Machine Learning

We formulate stochastic gradient descent (SGD) as a Bayesian filtering problem. Inference in the Bayesian setting naturally gives rise to BRMSprop and BAdam: Bayesian variants of RMSprop and Adam. Remarkably, the Bayesian approach recovers many features of state-of-the-art adaptive SGD methods, including amoungst others root-mean-square normalization, Nesterov acceleration and AdamW. As such, the Bayesian approach provides one explanation for the empirical effectiveness of state-of-the-art adaptive SGD algorithms. Empirically comparing BRMSprop and BAdam with naive RMSprop and Adam on MNIST, we find that Bayesian methods have the potential to considerably reduce test loss and classification error.


Cooperative Starting Movement Detection of Cyclists Using Convolutional Neural Networks and a Boosted Stacking Ensemble

arXiv.org Artificial Intelligence

In future, vehicles and other traffic participants will be interconnected and equipped with various types of sensors, allowing for cooperation on different levels, such as situation prediction or intention detection. In this article we present a cooperative approach for starting movement detection of cyclists using a boosted stacking ensemble approach realizing feature- and decision level cooperation. We introduce a novel method based on a 3D Convolutional Neural Network (CNN) to detect starting motions on image sequences by learning spatio-temporal features. The CNN is complemented by a smart device based starting movement detection originating from smart devices carried by the cyclist. Both model outputs are combined in a stacking ensemble approach using an extreme gradient boosting classifier resulting in a fast and yet robust cooperative starting movement detector. We evaluate our cooperative approach on real-world data originating from experiments with 49 test subjects consisting of 84 starting motions.


Meta-Learning: A Survey

arXiv.org Machine Learning

Meta-learning, or learning to learn, is the science of systematically observing how different machine learning approaches perform on a wide range of learning tasks, and then learning from this experience, or meta-data, to learn new tasks much faster than otherwise possible. Not only does this dramatically speed up and improve the design of machine learning pipelines or neural architectures, it also allows us to replace hand-engineered algorithms with novel approaches learned in a data-driven way. In this chapter, we provide an overview of the state of the art in this fascinating and continuously evolving field.


Probabilistic Solutions To Ordinary Differential Equations As Non-Linear Bayesian Filtering: A New Perspective

arXiv.org Machine Learning

We formulate probabilistic numerical approximations to solutions of ordinary differential equations (ODEs) as problems in Gaussian process (GP) regression with non-linear measurement functions. This is achieved by defining the measurement sequence to consists of the observations of the difference between the derivative of the GP and the vector field evaluated at the GP---which are all identically zero at the solution of the ODE. When the GP has a state-space representation, the problem can be reduced to a Bayesian state estimation problem and all widely-used approximations to the Bayesian filtering and smoothing problems become applicable. Furthermore, all previous GP-based ODE solvers, which were formulated in terms of generating synthetic measurements of the vector field, come out as specific approximations. We derive novel solvers, both Gaussian and non-Gaussian, from the Bayesian state estimation problem posed in this paper and compare them with other probabilistic solvers in illustrative experiments.


An easy-to-use empirical likelihood ABC method

arXiv.org Machine Learning

Many scientifically well-motivated statistical models in natural, engineering and environmental sciences are specified through a generative process, but in some cases it may not be possible to write down a likelihood for these models analytically. Approximate Bayesian computation (ABC) methods, which allow Bayesian inference in these situations, are typically computationally intensive. Recently, computationally attractive empirical likelihood based ABC methods have been suggested in the literature. These methods heavily rely on the availability of a set of suitable analytically tractable estimating equations. We propose an easy-to-use empirical likelihood ABC method, where the only inputs required are a choice of summary statistic, it's observed value, and the ability to simulate summary statistics for any parameter value under the model. It is shown that the posterior obtained using the proposed method is consistent, and its performance is explored using various examples.


SALSA-TEXT : self attentive latent space based adversarial text generation

arXiv.org Artificial Intelligence

Inspired by the success of self attention mechanism and Transformer architecture in sequence transduction and image generation applications, we propose novel self attention-based architectures to improve the performance of adversarial latent codebased schemes in text generation. Adversarial latent code-based text generation has recently gained a lot of attention due to its promising results. In this paper, we take a step to fortify the architectures used in these setups, specifically AAE and ARAE. We benchmark two latent code-based methods (AAE and ARAE) designed based on adversarial setups. In our experiments, the Google sentence compression dataset is utilized to compare our method with these methods using various objective and subjective measures. The experiments demonstrate the proposed (self) attention-based models outperform the state-of-the-art in adversarial code-based text generation. Text generation is of particular interest in many natural language processing (NLP) applications such as dialogue systems, machine translation, image captioning and text summarization. Recent deep learning-based approaches to this problem can be categorized into three classes: auto-regressive or maximum likelihood estimation (MLE)-based, generative adversarial network (GAN)-based and reinforcement learning (RL)-based approaches. RNNs compactly represent the samples history in the form of recurrent states.


Top Machine Learning Algorithms You Should Know to Become a Data Scientist - DZone AI

#artificialintelligence

There are two ways to categorize Machine Learning algorithms you may come across in the field. Generally, both approaches are useful. However, we will focus in on the grouping of algorithms by similarity and go on a tour of a variety of different algorithm types. There are different ways an algorithm can model a problem as it relates to the interaction with the experience. However, it doesn't matter whatever we want to call the input data. Also, an algorithm is popular in Machine Learning and Artificial Intelligence textbooks. That is to first consider the learning styles that an algorithm can adapt. Generally, there are only a few main learning styles that a Machine Learning algorithm can have.