The Ising distribution as a latent variable model

arXiv.org Machine Learning

We show that the Ising distribution can be treated as a latent variable model, where a set of N real-valued, correlated random variables are drawn and used to generate N binary spins independently. This allows us to approximate the Ising distribution by a simpler model in which the latent variables follow a multivariate normal distribution. The resulting approximation bears similarities with the Thouless-Anderson-Palmer (TAP) solution from mean field theory, but retains a broader range of applicability when the coupling weights are not independently distributed. Moreover, unlike classic mean field approaches, the approximation can be used to generate correlated spin patterns.
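
As a rough illustration of this latent-variable view (a sketch of the general idea, not the paper's exact construction), the snippet below draws correlated Gaussian latents and then samples each spin independently given its latent value through a sigmoid link; the mean vector mu and covariance Sigma are placeholder parameters.

```python
import numpy as np

def sample_spins(mu, Sigma, n_samples, rng=None):
    """Sketch: draw correlated Gaussian latents, then sample binary spins
    independently given the latents (sigmoid link). mu and Sigma are
    illustrative parameters, not quantities fitted as in the paper."""
    rng = np.random.default_rng(rng)
    N = len(mu)
    z = rng.multivariate_normal(mu, Sigma, size=n_samples)  # correlated latents
    p = 1.0 / (1.0 + np.exp(-z))                             # P(s_i = +1 | z_i)
    return np.where(rng.random((n_samples, N)) < p, 1, -1)   # independent given z

# Example: 3 spins with positively correlated latents
mu = np.zeros(3)
Sigma = np.array([[1.0, 0.6, 0.3],
                  [0.6, 1.0, 0.6],
                  [0.3, 0.6, 1.0]])
samples = sample_spins(mu, Sigma, n_samples=10000, rng=0)
print(np.corrcoef(samples, rowvar=False))  # spins inherit correlation from the latents
```

Because the spins are conditionally independent given the latents, all of their correlation is inherited from the covariance of the latent Gaussian, which is what makes sampling correlated spin patterns cheap.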


40 Interview Questions asked at Startups in Machine Learning / Data Science

@machinelearnbot

This article was posted by Manish Saraswat on Analytics Vidhya. Manish, who works in marketing and data science at Analytics Vidhya, believes that education can change the world. R, data science and machine learning keep him busy. Machine learning and data science are seen as the drivers of the next industrial revolution happening in the world today. This also means that there are numerous exciting startups looking for data scientists.


How to Start Training: The Effect of Initialization and Architecture

arXiv.org Machine Learning

We investigate the effects of initialization and architecture on the start of training in deep ReLU nets. We identify two common failure modes for early training in which the mean and variance of activations are poorly behaved. For each failure mode, we give a rigorous proof of when it occurs at initialization and how to avoid it. The first failure mode, exploding/vanishing mean activation length, can be avoided by initializing weights from a symmetric distribution with variance 2/fan-in. The second failure mode, exponentially large variance of activation length, can be avoided by keeping constant the sum of the reciprocals of layer widths. We empirically demonstrate the effectiveness of our theoretical results in predicting when networks are able to start training. In particular, we note that many popular initializations fail our criteria, whereas correct initialization and architecture allow much deeper networks to be trained.
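
For reference, a minimal NumPy sketch of the first criterion: weights drawn from a symmetric distribution with variance 2/fan-in (the familiar He-style scaling for ReLU layers). The layer widths below are arbitrary placeholders, not values from the paper.

```python
import numpy as np

def init_relu_net(layer_widths, rng=None):
    """Initialize fully connected ReLU layers with weights drawn from a
    symmetric (here Gaussian) distribution of variance 2/fan-in, the scaling
    that avoids exploding/vanishing mean activation length at initialization."""
    rng = np.random.default_rng(rng)
    weights = []
    for fan_in, fan_out in zip(layer_widths[:-1], layer_widths[1:]):
        std = np.sqrt(2.0 / fan_in)                        # variance = 2 / fan-in
        weights.append(rng.normal(0.0, std, size=(fan_in, fan_out)))
    return weights

# Example with placeholder widths; zero biases are omitted for brevity.
ws = init_relu_net([784, 512, 512, 10], rng=0)
print([w.shape for w in ws])
```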


Wasserstein Distance Measure Machines

arXiv.org Machine Learning

This paper presents a distance-based discriminative framework for learning with probability distributions. Instead of using kernel mean embeddings or generalized radial basis kernels, we introduce embeddings based on the dissimilarity of distributions to some reference distributions, denoted as templates. Our framework extends the similarity-based learning theory of Balcan et al. (2008) to the population distribution case, and we prove that, for some learning problems, the Wasserstein distance achieves low-error linear decision functions with high probability. Our key result is to prove that the theory also holds for empirical distributions. Algorithmically, the proposed approach is very simple, as it consists of computing a mapping based on pairwise Wasserstein distances and then learning a linear decision function. Our experimental results show that this Wasserstein distance embedding performs better than kernel mean embeddings, and that computing the Wasserstein distance is far more tractable than estimating pairwise Kullback-Leibler divergences of empirical distributions.
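
As a simplified sketch of that pipeline (assuming 1-D empirical distributions so SciPy's closed-form Wasserstein distance applies; the toy data, the choice of templates and the logistic-regression classifier are illustrative, not the paper's exact algorithm):

```python
import numpy as np
from scipy.stats import wasserstein_distance
from sklearn.linear_model import LogisticRegression

def wasserstein_embedding(samples, templates):
    """Map each empirical distribution (a 1-D array of draws) to the vector
    of its Wasserstein distances to a set of reference (template) samples."""
    return np.array([[wasserstein_distance(s, t) for t in templates]
                     for s in samples])

# Toy data: each example is a bag of draws from some 1-D distribution.
rng = np.random.default_rng(0)
pos = [rng.normal(0.0, 1.0, 200) for _ in range(50)]
neg = [rng.normal(1.5, 1.0, 200) for _ in range(50)]
samples, labels = pos + neg, np.array([1] * 50 + [0] * 50)

templates = samples[:5] + samples[-5:]         # a handful of reference distributions
X = wasserstein_embedding(samples, templates)  # distance-based feature map
clf = LogisticRegression().fit(X, labels)      # linear decision function on top
print(clf.score(X, labels))
```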


Machine Learning is all about Common Sense.

#artificialintelligence

Machines are invented to solve problems and make life easier. It is widely accepted that common sense is a sense that is not so common :) So, don't you think this problem should also be addressed?


A folded model for compositional data analysis

arXiv.org Machine Learning

A folded-type model is developed for analyzing compositional data. The proposed model, which is based upon the $\alpha$-transformation for compositional data, provides a new and flexible class of distributions for modeling data defined on the simplex sample space. Despite its seemingly complex structure, use of the EM algorithm guarantees efficient parameter estimation. The model is validated through simulation studies and examples which illustrate that the proposed model captures the data structure better than the popular logistic normal distribution.
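
For context, here is a small sketch of the $\alpha$-transformation the model builds on, in one commonly used form (power-transform the composition, centre and scale by 1/alpha, then map to a (D-1)-dimensional real vector with a Helmert sub-matrix); this follows a standard definition from the compositional data literature rather than any code from the paper, so treat the details as an assumption.

```python
import numpy as np
from scipy.linalg import helmert

def alpha_transform(x, alpha):
    """One common form of the alpha-transformation of a composition x
    (positive entries summing to 1). As alpha -> 0 it tends to the
    isometric log-ratio transform."""
    x = np.asarray(x, dtype=float)
    D = x.size
    if alpha == 0:
        z = np.log(x) - np.log(x).mean()       # limiting (centred log-ratio) case
    else:
        u = x ** alpha / np.sum(x ** alpha)    # power-transformed composition
        z = (D * u - 1.0) / alpha              # centred and scaled
    return helmert(D) @ z                      # (D-1)-dimensional coordinates

print(alpha_transform([0.2, 0.3, 0.5], alpha=0.5))
```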



Removing Outliers Using Standard Deviation in Python

@machinelearnbot

Standard deviation is a measure of spread, i.e. how much the individual data points deviate from the mean. Consider two datasets that both have the same mean of 25. However, the first dataset has values closer to the mean and the second dataset has values more spread out. To be more precise, the standard deviation of the first dataset is 3.13 and of the second is 14.67. However, it's not easy to wrap your head around numbers like 3.13 or 14.67.
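
A minimal sketch of the usual recipe in Python (the cutoff of two standard deviations and the toy data are assumed, tunable choices, not values from the article):

```python
import numpy as np

def remove_outliers(data, n_std=2.0):
    """Keep only the points lying within n_std standard deviations of the mean."""
    data = np.asarray(data, dtype=float)
    mean, std = data.mean(), data.std()
    return data[np.abs(data - mean) <= n_std * std]

data = np.array([22, 24, 25, 26, 28, 23, 25, 27, 95])  # 95 is an obvious outlier
print(remove_outliers(data))                            # the outlier is dropped
```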


Supervised Learning of Labeled Pointcloud Differences via Cover-Tree Entropy Reduction

arXiv.org Machine Learning

We introduce a new algorithm, called CDER, for supervised machine learning that merges the multi-scale geometric properties of Cover Trees with the information-theoretic properties of entropy. CDER applies to a training set of labeled pointclouds embedded in a common Euclidean space. If typical pointclouds corresponding to distinct labels tend to differ at any scale in any sub-region, CDER can identify these differences in (typically) linear time, creating a set of distributional coordinates which act as a feature extraction mechanism for supervised learning. We describe theoretical properties and implementation details of CDER, and illustrate its benefits on several synthetic examples.