If you are a machine learning practitioner working on generative modeling, Bayesian deep learning, or deep reinforcement learning, normalizing flows are a handy technique to have in your algorithmic toolkit. Normalizing flows transform simple densities (like Gaussians) into rich, complex distributions that can be used for generative models, RL, and variational inference. TensorFlow has a nice set of functions that make it easy to build flows and train them to suit real-world data. This tutorial comes in two parts. Part 1: Distributions and Determinants. In this post, I explain how invertible transformations of densities can be used to implement more complex densities, and how these transformations can be chained together to form a "normalizing flow". Part 2: Modern Normalizing Flows. In a follow-up post, I survey recent techniques developed by researchers to learn normalizing flows, and explain how a slew of modern generative modeling techniques (autoregressive models, MAF, IAF, NICE, Real-NVP, Parallel WaveNet) are all related to each other. This series is written for an audience with a rudimentary understanding of linear algebra, probability, neural networks, and TensorFlow. Knowledge of recent advances in deep learning and generative models will help in understanding the motivation and context behind these techniques, but it is not necessary.
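
The change-of-variables idea behind a normalizing flow can be sketched in a few lines of plain Python. This is a minimal sketch, not TensorFlow's API: the classes `Affine` and `Exp` and the functions `flow_log_prob` and `flow_sample` are hypothetical names chosen for illustration. Each bijector supplies a forward map, an inverse, and the log-absolute-determinant of the inverse's Jacobian. To evaluate the density of a point under the chained flow, we invert the bijectors in reverse order and add up those log-determinants on top of the base Gaussian log-density.

```python
import math


def std_normal_log_prob(z):
    # log N(z; 0, 1)
    return -0.5 * (z * z + math.log(2.0 * math.pi))


class Affine:
    """Hypothetical bijector y = scale * x + shift (scale must be non-zero)."""

    def __init__(self, scale, shift):
        self.scale = scale
        self.shift = shift

    def forward(self, x):
        return self.scale * x + self.shift

    def inverse(self, y):
        return (y - self.shift) / self.scale

    def inverse_log_det_jacobian(self, y):
        # x = (y - shift) / scale, so log|dx/dy| = -log|scale|
        return -math.log(abs(self.scale))


class Exp:
    """Hypothetical bijector y = exp(x), mapping the real line to (0, inf)."""

    def forward(self, x):
        return math.exp(x)

    def inverse(self, y):
        return math.log(y)

    def inverse_log_det_jacobian(self, y):
        # x = log(y), so log|dx/dy| = -log(y)
        return -math.log(y)


def flow_log_prob(y, bijectors):
    """log p_Y(y) for Y = f_K(...f_1(Z)), with Z ~ N(0, 1).

    `bijectors` lists f_1, ..., f_K in the order they are applied to Z.
    """
    total_log_det = 0.0
    for bijector in reversed(bijectors):
        total_log_det += bijector.inverse_log_det_jacobian(y)
        y = bijector.inverse(y)
    return std_normal_log_prob(y) + total_log_det


def flow_sample(z, bijectors):
    # Push a base sample z through f_1, ..., f_K in order
    for bijector in bijectors:
        z = bijector.forward(z)
    return z


# Usage: exp(2 * Z + 0.5) is log-normal with mu = 0.5, sigma = 2.0
flow = [Affine(2.0, 0.5), Exp()]
y = 1.7
closed_form = (-math.log(y * 2.0 * math.sqrt(2.0 * math.pi))
               - (math.log(y) - 0.5) ** 2 / (2.0 * 2.0 ** 2))
print(flow_log_prob(y, flow), closed_form)
```

Chaining a scale-and-shift with an exponential turns a standard Gaussian into a log-normal, so the flow's log-density should match the closed-form log-normal log-density; that comparison is a handy sanity check when building any flow.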

Variational Auto-Encoders (VAEs) are powerful models for learning low-dimensional representations of your data. TensorFlow's distributions package provides an easy way to implement different kinds of VAEs. In this post, I will walk you through the steps for training a simple VAE on MNIST, focusing mainly on the implementation. Please take a look at Kevin Frans' post for a higher-level overview. A VAE consists of three components: an encoder, a prior, and a decoder.
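
The three VAE components map onto a few small formulas. The sketch below is plain Python rather than TensorFlow's distributions package, and it assumes the common choices of a diagonal-Gaussian encoder that outputs a mean `mu` and log-variance `log_var`, and a standard-normal prior; the decoder is represented only by the reconstruction log-likelihood it would return. The function names and the numbers in the usage lines are hypothetical.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    # z = mu + sigma * eps with eps ~ N(0, 1). Keeping the noise separate from
    # mu and log_var is what lets gradients flow through the encoder outputs.
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    # Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dims
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def elbo(reconstruction_log_likelihood, mu, log_var):
    # ELBO = E_q[log p(x | z)] - KL(q(z | x) || p(z)); the first term is a
    # single-sample estimate that a decoder would supply
    return reconstruction_log_likelihood - kl_to_standard_normal(mu, log_var)


# Usage with made-up encoder outputs for a 2-D latent space
rng = random.Random(0)
mu, log_var = [0.3, -1.2], [-0.5, 0.1]
z = reparameterize(mu, log_var, rng)  # this z would be fed to the decoder
print(z, kl_to_standard_normal(mu, log_var))
```

Training maximizes the ELBO (equivalently, minimizes its negative), which balances reconstructing the input well against keeping the encoder's distribution close to the prior.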

Additionally, to identify outliers you could do a univariate analysis, studying a single variable at a time, or a multivariate analysis, studying more than one variable at the same time. In the plot above, the x-axis represents Revenue and the y-axis the probability density of the observed Revenue values. The density curve for the actual data is shaded in pink, the normal distribution in green, and the log-normal distribution in blue. The probability density for the actual distribution is calculated from the observed data, whereas both the normal and the log-normal densities are computed from the observed mean and standard deviation of the Revenues.
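
The comparison described above (normal and log-normal curves built from the observed mean and standard deviation) could be computed roughly as follows. This is a sketch under stated assumptions: the revenue figures are made up, and the log-normal parameters are obtained by the method of moments, so that the log-normal's mean and standard deviation match the observed ones; the original analysis may fit them differently. The helper names are hypothetical.

```python
import math
import statistics


def normal_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def lognormal_params_from_moments(mean, std):
    # Method of moments (an assumption): pick mu, sigma so the log-normal's
    # mean and standard deviation equal the observed mean and std.
    sigma2 = math.log(1.0 + (std / mean) ** 2)
    mu = math.log(mean) - 0.5 * sigma2
    return mu, math.sqrt(sigma2)


def lognormal_pdf(x, mu, sigma):
    # A log-normal has no mass at or below zero
    if x <= 0:
        return 0.0
    return (math.exp(-0.5 * ((math.log(x) - mu) / sigma) ** 2)
            / (x * sigma * math.sqrt(2.0 * math.pi)))


revenues = [120.0, 95.0, 300.0, 150.0, 80.0, 1000.0, 60.0]  # hypothetical values
m, s = statistics.mean(revenues), statistics.stdev(revenues)
mu, sigma = lognormal_params_from_moments(m, s)
for x in (50.0, 150.0, 500.0):
    print(x, normal_pdf(x, m, s), lognormal_pdf(x, mu, sigma))
```

With right-skewed data like this, the normal curve built from the same mean and standard deviation puts noticeable density on negative revenues, which is one visible sign that the log-normal is the better fit.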

With the help of an effective feature engineering process, we aim to arrive at an effective representation of the data. Several measures describe the information in data: entropy (the higher the entropy, the more information the data contains), variance (the higher the variance, the more information), projection for better separation (the projection onto the basis with the highest variance holds the most information), feature-to-class association, and so on. However, sometimes we may find that a feature follows a log-normal distribution rather than a normal one. A common remedy in this situation is to take the log of the feature values, so that the transformed feature is normally distributed. If the algorithm being used assumes, implicitly or explicitly, that the features are normally distributed, then transforming a log-normally distributed feature into a normally distributed one can help improve the algorithm's performance.
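
The effect of the log transform can be checked numerically. The sketch below assumes a synthetic log-normal feature drawn with Python's `random.lognormvariate` and uses sample skewness as a rough indicator of how far a distribution is from symmetric; `skewness` is a hypothetical helper, not a library function.

```python
import math
import random


def skewness(xs):
    # Sample skewness: third central moment over the 1.5 power of the second
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


rng = random.Random(0)
feature = [rng.lognormvariate(0.0, 1.0) for _ in range(10_000)]  # synthetic
log_feature = [math.log(x) for x in feature]  # exactly normal in theory

print(round(skewness(feature), 2), round(skewness(log_feature), 2))
```

The raw feature is strongly right-skewed, while its log is close to symmetric. Note that `math.log` fails on zero or negative values; for features that can be zero, `math.log1p` (log(1 + x)) is a common variant, though the result is then no longer exactly normal.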

Such values follow a normal distribution. According to the Wikipedia article on the normal distribution, about 68% of values drawn from a normal distribution lie within one standard deviation σ of the mean, about 95% lie within two standard deviations, and about 99.7% lie within three standard deviations. As you can see, we removed the outlier values, and if we plot this dataset now, the plot will look much better. But in our case, the outliers were clearly caused by errors in the data, and the data followed a normal distribution, so a cutoff based on the standard deviation made sense.
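
The 68-95-99.7 rule follows from the normal CDF: the share of values within k standard deviations of the mean is erf(k/√2). The sketch below computes that share and applies a simple k-standard-deviation cutoff. The readings and helper names are hypothetical, and k = 2 is an assumption for the example rather than a recommended setting.

```python
import math
import statistics


def fraction_within(k):
    # P(|X - mean| < k * sigma) for a normally distributed X
    return math.erf(k / math.sqrt(2.0))


def remove_outliers(values, k=2.0):
    # Keep values within k (population) standard deviations of the mean
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) <= k * std]


for k in (1, 2, 3):
    print(k, round(fraction_within(k), 4))

readings = [10, 11, 9, 10, 12, 10, 11, 9, 10, 300]  # hypothetical; 300 is an error
print(remove_outliers(readings, k=2.0))
```

One caveat: the outlier inflates the very mean and standard deviation it is measured against. With only these ten readings, a cutoff of k = 3 would keep the 300, so on small samples a looser k or a median-based cutoff is often more reliable.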

The data set has missing values that spread along 1 standard deviation from the median. Assuming the data is approximately normally distributed, so that the median coincides with the mean, about 68% of the data lies within one standard deviation and would be affected; therefore about 32% of the data would remain unaffected by missing values. In an imbalanced data set, accuracy should not be used as a measure of performance: an accuracy of 96% (as given) might come from predicting only the majority class correctly, while our class of interest is the minority class (4%), the people who were actually diagnosed with cancer. Hence, to evaluate model performance, we should use sensitivity (true positive rate), specificity (true negative rate), and the F-measure to determine the class-wise performance of the classifier.
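
To see why accuracy misleads on imbalanced data, the sketch below computes accuracy, sensitivity, specificity, and F1 (the F-measure with equal weight on precision and recall) for a hypothetical 100-patient test set with 4 cancer cases and a classifier that always predicts "no cancer". The function name is illustrative and the code uses no libraries; scikit-learn provides equivalents such as `recall_score` and `f1_score`. Undefined ratios (for example, precision when nothing is predicted positive) are reported as 0.0, which is one common convention.

```python
def classification_metrics(y_true, y_pred, positive=1):
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # true positive rate (recall)
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical test set: 4 cancer cases (label 1) among 100 patients
y_true = [1] * 4 + [0] * 96
y_pred = [0] * 100  # a "classifier" that always predicts the majority class
print(classification_metrics(y_true, y_pred))
```

The majority-class predictor scores 96% accuracy yet finds none of the cancer cases, which sensitivity and F1 expose immediately.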

It is good practice to gather a population of results when comparing two different machine learning algorithms or when comparing the same algorithm with different configurations. In this tutorial, you will discover how you can investigate and interpret machine learning experimental results using statistical significance tests in Python. In this tutorial, you discovered how you can use statistical significance tests to interpret machine learning results.
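
The paragraph above does not say which test to use, so the sketch below swaps in a two-sample permutation test on mean scores, chosen because it needs only the Python standard library and makes no normality assumption; the tutorial itself may rely on different tests, such as Student's t-test. The scores and function names are hypothetical. Scores from resampling schemes such as k-fold cross-validation are not fully independent, which this simple test does not account for.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def permutation_test(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in mean score.

    Under the null hypothesis the labels "A" and "B" are exchangeable, so we
    shuffle the pooled scores and count how often a random split produces a
    mean difference at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(scores_a) - mean(scores_b))
    pooled = list(scores_a) + list(scores_b)
    n_a = len(scores_a)
    count = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed:
            count += 1
    # The +1 terms count the observed split itself, so p is never exactly 0
    return (count + 1) / (n_resamples + 1)


# Hypothetical accuracies from 10 runs of two model configurations
config_a = [0.81, 0.83, 0.80, 0.82, 0.84, 0.81, 0.83, 0.82, 0.80, 0.82]
config_b = [0.86, 0.88, 0.87, 0.85, 0.89, 0.86, 0.87, 0.88, 0.86, 0.87]
p_value = permutation_test(config_a, config_b)
print(f"p = {p_value:.4f}")
```

A small p-value suggests the gap between the two configurations is unlikely to be explained by run-to-run noise alone; a large one means the population of results does not support a real difference.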

The question was: What is the Central Limit Theorem? Suppose that, instead of surveying the whole population, you collect one sample of 100 beer drinkers in the US. The Central Limit Theorem is at the core of what every data scientist does daily: making statistical inferences about data. It tells us that the sample mean is approximately normally distributed around the population mean, with a standard deviation (the standard error) of σ/√n. So about 68 percent of sample means lie within one standard error of the population mean, about 95 percent lie within two standard errors, and so on.
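
A quick simulation makes the Central Limit Theorem concrete: draw many samples of size 100 from a skewed population, and the sample means cluster around the population mean with a spread close to σ/√n, roughly 68% of them within one standard error. The exponential population below stands in for the beer-drinker data and is an assumption for illustration only; `share_within` is a hypothetical helper.

```python
import math
import random
import statistics

rng = random.Random(42)

# Hypothetical, deliberately skewed population: exponential with rate 1,
# whose mean and standard deviation are both exactly 1.
population_mean = 1.0
population_std = 1.0
n = 100             # size of each sample, as in the beer-drinker example
num_samples = 2000  # how many independent samples we draw

sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

standard_error = population_std / math.sqrt(n)  # sigma / sqrt(n) = 0.1


def share_within(k):
    # Share of sample means within k standard errors of the population mean
    return sum(abs(m - population_mean) <= k * standard_error
               for m in sample_means) / num_samples


print(round(statistics.stdev(sample_means), 3), share_within(1), share_within(2))
```

Even though individual draws are far from normal, the sample means behave almost exactly as the normal approximation predicts.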