In this blog we are going to see how Expectation-maximization algorithm works very closely. This blog is in strict continuation of the previous blog. Previously we saw how probabilistic clustering ended up into chicken-egg problem. That is, if we have distribution of latent variable, we can compute the parameters of the clusters and vice-versa. To understand how the entire approach works we need to learn few mathematical tools, namely: Jensen's inequality and KL divergence.