When Bayes, Ockham, and Shannon come together to define machine learning
It is somewhat surprising that among all the high-flying buzzwords of machine learning, we don't hear much about the one phrase which fuses some of the core concepts of statistical learning, information theory, and natural philosophy into a single three-word-combo. Moreover, it is not just an obscure and pedantic phrase meant for machine learning (ML) Ph.Ds and theoreticians. It has a precise and easily accessible meaning for anyone interested to explore, and a practical pay-off for the practitioners of ML and data science. I am talking about Minimum Description Length. Let's peel the layers off and see how useful it is… We start with (not chronologically) with Reverend Thomas Bayes, who by the way, never published his idea about how to do statistical inference, but was later immortalized by the eponymous theorem. It was the second half of the 18th century, and there was no branch of mathematical sciences called "Probability Theory".
Oct-24-2018, 20:02:43 GMT