Deep Learning and Information Theory

#artificialintelligence 

If you have tried to understand the maths behind machine learning, including deep learning, you will have come across topics from Information Theory: Entropy, Cross Entropy, KL Divergence, and so on. Concepts from information theory are prevalent throughout machine learning, from the splitting criteria of a Decision Tree to the loss functions in Generative Adversarial Networks. If you are a beginner in Machine Learning, you might not have made an effort to go deep and understand the mathematics behind the ".fit()", but as you mature and stumble across more and more complex problems, it becomes essential to understand the maths, or at least the intuition behind it, to apply the right technique at the right place.

When I was starting out, I was guilty of the same. I would see "Categorical Cross Entropy" as a loss function in a Neural Network and take it for granted, as if it were some magical loss function that works with multi-class labels. I would see "entropy" as one of the splitting criteria in Decision Trees and just experiment with it without understanding what it is. But as I matured, I decided to spend more time understanding the basics, and it helped me immensely in getting my intuitions right. It also helped me understand the different ways the popular Deep Learning frameworks, PyTorch and TensorFlow, have implemented the different loss functions, and decide when to use which. This blog is me summarising my understanding of the underlying concepts of Information Theory and of how the implementations differ across the Deep Learning frameworks.
