In the last few years, deep neural networks have dominated pattern recognition. They blew the previous state of the art out of the water for many computer vision tasks. Voice recognition is also moving that way. But despite the results, we have to wonder… why do they work so well? In doing so, I hope to make accessible one promising answer as to why deep neural networks work. I think it's a very elegant perspective. A neural network with a hidden layer has universality: given enough hidden units, it can approximate any function. This is a frequently quoted – and even more frequently, misunderstood and applied – theorem.