SGD on Neural Networks Learns Functions of Increasing Complexity

Neural Information Processing Systems

We perform an experimental study of the dynamics of Stochastic Gradient Descent (SGD) in learning deep neural networks for several real and synthetic classification tasks. We show that in the initial epochs, almost all of the performance improvement of the classifier obtained by SGD can be explained by a linear classifier. More generally, we give evidence for the hypothesis that, as iterations progress, SGD learns functions of increasing complexity. This hypothesis can be helpful in explaining why SGD-learned classifiers tend to generalize well even in the over-parameterized regime. We also show that the linear classifier learned in the initial stages is "retained" throughout the execution even if training is continued to the point of zero training error, and complement this with a theoretical result in a simplified model.

When Can Neural Networks Learn Connected Decision Regions?

Machine Learning

Previous work has questioned the conditions under which the decision regions of a neural network are connected and further showed the implications of the corresponding theory for the problem of adversarial manipulation of classifiers. It has been proven that, for a class of activation functions including leaky ReLU, neural networks having a pyramidal structure, that is, no layer has more hidden units than the input dimension, necessarily produce connected decision regions. In this paper, we advance this important result by further developing the sufficient and necessary conditions under which the decision regions of a neural network are connected. We then apply our framework to overcome the limits of existing work and further study the capacity of neural networks to learn connected regions for a much wider class of activation functions, including those widely used, namely ReLU, sigmoid, tanh, softplus, and exponential linear function.
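The pyramidal condition quoted above can be checked mechanically from a list of layer widths. The following is a minimal sketch in Python, assuming only the definition as stated in this abstract (no hidden layer is wider than the input dimension); the function name `is_pyramidal` and its parameters are hypothetical and do not come from the paper, which may use a stricter definition.

```python
from typing import Sequence


def is_pyramidal(input_dim: int, hidden_widths: Sequence[int]) -> bool:
    """Return True if no hidden layer has more units than the input dimension.

    Assumption: this follows the wording of the abstract only; the helper
    name and signature are illustrative, not taken from the paper.
    """
    if input_dim <= 0:
        raise ValueError("input_dim must be positive")
    # Every hidden width must be positive and at most the input dimension.
    return all(0 < width <= input_dim for width in hidden_widths)


# A 784-dimensional input followed by narrower hidden layers satisfies the condition.
print(is_pyramidal(784, [512, 256, 64]))
# A hidden layer of width 32 exceeds a 10-dimensional input, so the check fails.
print(is_pyramidal(10, [10, 32, 8]))
```

Passing this check says nothing by itself about connected decision regions; the result cited in the abstract also depends on the class of activation functions.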

A Brief History of Artificial Intelligence


The concept of AI has been around for many decades. British mathematician Alan Turing proposed in 1950 that it might be possible for machines to use information to reason, solve problems, and make decisions. His framework is the basis of the Turing Test, which an AI system passes when it is indistinguishable from a human being in its ability to hold a conversation. In 1956, a team presented a proof of concept for AI at the Dartmouth Summer Research Project on Artificial Intelligence. Also in the 1950s, a group of researchers at the Massachusetts Institute of Technology (MIT) began work that would become the MIT Computer Science and Artificial Intelligence Laboratory.

What is Activation Function & Why you need it in Neural Networks


Hi guys! Are you working on neural networks or deep learning models and have you come across activation functions? Are you wondering what an activation function is and why we even need to use one in our deep learning models? In this post we are going to talk about activation functions and why we should use them in our neural networks. If you are planning to work in deep learning and build models for image, video, or text recognition, you must have a very good understanding of activation functions and why they are required. I am going to cover this topic in a very non-mathematical and non-technical way so that you can relate to it and build an intuition.