Understanding Attention and Generalization in Graph Neural Networks

Neural Information Processing Systems

We aim to better understand attention over nodes in graph neural networks (GNNs) and identify the factors that influence its effectiveness. In particular, we focus on the ability of attention GNNs to generalize to larger, more complex, or noisier graphs. Motivated by insights from the work on Graph Isomorphism Networks, we design simple graph reasoning tasks that let us study attention in a controlled environment. We find that under typical conditions the effect of attention is negligible or even harmful, but that under certain conditions it provides an exceptional performance gain of more than 60% on some of our classification tasks. Satisfying these conditions in practice is challenging and often requires optimal initialization or supervised training of attention.
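The attention over nodes that this abstract studies can be pictured as soft attention pooling: each node gets a scalar score, the scores are softmaxed into weights, and the graph embedding is the weighted sum of node features. A minimal sketch, where the linear score function and the shapes are illustrative assumptions rather than the paper's exact model:

```python
import numpy as np

def attention_pool(node_feats, att_w):
    """Soft attention pooling over graph nodes (a minimal sketch).

    node_feats: (N, D) array of node features.
    att_w:      (D,) attention parameter vector (hypothetical choice).
    Returns a (D,) graph embedding: a softmax-weighted sum of node features.
    """
    scores = node_feats @ att_w              # one scalar score per node
    alpha = np.exp(scores - scores.max())    # numerically stable softmax
    alpha /= alpha.sum()
    return alpha @ node_feats                # attention-weighted sum

# toy usage: a graph with 4 nodes and 3-dimensional features
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
g = attention_pool(x, np.ones(3))
```

Because the weights form a convex combination, the pooled embedding always lies inside the span of the node features; the paper's question is when learning `att_w` actually helps versus uniform averaging.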

Flipboard on Flipboard


It's not easy to train a neural network. Even when one isn't difficult to implement, it can take hours to train no matter how much computing power you have. OpenAI researchers may have a better solution: throwing out many of the usual rules. They've developed an evolution strategy (no, it doesn't have much to do with biological evolution) that promises more capable AI systems. Rather than use standard reinforcement learning, they treat the whole problem as a "black box," ignoring the fact that an environment and a neural network are even involved.
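The "black box" idea can be sketched concretely: sample random perturbations of the parameters, score each perturbed copy with the reward function (which need not be differentiable), and move the parameters along the reward-weighted average perturbation. A minimal sketch of this style of evolution strategy, where the hyperparameters and the toy reward are illustrative assumptions, not OpenAI's actual setup:

```python
import numpy as np

def es_step(theta, reward_fn, sigma=0.1, lr=0.01, pop=50, rng=None):
    """One update of a simple evolution-strategy optimizer (sketch).

    Treats reward_fn as a black box: sample Gaussian perturbations of the
    parameters, evaluate each, and move theta along the reward-weighted
    average perturbation. No gradients of the model or the environment
    are needed.
    """
    if rng is None:
        rng = np.random.default_rng()
    eps = rng.normal(size=(pop, theta.size))            # perturbations
    rewards = np.array([reward_fn(theta + sigma * e) for e in eps])
    rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    grad_est = eps.T @ rewards / (pop * sigma)          # gradient estimate
    return theta + lr * grad_est

# toy usage: maximize -||theta - 1||^2, whose optimum is all-ones
rng = np.random.default_rng(0)
theta = np.zeros(5)
for _ in range(300):
    theta = es_step(theta, lambda t: -np.sum((t - 1.0) ** 2), rng=rng)
```

The appeal is that the inner loop is embarrassingly parallel: each worker evaluates one perturbed copy, and only scalar rewards need to be communicated.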

Evolutionary computation will drive the future of creative AI


AI is arguably the biggest tech topic of 2018. From Google Duplex's imitations of human speech and Spotify's song recommendations to Uber's self-driving cars and the Pentagon's use of Google AI, the technology seems to offer everything to everyone. You could say AI has become synonymous with progress in computing. However, not all AI is created equal, and for AI to fulfill its many promises, it needs to be creative. Let's start by addressing what I mean by "creative."

Distribution of the search of evolutionary product unit neural networks for classification Artificial Intelligence

This paper deals with distributed processing in the search for an optimal classification model using evolutionary product unit neural networks. For this distributed search we used a cluster of computers. Our objective is to obtain more efficient network designs than those found without a distributed process, and thereby simpler architectures. To obtain the best classification models we use evolutionary algorithms to train and design the neural networks, which requires very time-consuming computation. The reasons for distributing the search are various. Networks of this type are hard to train because their complex error surface makes it difficult to determine a suitable architecture. Moreover, the use of evolutionary algorithms involves running a great number of trials with different seeds and parameters, resulting in a high computational cost.
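A product unit differs from an ordinary perceptron in that the inputs are raised to learned real-valued exponents and multiplied, rather than weighted and summed; these exponents are a major source of the complex error surface the abstract mentions. A minimal sketch of a product-unit forward pass, where the shapes and the restriction to positive inputs are simplifying assumptions:

```python
import numpy as np

def product_unit_layer(x, w):
    """Forward pass of a product-unit hidden layer (a minimal sketch).

    Each hidden unit j computes prod_i x_i ** w_ij: inputs raised to
    learned real-valued exponents and multiplied together.
    x: (D,) strictly positive inputs; w: (D, H) exponent matrix.
    """
    # prod_i x_i**w_ij == exp(sum_i w_ij * log x_i), valid for x_i > 0
    return np.exp(np.log(x) @ w)

# toy usage: a single unit computing x1**1 * x2**2
x = np.array([2.0, 3.0])
w = np.array([[1.0], [2.0]])
y = product_unit_layer(x, w)   # 2**1 * 3**2 = 18
```

Because the output is non-convex and highly sensitive to the exponents, gradient descent struggles here, which is why the authors turn to evolutionary algorithms for both training and architecture search.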

Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks

Neural Information Processing Systems

We study the training and generalization of deep neural networks (DNNs) in the over-parameterized regime, where the network width (i.e., the number of hidden nodes per layer) is much larger than the number of training data points. We show that the expected $0$-$1$ loss of a wide enough ReLU network trained with stochastic gradient descent (SGD) and random initialization can be bounded by the training loss of a random feature model induced by the network gradient at initialization, which we call a \textit{neural tangent random feature} (NTRF) model. For data distributions that can be classified by the NTRF model with sufficiently small error, our result yields a generalization error bound of order $\tilde{\mathcal{O}}(n^{-1/2})$ that is independent of the network width. Our result is more general and sharper than many existing generalization error bounds for over-parameterized neural networks. In addition, we establish a strong connection between our generalization error bound and the neural tangent kernel (NTK) proposed in recent work.
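The NTRF feature map is the gradient of the network output with respect to all parameters, evaluated at the random initialization; a linear model on these features stands in for the trained wide network. A minimal sketch for a two-layer ReLU network f(x) = v @ relu(W @ x), where the architecture and initialization scaling are illustrative assumptions, not the paper's general setting:

```python
import numpy as np

def ntrf_features(x, W, v):
    """Neural tangent random features of a two-layer ReLU net (sketch).

    For f(x) = v @ relu(W @ x), returns the gradient of f with respect
    to all parameters (v and W), flattened into one feature vector and
    evaluated at the given (random) initialization.
    """
    h = W @ x                              # pre-activations, shape (m,)
    grad_v = np.maximum(h, 0.0)            # df/dv = relu(W @ x)
    grad_W = np.outer(v * (h > 0), x)      # df/dW_ij = v_i * 1[h_i>0] * x_j
    return np.concatenate([grad_v, grad_W.ravel()])

# toy usage: input dimension d = 3, hidden width m = 8
rng = np.random.default_rng(0)
d, m = 3, 8
W = rng.normal(size=(m, d)) / np.sqrt(d)
v = rng.normal(size=m) / np.sqrt(m)
x = rng.normal(size=d)
phi = ntrf_features(x, W, v)               # feature vector, length m + m*d
```

The bound in the abstract then says, roughly: if a linear classifier on `phi` separates the data with small loss, the SGD-trained wide network generalizes at rate O~(n^{-1/2}), independent of the width m.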