Researchers Build AI That Builds AI


A hypernetwork aims to find the best deep neural network architecture to solve a given task. Boris Knyazev of the University of Guelph in Ontario and his colleagues have designed and trained a hypernetwork that could speed up the training of neural networks. Given a new, untrained deep neural network designed for some task, the hypernetwork predicts the parameters for the new network in fractions of a second, and in theory could make training unnecessary. The work may also have deeper theoretical implications. The name outlines the approach: a hypernetwork is a network that produces the parameters of another network.

Continual learning with hypernetworks

Artificial Intelligence

Artificial neural networks suffer from catastrophic forgetting when they are sequentially trained on multiple tasks. To overcome this problem, we present a novel approach based on task-conditioned hypernetworks, i.e., networks that generate the weights of a target model based on task identity. Continual learning (CL) is less difficult for this class of models thanks to a simple key observation: instead of relying on recalling the input-output relations of all previously seen data, task-conditioned hypernetworks only require rehearsing previous weight realizations, which can be maintained in memory using a simple regularizer. Besides achieving good performance on standard CL benchmarks, additional experiments on long task sequences reveal that task-conditioned hypernetworks display an unprecedented capacity to retain previous memories. Notably, such long memory lifetimes are achieved in a compressive regime, when the number of trainable weights is comparable to or smaller than the size of the target network. We provide insight into the structure of low-dimensional task embedding spaces (the input space of the hypernetwork) and show that task-conditioned hypernetworks demonstrate transfer learning properties. Finally, forward information transfer is further supported by empirical results on a challenging CL benchmark based on the CIFAR-10/100 image datasets.
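The rehearsal idea in this abstract can be illustrated with a minimal sketch. It assumes a purely linear hypernetwork and fixed one-hot task embeddings, neither of which is taken from the paper; the function names (`make_hypernetwork`, `generate_weights`, `output_regularizer`) are hypothetical. The point is only to show a penalty that compares the weights currently generated for an earlier task against a stored snapshot of them, without revisiting that task's data.

```python
import random


def make_hypernetwork(embed_dim, n_target_weights, seed=0):
    """Hypothetical linear hypernetwork: one row of coefficients per target weight."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.1) for _ in range(embed_dim)]
            for _ in range(n_target_weights)]


def generate_weights(hyper, task_embedding):
    """Map a task embedding to the flat weight vector of the target model."""
    return [sum(h * e for h, e in zip(row, task_embedding)) for row in hyper]


def output_regularizer(hyper, old_embeddings, stored_weights):
    """Squared drift of the weights generated for earlier tasks, averaged over tasks."""
    if not old_embeddings:
        return 0.0
    total = 0.0
    for emb, stored in zip(old_embeddings, stored_weights):
        current = generate_weights(hyper, emb)
        total += sum((c - s) ** 2 for c, s in zip(current, stored))
    return total / len(old_embeddings)


# Two tasks with assumed one-hot embeddings (an illustration-only simplification).
task_a, task_b = [1.0, 0.0], [0.0, 1.0]
hyper = make_hypernetwork(embed_dim=2, n_target_weights=6)

# Snapshot the weights generated for task A before training on task B.
snapshot_a = [generate_weights(hyper, task_a)]
penalty_start = output_regularizer(hyper, [task_a], snapshot_a)

# A change that only affects task B's generated weights leaves the penalty at zero...
hyper[0][1] += 0.5
penalty_free = output_regularizer(hyper, [task_a], snapshot_a)

# ...while a change that alters task A's generated weights is penalized.
hyper[0][0] += 0.5
penalty_drift = output_regularizer(hyper, [task_a], snapshot_a)
```

In this toy setting the penalty stays at zero while updates touch only the new task's output and grows once the weights generated for the old task move. A real implementation would add this term to the new task's loss and use a nonlinear hypernetwork.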

Latest Research Based On AI Building AI Models


Artificial intelligence (AI) is primarily a math problem. About 10 years ago, deep neural networks, a type of AI that learns to discover patterns in data, began to surpass standard algorithms because we finally had enough data and processing capacity to take full advantage of them. Today's neural networks are even hungrier for data and processing power. Training them requires fine-tuning the values of the millions, if not billions, of parameters that describe these networks and represent the strengths of the connections between artificial neurons. The goal is to find nearly ideal values for them, a process known as optimization, but training the networks to get there is difficult.

Graph HyperNetworks for Neural Architecture Search

Machine Learning

Neural architecture search (NAS) automatically finds the best task-specific neural network topology, outperforming many manual architecture designs. However, it can be prohibitively expensive, as the search requires training thousands of different networks, each of which can take hours. In this work, we propose the Graph HyperNetwork (GHN) to amortize the search cost: given an architecture, it directly generates the weights by running inference on a graph neural network. GHNs model the topology of an architecture and therefore can predict network performance more accurately than regular hypernetworks and premature early stopping. To perform NAS, we randomly sample architectures and use the validation accuracy of networks with GHN-generated weights as the surrogate search signal. GHNs are fast: they can search nearly 10 times faster than other random search methods on CIFAR-10 and ImageNet. GHNs can be further extended to the anytime prediction setting, where they have found networks with a better speed-accuracy tradeoff than the state-of-the-art manual designs. The success of deep learning marks the transition from manual feature engineering to automated feature learning. However, designing effective neural network architectures requires expert domain knowledge and repetitive trial and error.
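The search procedure described in this abstract (sample an architecture, generate its weights, score it on validation data, keep the best) can be sketched as a plain random-search loop. Everything below the loop is a hypothetical stand-in so that the example runs: `sample_architecture`, `predict_weights`, and `validation_accuracy` are placeholders, not the GHN or any real evaluation, and the scoring rule is arbitrary.

```python
import random


def random_search(sample_architecture, predict_weights, validation_accuracy,
                  n_samples, seed=0):
    """Score randomly sampled architectures using predicted (untrained) weights."""
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    history = []
    for _ in range(n_samples):
        arch = sample_architecture(rng)
        weights = predict_weights(arch)             # stands in for GHN inference
        score = validation_accuracy(arch, weights)  # surrogate search signal
        history.append((arch, score))
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score, history


# --- Hypothetical stand-ins so the loop can run; none of these come from the paper. ---

def sample_architecture(rng):
    """An 'architecture' here is just a tuple of 1 to 3 hidden-layer widths."""
    return tuple(rng.choice([8, 16, 32]) for _ in range(rng.randint(1, 3)))


def predict_weights(arch):
    """Placeholder for the hypernetwork: one constant weight per hidden unit."""
    return [[0.1] * width for width in arch]


def validation_accuracy(arch, weights):
    """Placeholder score in (0, 1] that peaks when the total width is 48."""
    total_units = sum(len(layer) for layer in weights)
    return 1.0 / (1.0 + abs(total_units - 48) / 48.0)


best_arch, best_score, history = random_search(
    sample_architecture, predict_weights, validation_accuracy, n_samples=20)
```

The loop contains no training step: each candidate costs one weight-generation call plus one evaluation. That is the source of the savings the abstract describes, relative to training every sampled network.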

Language Modeling with Recurrent Highway Hypernetworks

Neural Information Processing Systems

Where the original recurrent highway network (RHN) work primarily provides theoretical treatment of the subject, we demonstrate experimentally that RHNs benefit from far better gradient flow than LSTMs in addition to their improved task accuracy. The original hypernetworks work presents detailed experimental results but leaves several theoretical issues unresolved; we consider these in depth and frame several feasible solutions that we believe will yield further gains in the future. We demonstrate that these approaches are complementary: by combining RHNs and hypernetworks, we make a significant improvement over current state-of-the-art character-level language modeling performance on Penn Treebank while relying on much simpler regularization. Finally, we argue for RHNs as a drop-in replacement for LSTMs (analogous to LSTMs for vanilla RNNs) and for hypernetworks as a de facto augmentation (analogous to attention) for recurrent architectures.

Papers published at the Neural Information Processing Systems Conference.