Holtham, Elliot
Learning Across Scales---Multiscale Methods for Convolution Neural Networks
Haber, Eldad (University of British Columbia, Vancouver, BC) | Ruthotto, Lars (Emory University, Atlanta, GA) | Holtham, Elliot (Xtract Technologies, Vancouver, BC) | Jun, Seong-Hwan (Xtract Technologies, Vancouver, BC)
In this work, we establish the relation between optimal control and the training of deep Convolution Neural Networks (CNNs). We show that forward propagation in CNNs can be interpreted as a time-dependent nonlinear differential equation and that learning can be seen as controlling the parameters of the differential equation such that the network approximates the data-label relation for given training data. Using this continuous interpretation, we derive two new classes of methods for scaling CNNs along two different dimensions. The first class of multiscale methods connects low-resolution and high-resolution data by prolongation and restriction of CNN parameters, inspired by algebraic multigrid techniques. We demonstrate that our method enables classifying high-resolution images with CNNs trained on low-resolution images (and vice versa) and warm-starting the learning process. The second class of multiscale methods connects shallow and deep networks and leads to new training strategies that gradually increase the depth of the CNN while re-using parameters for initialization.
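To make the continuous interpretation concrete, the following is a minimal NumPy sketch showing how a residual-style forward pass can be read as forward Euler steps of a time-dependent differential equation. The tanh layer and the names layer and forward_propagation are illustrative assumptions, not the authors' implementation, and the multiscale prolongation and restriction operators are not shown.

    import numpy as np

    def layer(Y, K, b):
        # Illustrative layer dynamics f(Y, theta): affine map followed by tanh.
        return np.tanh(Y @ K + b)

    def forward_propagation(Y0, params, h=0.1):
        # Each residual update Y <- Y + h * f(Y, theta_j) is one forward Euler
        # step of the ODE dY/dt = f(Y(t), theta(t)); the layer parameters play
        # the role of the control.
        Y = Y0
        for K, b in params:
            Y = Y + h * layer(Y, K, b)
        return Y

    # Toy usage: 5 "layers" acting on 4 examples with 3 features each.
    rng = np.random.default_rng(0)
    Y0 = rng.standard_normal((4, 3))
    params = [(0.1 * rng.standard_normal((3, 3)), np.zeros(3)) for _ in range(5)]
    print(forward_propagation(Y0, params).shape)  # (4, 3)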
Reversible Architectures for Arbitrarily Deep Residual Neural Networks
Chang, Bo (University of British Columbia, Xtract Technologies Inc.) | Meng, Lili (University of British Columbia, Xtract Technologies Inc.) | Haber, Eldad (University of British Columbia, Xtract Technologies Inc.) | Ruthotto, Lars (Emory University, Xtract Technologies Inc.) | Begert, David (Xtract Technologies Inc.) | Holtham, Elliot (Xtract Technologies Inc.)
Recently, deep residual networks have been successfully applied in many computer vision and natural language processing tasks, pushing state-of-the-art performance with deeper and wider architectures. In this work, we interpret deep residual networks as ordinary differential equations (ODEs), which have long been studied in mathematics and physics with rich theoretical and empirical success. From this interpretation, we develop a theoretical framework on the stability and reversibility of deep neural networks and derive three reversible neural network architectures that can, in theory, go arbitrarily deep. The reversibility property allows a memory-efficient implementation that does not need to store the activations of most hidden layers. Together with the stability of our architectures, this enables training deeper networks using only modest computational resources. We provide both theoretical analyses and empirical results. Experiments demonstrate the efficacy of our architectures against several strong baselines on CIFAR-10, CIFAR-100, and STL-10, achieving performance superior to or on par with the state of the art. Furthermore, we show that our architectures yield superior results when trained with less training data.
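As an illustration of the reversibility idea, here is a minimal NumPy sketch of a reversible residual block with additive coupling, in the spirit of RevNet-style updates. The function names and the tanh residual function are illustrative assumptions, and this generic block is not one of the three architectures derived in the paper; it only shows why inputs can be recovered exactly from outputs, so hidden activations need not be stored.

    import numpy as np

    def F(X, K):
        # Illustrative residual function (assumed form): linear map plus tanh.
        return np.tanh(X @ K)

    def reversible_forward(X1, X2, K1, K2, h=1.0):
        # Forward pass of a reversible block with additive coupling.
        Y1 = X1 + h * F(X2, K1)
        Y2 = X2 + h * F(Y1, K2)
        return Y1, Y2

    def reversible_inverse(Y1, Y2, K1, K2, h=1.0):
        # The inputs are reconstructed exactly from the outputs, enabling a
        # memory-efficient implementation that recomputes activations on the fly.
        X2 = Y2 - h * F(Y1, K2)
        X1 = Y1 - h * F(X2, K1)
        return X1, X2

    # Round-trip check on random data.
    rng = np.random.default_rng(1)
    X1, X2 = rng.standard_normal((2, 8)), rng.standard_normal((2, 8))
    K1, K2 = 0.1 * rng.standard_normal((8, 8)), 0.1 * rng.standard_normal((8, 8))
    Y1, Y2 = reversible_forward(X1, X2, K1, K2)
    R1, R2 = reversible_inverse(Y1, Y2, K1, K2)
    print(np.allclose(X1, R1) and np.allclose(X2, R2))  # True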