DARC: Differentiable ARchitecture Compression

Singh, Shashank, Khetan, Ashish, Karnin, Zohar

May-20-2019–arXiv.org Machine Learning

In many learning situations, resources at inference time are significantly more constrained than resources at training time. This paper studies a general paradigm, called Differentiable ARchitecture Compression (DARC), that combines model compression and architecture search to learn models that are resource-efficient at inference time. Given a resource-intensive base architecture, DARC utilizes the training data to learn which sub-components can be replaced by cheaper alternatives. The high-level technique can be applied to any neural architecture, and we report experiments on state-of-the-art convolutional neural networks for image classification. For a WideResNet with $97.2\%$ accuracy on CIFAR-10, we improve single-sample inference speed by $2.28\times$ and memory footprint by $5.64\times$, with no accuracy loss. For a ResNet with $79.15\%$ Top1 accuracy on ImageNet, we improve batch inference speed by $1.29\times$ and memory footprint by $3.57\times$ with $1\%$ accuracy loss. We also give theoretical Rademacher complexity bounds in simplified cases, showing how DARC avoids overfitting despite over-parameterization.
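The idea of learning from data which sub-components to swap for cheaper ones can be illustrated with a toy sketch. This is not the paper's algorithm: it assumes a softmax-relaxed choice between one expensive and one cheap candidate, a made-up cost penalty weighted by `lam`, and finite-difference gradients in place of backpropagation. All names (`mixed_output`, `fit_alpha`, the candidate functions and their costs) are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mixed_output(x, candidates, alpha):
    """Differentiable relaxation: weighted mix of all candidate components."""
    weights = softmax(alpha)
    return sum(w * op(x) for w, op in zip(weights, candidates))

def objective(data, candidates, costs, alpha, lam):
    """Mean squared error plus lam times the expected inference cost (assumed form)."""
    weights = softmax(alpha)
    mse = sum((mixed_output(x, candidates, alpha) - y) ** 2 for x, y in data) / len(data)
    expected_cost = sum(w * c for w, c in zip(weights, costs))
    return mse + lam * expected_cost

def numeric_grad(f, alpha, eps=1e-5):
    """Central finite differences; a stand-in for automatic differentiation."""
    grad = []
    for i in range(len(alpha)):
        up, down = alpha[:], alpha[:]
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad

def fit_alpha(data, candidates, costs, lam, steps=500, lr=1.0):
    """Plain gradient descent on the architecture logits alpha."""
    alpha = [0.0] * len(candidates)
    f = lambda a: objective(data, candidates, costs, a, lam)
    for _ in range(steps):
        g = numeric_grad(f, alpha)
        alpha = [a - lr * gi for a, gi in zip(alpha, g)]
    return alpha

# Hypothetical setup: the "expensive" component is exact, the "cheap" one is
# slightly off but ten times cheaper to run.
expensive = lambda x: 2.0 * x
cheap = lambda x: 1.9 * x
candidates, costs = [expensive, cheap], [10.0, 1.0]
data = [(x, 2.0 * x) for x in (-1.0, -0.5, 0.5, 1.0)]

w_penalized = softmax(fit_alpha(data, candidates, costs, lam=0.01))
w_unpenalized = softmax(fit_alpha(data, candidates, costs, lam=0.0))
print("weights with cost penalty:", w_penalized)
print("weights without cost penalty:", w_unpenalized)
```

With the cost penalty the learned weights shift toward the cheap candidate, and without it they shift toward the exact one. In a real network the candidates would be layers or blocks, and the selection would be trained jointly with the model weights.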

artificial intelligence, darc, machine learning

Country:
- North America > United States
  - New York (0.04)
  - Pennsylvania > Allegheny County
    - Pittsburgh (0.04)

Genre:
- Research Report (1.00)

Industry:
- Education (0.93)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)
