Collaborating Authors

How To Optimise Deep Learning Models


Increasing number of parameters, latency, resources required to train etc have made working with deep learning tricky. Google researchers, in an extensive survey, have found common challenging areas for deep learning practitioners and suggested key checkpoints to mitigate these challenges. According to Gaurav Menghani of Google Research, if one were to deploy a model on smartphones where inference is constrained or expensive due to cloud servers, attention should be paid to inference efficiency. And if a large model has to be trained from scratch with limited training resources, models that are designed for training efficiency would be better off. According to Menghani, practitioners should aim to achieve pareto-optimality i.e. any model we choose should have the best of tradeoffs.

Decoding Efficient Deep Learning- Path to Smaller, Faster, and Better Models


In today's tech savvy world, it is famously said that predicting the future isn't magic, it's called Artificial Intelligence, and Data is essentially the new science behind it! The field of Machine Learning and Artificial Intelligence at large is evolving at a tremendous pace. Every researcher is trying to achieve the best models and beat benchmarks continuously. In the real world, when a model is deployed, a lot of focus has to be spent on analyzing whether deep learning models can be efficiently scaled for people who might not have millions of dollars to train the models and gigantic machines to deploy their models. Deep learning and Artificial Intelligence have empowered humans to essentially find a needle in a haystack, however let's dive deeper into how this science can be made more accessible by making it more efficient.

NASCaps: A Framework for Neural Architecture Search to Optimize the Accuracy and Hardware Efficiency of Convolutional Capsule Networks Machine Learning

Deep Neural Networks (DNNs) have made significant improvements to reach the desired accuracy to be employed in a wide variety of Machine Learning (ML) applications. Recently the Google Brain's team demonstrated the ability of Capsule Networks (CapsNets) to encode and learn spatial correlations between different input features, thereby obtaining superior learning capabilities compared to traditional (i.e., non-capsule based) DNNs. However, designing CapsNets using conventional methods is a tedious job and incurs significant training effort. Recent studies have shown that powerful methods to automatically select the best/optimal DNN model configuration for a given set of applications and a training dataset are based on the Neural Architecture Search (NAS) algorithms. Moreover, due to their extreme computational and memory requirements, DNNs are employed using the specialized hardware accelerators in IoT-Edge/CPS devices. In this paper, we propose NASCaps, an automated framework for the hardware-aware NAS of different types of DNNs, covering both traditional convolutional DNNs and CapsNets. We study the efficacy of deploying a multi-objective Genetic Algorithm (e.g., based on the NSGA-II algorithm). The proposed framework can jointly optimize the network accuracy and the corresponding hardware efficiency, expressed in terms of energy, memory, and latency of a given hardware accelerator executing the DNN inference. Besides supporting the traditional DNN layers, our framework is the first to model and supports the specialized capsule layers and dynamic routing in the NAS-flow. We evaluate our framework on different datasets, generating different network configurations, and demonstrate the tradeoffs between the different output metrics. We will open-source the complete framework and configurations of the Pareto-optimal architectures at

Searching Toward Pareto-Optimal Device-Aware Neural Architectures Machine Learning

Recent breakthroughs in Neural Architectural Search (NAS) have achieved state-of-the-art performance in many tasks such as image classification and language understanding. However, most existing works only optimize for model accuracy and largely ignore other important factors imposed by the underlying hardware and devices, such as latency and energy, when making inference. In this paper, we first introduce the problem of NAS and provide a survey on recent works. Then we deep dive into two recent advancements on extending NAS into multiple-objective frameworks: MONAS and DPP-Net. Both MONAS and DPP-Net are capable of optimizing accuracy and other objectives imposed by devices, searching for neural architectures that can be best deployed on a wide spectrum of devices: from embedded systems and mobile devices to workstations. Experimental results are poised to show that architectures found by MONAS and DPP-Net achieves Pareto optimality w.r.t the given objectives for various devices.

Distilling Optimal Neural Networks: Rapid Search in Diverse Spaces Machine Learning

This work presents DONNA (Distilling Optimal Neural Network Architectures), a novel pipeline for rapid neural architecture search and search space exploration, targeting multiple different hardware platforms and user scenarios. In DONNA, a search consists of three phases. First, an accuracy predictor is built for a diverse search space using blockwise knowledge distillation. This predictor enables searching across diverse macro-architectural network parameters such as layer types, attention mechanisms, and channel widths, as well as across micro-architectural parameters such as block repeats, kernel sizes, and expansion rates. Second, a rapid evolutionary search phase finds a Pareto-optimal set of architectures in terms of accuracy and latency for any scenario using the predictor and on-device measurements. Third, Pareto-optimal models can be quickly finetuned to full accuracy. With this approach, DONNA finds architectures that outperform the state of the art. In ImageNet classification, architectures found by DONNA are 20% faster than EfficientNet-B0 and MobileNetV2 on a Nvidia V100 GPU at similar accuracy and 10% faster with 0.5% higher accuracy than MobileNetV2-1.4x on a Samsung S20 smartphone. In addition to neural architecture search, DONNA is used for search-space exploration and hardware-aware model compression.