A new area in artificial intelligence involves using algorithms to automatically design machine-learning systems known as neural networks that can be more accurate and efficient than those developed by human engineers. But this so-called neural architecture search (NAS) technique is computationally expensive. A state-of-the-art NAS algorithm recently developed by Google to run on a squad of graphics processing units (GPUs) took 48,000 GPU hours to produce a single convolutional neural network, of the kind used for image classification and detection tasks. Google has the wherewithal to run hundreds of GPUs and other specialized hardware in parallel, but that's out of reach for many others. In a paper being presented at the International Conference on Learning Representations in May, MIT researchers describe an NAS algorithm that can directly learn specialized convolutional neural networks (CNNs) for target hardware platforms -- when run on a massive image dataset -- in only 200 GPU hours, which could enable far broader use of these types of algorithms.
Efficient deep-learning computing requires algorithm and hardware co-design to enable specialization: we usually need to change the algorithm to reduce the memory footprint and improve energy efficiency. However, this extra degree of freedom on the algorithm side makes the design space much larger: it's not only about designing the hardware but also about how to tweak the algorithm to best fit that hardware. Human engineers can hardly exhaust such a design space with heuristics; manual exploration is labor-intensive and sub-optimal. We propose design automation techniques for efficient neural networks. We investigate automatically designing specialized fast models, automated channel pruning, and automated mixed-precision quantization. We demonstrate that such learning-based, automated design achieves superior performance and efficiency compared with rule-based human design. Moreover, we shorten the design cycle by 200x relative to previous work, so that we can afford to design specialized neural network models for different hardware platforms.
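As a concrete illustration of one piece of this design space, channel pruning shrinks a layer's memory footprint and compute by dropping whole output channels. Below is a minimal sketch, not the work's actual method: it uses the common L1-norm importance heuristic, and the weight tensor and function names are invented for illustration.

```python
import numpy as np

def l1_channel_importance(weight):
    """Score each output channel of a conv weight (out_ch, in_ch, kH, kW)
    by the L1 norm of its filter -- a common pruning heuristic."""
    return np.abs(weight).reshape(weight.shape[0], -1).sum(axis=1)

def prune_channels(weight, keep_ratio):
    """Keep the top `keep_ratio` fraction of output channels by L1 score."""
    scores = l1_channel_importance(weight)
    n_keep = max(1, int(round(keep_ratio * weight.shape[0])))
    kept = np.sort(np.argsort(scores)[::-1][:n_keep])
    return weight[kept], kept

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 3, 3, 3))           # toy layer: 8 output channels
pruned, kept = prune_channels(w, keep_ratio=0.5)
print(pruned.shape)                          # -> (4, 3, 3, 3)
```

An automated pipeline would choose `keep_ratio` per layer (by search or a learned policy) rather than applying one global value by hand.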
Model compression is an effective technique for deploying neural network models on devices with limited computation and low power budgets. However, conventional model compression techniques rely on hand-crafted features [2,3,12] and require domain experts to explore a large design space, trading off model size, speed, and accuracy, which usually yields diminishing returns and is time-consuming. This paper proposes automated deep compression (ADC), which leverages reinforcement learning to efficiently sample the design space and improve the quality of model compression. State-of-the-art compression results are obtained in a fully automated way, without any human effort. Under a 4-fold reduction in FLOPs, ADC achieves 2.8% higher accuracy than the manually compressed model for VGG-16 on ImageNet.
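The core idea, replacing hand-tuned per-layer compression ratios with a learned search over them, can be sketched in miniature. The snippet below is a toy stand-in: it uses seeded random search rather than the reinforcement-learning agent, and the per-layer FLOP counts and accuracy proxy are invented purely for illustration.

```python
import random

# Toy setting: hypothetical per-layer FLOP counts and a synthetic
# accuracy proxy in which later layers tolerate pruning better.
LAYER_FLOPS = [100, 200, 400, 200]
SENSITIVITY = [0.9, 0.6, 0.4, 0.2]

def accuracy_proxy(ratios):
    # ratios[i] = fraction of layer i kept; purely illustrative model.
    return 1.0 - sum(s * (1 - r) for s, r in zip(SENSITIVITY, ratios)) / len(ratios)

def flops(ratios):
    return sum(f * r for f, r in zip(LAYER_FLOPS, ratios))

def search(budget, iters=5000, seed=0):
    """Random search over per-layer keep ratios under a FLOP budget --
    a toy stand-in for the reinforcement-learning agent."""
    rng = random.Random(seed)
    best = None
    for _ in range(iters):
        ratios = [rng.uniform(0.1, 1.0) for _ in LAYER_FLOPS]
        if flops(ratios) <= budget and (best is None or accuracy_proxy(ratios) > best[0]):
            best = (accuracy_proxy(ratios), ratios)
    return best

best_acc, best_ratios = search(budget=sum(LAYER_FLOPS) / 4)  # ~4x FLOP cut
```

The real agent replaces the random sampler with a policy that observes each layer's characteristics and is rewarded by measured accuracy under the compression constraint.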
Applying deep neural networks (DNNs) in mobile and safety-critical systems, such as autonomous vehicles, demands reliable and efficient execution on hardware. Optimized dedicated hardware accelerators are being developed to achieve this. However, the design of efficient and reliable hardware has become increasingly difficult, due to the increased complexity of modern integrated circuit technology and its sensitivity to hardware faults, such as random bit-flips. It is thus desirable to exploit optimization potential for error resilience and efficiency also on the algorithmic side, e.g. by optimizing the architecture of the DNN. Since there are numerous design choices for the architecture of DNNs, with partially opposing effects on the preferred characteristics (such as small error rates at low latency), multi-objective optimization strategies are necessary. In this paper, we develop an evolutionary optimization technique for the automated design of hardware-optimized DNN architectures. For this purpose, we derive a set of easily computable objective functions, which enable the fast evaluation of DNN architectures with respect to their hardware efficiency and error resilience solely based on the network topology. We observe a strong correlation between predicted error resilience and actual measurements obtained from fault injection simulations.

Keywords: Neural Network Hardware · Error Resilience · Hardware Faults · Neural Architecture Search · Multi-Objective Optimization · AutoML

1 Introduction

The application of deep neural networks (DNNs) in safety-critical perception systems, for example autonomous vehicles (AVs), poses challenges for the design of the underlying hardware platforms. On the one hand, efficient and fast accelerators are needed, since DNNs for computer vision exhibit massive computational requirements. On the other hand, resilience against random hardware faults has to be ensured.
In many driving scenarios, entering a fail-safe state is not sufficient; fail-operational behavior and fault tolerance are required. However, fault tolerance techniques at the hardware level often entail large redundancy overheads in silicon area, latency, and power consumption. These overheads stand in contrast to the low-power and low-latency requirements of embedded real-time DNN accelerators. Reliability concerns in nanoscale integrated circuits, for instance soft errors in memory and logic, represent an additional challenge for the realization of fault tolerance mechanisms at the hardware level [2, 33, 36, 68, 83]. Moreover, techniques such as near-threshold computing and approximate computing are desirable to meet power constraints, but can further increase error rates.
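The multi-objective search described in the abstract above can be sketched as a simple evolutionary loop that maintains a Pareto front of architectures. This is a hypothetical miniature: the two topology-only objective proxies (MAC count for efficiency, total channel width as a crude stand-in for bit-flip exposure) are invented for illustration and are not the paper's derived objective functions.

```python
import random

def proxies(widths):
    """Two cheap, topology-only objectives (both minimized); illustrative
    proxies, not the paper's actual formulas."""
    macs = sum(a * b * 9 for a, b in zip(widths, widths[1:]))  # 3x3 convs
    exposure = sum(widths)  # crude stand-in for bit-flip exposure
    return macs, exposure

def dominates(a, b):
    """Pareto dominance for minimization: a is no worse everywhere and
    strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and a != b

def mutate(widths, rng):
    w = list(widths)
    i = rng.randrange(len(w))
    w[i] = max(4, w[i] + rng.choice([-8, 8]))  # perturb one layer's width
    return tuple(w)

def evolve(seed_arch=(16, 32, 64, 32), gens=200, seed=0):
    rng = random.Random(seed)
    front = {seed_arch: proxies(seed_arch)}
    for _ in range(gens):
        child = mutate(rng.choice(list(front)), rng)
        obj = proxies(child)
        if any(dominates(v, obj) for v in front.values()):
            continue  # child is dominated by the current front; discard
        # drop front members the child dominates, then add the child
        front = {a: v for a, v in front.items() if not dominates(obj, v)}
        front[child] = obj
    return front

front = evolve()
```

In the actual approach, the surviving front would then be evaluated with more expensive fault-injection simulation, which the paper reports correlates strongly with the cheap topology-based predictions.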
Recent advancements in Neural Architecture Search (NAS) have produced new state-of-the-art Artificial Neural Network (ANN) solutions for tasks like image classification, object detection, and semantic segmentation without substantial human supervision. In this paper, we focus on exploring NAS for a dense prediction task: image denoising. Due to their costly training procedures, most NAS solutions for image enhancement rely on reinforcement learning or evolutionary-algorithm exploration, which usually takes weeks (or even months) to train. Therefore, we introduce a new efficient implementation of various superkernel techniques that enables fast (6-8 RTX 2080 GPU hours) single-shot training of models for dense predictions. We demonstrate the effectiveness of our method on the SIDD+ benchmark for image denoising.
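The key trick behind superkernel training is weight sharing: each candidate kernel size is a centred slice of one largest "superkernel" tensor, so a single training pass updates the parameters of all candidates at once. A minimal sketch follows; the shapes and names are illustrative, not the paper's implementation.

```python
import numpy as np

def sub_kernel(super_w, k):
    """Slice a centred k x k kernel out of a shared superkernel weight of
    shape (out_ch, in_ch, K, K). Every candidate kernel size is a view
    into the same underlying parameters."""
    K = super_w.shape[-1]
    off = (K - k) // 2
    return super_w[:, :, off:off + k, off:off + k]

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 16, 7, 7))    # largest candidate kernel: 7x7
w5 = sub_kernel(W, 5)                  # 5x5 candidate, shares W's weights
w3 = sub_kernel(W, 3)                  # 3x3 candidate, nested inside w5
print(w5.shape, w3.shape)              # -> (16, 16, 5, 5) (16, 16, 3, 3)
```

Because the sub-kernels are views rather than copies, gradients flowing into any candidate update the shared tensor, which is what makes single-shot training of the whole search space possible.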