The evolution of MobileNets has laid a solid foundation for neural network application on the mobile end. With the latest MobileNetV3, neural architecture search again claimed its supremacy on network design. Till today all mobile methods mainly focus on CPU latency instead of GPU, the latter, however, has lower overhead and interference and is much preferred in the industry. To mitigate this gap, we propose the first Mobile GPU-Aware (MoGA) neural architecture search in order to be precisely tailored for real-world applications. Further, the ultimate objective to devise a mobile network lies in achieving better performance by maximizing the utilization of bounded resources. While urging higher capability and restraining time consumption, we unconventionally encourage increasing the number of parameters for higher representational power. Undoubtedly, these three forces are not reconcilable and we have to alleviate the tension by weighted evolution techniques. Lastly, we deliver our searched networks at a mobile scale that outperform MobileNetV3 under the similar latency constraints, i.e., MoGA-A achieves 75.9\% top-1 accuracy on ImageNet, MoGA-B meets 75.5\% which costs only 0.5ms more on mobile GPU than MobileNetV3, which scores 75.2\%. MoGA-C best attests GPU-awareness by reaching 75.3\% and being slower on CPU but faster on GPU. The models and test code is made available here https://github.com/xiaomi-automl/MoGA.
The ability to rank models by its real strength is the key to Neural Architecture Search. Traditional approaches adopt an incomplete training for such purpose which is still very costly. One-shot methods are thus devised to cut the expense by reusing the same set of weights. However, it is uncertain whether shared weights are truly effective. It is also unclear if a picked model is better because of its vigorous representational power or simply because it is overtrained. In order to remove the suspicion, we propose a novel idea called Fair Neural Architecture Search (FairNAS), in which a strict fairness constraint is enforced for fair inheritance and training. In this way, our supernet exhibits nice convergence and very high training accuracy. The performance of any sampled model loaded with shared weights from the supernet strongly correlates with that of stand-alone counterpart when trained fully. This result dramatically improves the searching efficiency, with a multi-objective reinforced evolutionary search backend, our pipeline generated a new set of state-of-the-art architectures on ImageNet: FairNAS-A attains 75.34% top-1 validation accuracy on ImageNet, FairNAS-B 75.10%, FairNAS-C 74.69%, even with lower multi-adds and/or fewer number of parameters compared with others. The models and their evaluation code are made publicly available online http://github.com/fairnas/FairNAS.
Can we reduce the search cost of Neural Architecture Search (NAS) from days down to only few hours? NAS methods automate the design of Convolutional Networks (ConvNets) under hardware constraints and they have emerged as key components of AutoML frameworks. However, the NAS problem remains challenging due to the combinatorially large design space and the significant search time (at least 200 GPU-hours). In this work, we alleviate the NAS search cost down to less than 3 hours, while achieving state-of-the-art image classification results under mobile latency constraints. We propose a novel differentiable NAS formulation, namely Single-Path NAS, that uses one single-path over-parameterized ConvNet to encode all architectural decisions based on shared convolutional kernel parameters, hence drastically decreasing the search overhead. Single-Path NAS achieves state-of-the-art top-1 ImageNet accuracy (75.62%), hence outperforming existing mobile NAS methods in similar latency settings (~80ms). In particular, we enhance the accuracy-runtime trade-off in differentiable NAS by treating the Squeeze-and-Excitation path as a fully searchable operation with our novel single-path encoding. Our method has an overall cost of only 8 epochs (24 TPU-hours), which is up to 5,000x faster compared to prior work. Moreover, we study how different NAS formulation choices affect the performance of the designed ConvNets. Furthermore, we exploit the efficiency of our method to answer an interesting question: instead of empirically tuning the hyperparameters of the NAS solver (as in prior work), can we automatically find the hyperparameter values that yield the desired accuracy-runtime trade-off? We open-source our entire codebase at: https://github.com/dstamoulis/single-path-nas.
The ability of accurately ranking candidate architectures is the key to the performance of neural architecture search~(NAS). One-shot NAS is proposed to cut the expense but shows inferior performance against conventional NAS and is not adequately stable. We find that the ranking correlation between architectures under one-shot training and the ones under stand-alone training is poor, which misleads the algorithm to discover better architectures. We conjecture that this is owing to the gaps between one-shot training and stand-alone complete training. In this work, we empirically investigate several main factors that lead to the gaps and so weak ranking correlation. We then propose NAO-V2 to alleviate such gaps where we: (1) Increase the average updates for individual architecture to a relatively adequate extent. (2) Encourage more updates for large and complex architectures than small and simple architectures to balance them by sampling architectures in proportion to their model sizes. (3) Make the one-shot training of the supernet independent at each iteration. Comprehensive experiments verify that our proposed method is effective and robust. It leads to a more stable search that all the top architectures perform well enough compared to baseline methods. The final discovered architecture shows significant improvements against baselines with a test error rate of 2.60% on CIFAR-10 and top-1 accuracy of 74.4% on ImageNet under the mobile setting. Code and model checkpoints are publicly available at https://github.com/renqianluo/NAO_pytorch.
Can we automatically design a Convolutional Network (ConvNet) with the highest image classification accuracy under the runtime constraint of a mobile device? Neural architecture search (NAS) has revolutionized the design of hardware-efficient ConvNets by automating this process. However, the NAS problem remains challenging due to the combinatorially large design space, causing a significant searching time (at least 200 GPU-hours). To alleviate this complexity, we propose Single-Path NAS, a novel differentiable NAS method for designing hardware-efficient ConvNets in less than 4 hours. Our contributions are as follows: 1. Single-path search space: Compared to previous differentiable NAS methods, Single-Path NAS uses one single-path over-parameterized ConvNet to encode all architectural decisions with shared convolutional kernel parameters, hence drastically decreasing the number of trainable parameters and the search cost down to few epochs. 2. Hardware-efficient ImageNet classification: Single-Path NAS achieves 74.96% top-1 accuracy on ImageNet with 79ms latency on a Pixel 1 phone, which is state-of-the-art accuracy compared to NAS methods with similar constraints (<80ms). 3. NAS efficiency: Single-Path NAS search cost is only 8 epochs (30 TPU-hours), which is up to 5,000x faster compared to prior work. 4. Reproducibility: Unlike all recent mobile-efficient NAS methods which only release pretrained models, we open-source our entire codebase at: https://github.com/dstamoulis/single-path-nas.