training-free metric
Unifying and Boosting Gradient-Based Training-Free Neural Architecture Search
Neural architecture search (NAS) has gained immense popularity owing to its ability to automate neural architecture design. A number of training-free metrics are recently proposed to realize NAS without training, hence making NAS more scalable. Despite their competitive empirical performances, a unified theoretical understanding of these training-free metrics is lacking. As a consequence, (a) the relationships among these metrics are unclear, (b) there is no theoretical interpretation for their empirical performances, and (c) there may exist untapped potential in existing training-free NAS, which probably can be unveiled through a unified theoretical understanding. To this end, this paper presents a unified theoretical analysis of gradient-based training-free NAS, which allows us to (a) theoretically study their relationships, (b) theoretically guarantee their generalization performances, and (c) exploit our unified theoretical understanding to develop a novel framework named hybrid NAS (HNAS) which consistently boosts training-free NAS in a principled way. Remarkably, HNAS can enjoy the advantages of both training-free (i.e., the superior search efficiency) and training-based (i.e., the remarkable search effectiveness) NAS, which we have demonstrated through extensive experiments.
Unifying and Boosting Gradient-Based Training-Free Neural Architecture Search
Neural architecture search (NAS) has gained immense popularity owing to its ability to automate neural architecture design. A number of training-free metrics are recently proposed to realize NAS without training, hence making NAS more scalable. Despite their competitive empirical performances, a unified theoretical understanding of these training-free metrics is lacking. As a consequence, (a) the relationships among these metrics are unclear, (b) there is no theoretical interpretation for their empirical performances, and (c) there may exist untapped potential in existing training-free NAS, which probably can be unveiled through a unified theoretical understanding. To this end, this paper presents a unified theoretical analysis of gradient-based training-free NAS, which allows us to (a) theoretically study their relationships, (b) theoretically guarantee their generalization performances, and (c) exploit our unified theoretical understanding to develop a novel framework named hybrid NAS (HNAS) which consistently boosts training-free NAS in a principled way.
Efficient Multi-Objective Neural Architecture Search via Pareto Dominance-based Novelty Search
Neural Architecture Search (NAS) aims to automate the discovery of high-performing deep neural network architectures. Traditional objective-based NAS approaches typically optimize a certain performance metric (e.g., prediction accuracy), overlooking large parts of the architecture search space that potentially contain interesting network configurations. Furthermore, objective-driven population-based metaheuristics in complex search spaces often quickly exhaust population diversity and succumb to premature convergence to local optima. This issue becomes more complicated in NAS when performance objectives do not fully align with the actual performance of the candidate architectures, as is often the case with training-free metrics. While training-free metrics have gained popularity for their rapid performance estimation of candidate architectures without incurring computation-heavy network training, their effective incorporation into NAS remains a challenge. This paper presents the Pareto Dominance-based Novelty Search for multi-objective NAS with Multiple Training-Free metrics (MTF-PDNS). Unlike conventional NAS methods that optimize explicit objectives, MTF-PDNS promotes population diversity by utilizing a novelty score calculated based on multiple training-free performance and complexity metrics, thereby yielding a broader exploration of the search space. Experimental results on standard NAS benchmark suites demonstrate that MTF-PDNS outperforms conventional methods driven by explicit objectives in terms of convergence speed, diversity maintenance, architecture transferability, and computational costs.
- Oceania > Australia > Victoria > Melbourne (0.05)
- North America > United States > Massachusetts > Suffolk County > Boston (0.05)
- Asia > Vietnam > Hồ Chí Minh City > Hồ Chí Minh City (0.04)
- (15 more...)
SWAP-NAS: Sample-Wise Activation Patterns for Ultra-fast NAS
Peng, Yameng, Song, Andy, Fayek, Haytham M., Ciesielski, Vic, Chang, Xiaojun
Recent studies show that existing training-free metrics have several limitations, such as limited correlation and poor generalisation across different search spaces and tasks. Hence, we propose Sample-Wise Activation Patterns and its derivative, SWAP-Score, a novel high-performance training-free metric. It measures the expressivity of networks over a batch of input samples. The SWAP-Score is strongly correlated with ground-truth performance across various search spaces and tasks, outperforming 15 existing training-free metrics on NAS-Bench-101/201/301 and TransNAS-Bench-101. The SWAP-Score can be further enhanced by regularisation, which leads to even higher correlations in cell-based search space and enables model size control during the search. For example, Spearman's rank correlation coefficient between regularised SWAP-Score and CIFAR-100 validation accuracies on NAS-Bench-201 networks is 0.90, significantly higher than 0.80 from the second-best metric, NWOT. When integrated with an evolutionary algorithm for NAS, our SWAP-NAS achieves competitive performance on CIFAR-10 and ImageNet in approximately 6 minutes and 9 minutes of GPU time respectively. Performance evaluation of neural networks is critical, especially in Neural Architecture Search (NAS) which aims to automatically construct high-performing neural networks for a given task. The conventional approach evaluates candidate networks by feed-forward and back-propagation training. This process typically requires every candidate to be trained on the target dataset until convergence (Liu et al., 2019; Zoph & Le, 2017), and often leads to prohibitively high computational cost (Ren et al., 2022; White et al., 2023). To mitigate this cost, several alternatives have been introduced, such as performance predictors, architecture comparators and weight-sharing strategies. A divergent approach is the use of training-free metrics, also known as zero-cost proxies (Chen et al., 2021a; Lin et al., 2021; Lopes et al., 2021; Mellor et al., 2021; Mok et al., 2022; Tanaka et al., 2020b; Li et al., 2023). The aim is to eliminate the need for network training entirely. These metrics are either positively or negatively correlated with the networks' ground-truth performance.
- North America > United States > California > Los Angeles County > Long Beach (0.14)
- North America > Canada > Quebec > Montreal (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- (11 more...)
Robustifying and Boosting Training-Free Neural Architecture Search
He, Zhenfeng, Shu, Yao, Dai, Zhongxiang, Low, Bryan Kian Hsiang
Neural architecture search (NAS) has become a key component of AutoML and a standard tool to automate the design of deep neural networks. Recently, trainingfree NAS as an emerging paradigm has successfully reduced the search costs of standard training-based NAS by estimating the true architecture performance with only training-free metrics. Nevertheless, the estimation ability of these metrics typically varies across different tasks, making it challenging to achieve robust and consistently good search performance on diverse tasks with only a single trainingfree metric. Meanwhile, the estimation gap between training-free metrics and the true architecture performances limits training-free NAS to achieve superior performance. To address these challenges, we propose the robustifying and boosting training-free NAS (RoBoT) algorithm which (a) employs the optimized combination of existing training-free metrics explored from Bayesian optimization to develop a robust and consistently better-performing metric on diverse tasks, and (b) applies greedy search, i.e., the exploitation, on the newly developed metric to bridge the aforementioned gap and consequently to boost the search performance of standard training-free NAS further. Remarkably, the expected performance of our RoBoT can be theoretically guaranteed, which improves over the existing training-free NAS under mild conditions with additional interesting insights. Our extensive experiments on various NAS benchmark tasks yield substantial empirical evidence to support our theoretical results. Our code has been made publicly available at https://github.com/hzf1174/RoBoT. In recent years, deep learning has witnessed tremendous advancements, with applications ranging from computer vision (Caelles et al., 2017) to natural language processing (Vaswani et al., 2017). These advancements have been largely driven by sophisticated deep neural networks (DNNs) like AlexNet (Krizhevsky et al., 2017), VGG (Simonyan & Zisserman, 2014), and ResNet (He et al., 2016), which have been carefully designed and fine-tuned for specific tasks. However, crafting such networks often demands expert knowledge and a significant amount of trial and error. To mitigate this manual labor, the concept of neural architecture search (NAS) was introduced, aiming to automate the process of architecture design. While numerous training-based NAS algorithms have been proposed and have demonstrated impressive performance (Zoph & Le, 2016; Pham et al., 2018), they often require significant computational resources for performance estimation, as it entails training DNNs.
- Asia > Singapore (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Training-free Neural Architecture Search for RNNs and Transformers
Serianni, Aaron, Kalita, Jugal
Neural architecture search (NAS) has allowed for the automatic creation of new and effective neural network architectures, offering an alternative to the laborious process of manually designing complex architectures. However, traditional NAS algorithms are slow and require immense amounts of computing power. Recent research has investigated training-free NAS metrics for image classification architectures, drastically speeding up search algorithms. In this paper, we investigate training-free NAS metrics for recurrent neural network (RNN) and BERT-based transformer architectures, targeted towards language modeling tasks. First, we develop a new training-free metric, named hidden covariance, that predicts the trained performance of an RNN architecture and significantly outperforms existing training-free metrics. We experimentally evaluate the effectiveness of the hidden covariance metric on the NAS-Bench-NLP benchmark. Second, we find that the current search space paradigm for transformer architectures is not optimized for training-free neural architecture search. Instead, a simple qualitative analysis can effectively shrink the search space to the best performing architectures. This conclusion is based on our investigation of existing training-free metrics and new metrics developed from recent transformer pruning literature, evaluated on our own benchmark of trained BERT architectures. Ultimately, our analysis shows that the architecture search space and the training-free metric must be developed together in order to achieve effective results.
- North America > United States > New York > New York County > New York City (0.14)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- (7 more...)
Unifying and Boosting Gradient-Based Training-Free Neural Architecture Search
Shu, Yao, Dai, Zhongxiang, Wu, Zhaoxuan, Low, Bryan Kian Hsiang
Neural architecture search (NAS) has gained immense popularity owing to its ability to automate neural architecture design. A number of training-free metrics are recently proposed to realize NAS without training, hence making NAS more scalable. Despite their competitive empirical performances, a unified theoretical understanding of these training-free metrics is lacking. As a consequence, (a) the relationships among these metrics are unclear, (b) there is no theoretical guarantee for their empirical performances and transferability, and (c) there may exist untapped potential in training-free NAS, which can be unveiled through a unified theoretical understanding. To this end, this paper presents a unified theoretical analysis of gradient-based training-free NAS, which allows us to (a) theoretically study their relationships, (b) theoretically guarantee their generalization performances and transferability, and (c) exploit our unified theoretical understanding to develop a novel framework named hybrid NAS (HNAS) which consistently boosts training-free NAS in a principled way. Interestingly, HNAS is able to enjoy the advantages of both training-free (i.e., superior search efficiency) and training-based (i.e., remarkable search effectiveness) NAS, which we have demonstrated through extensive experiments.