Architecture Space



Bridge the Gap Between Architecture Spaces via A Cross-Domain Predictor

Neural Information Processing Systems

Neural Architecture Search (NAS) can automatically design promising neural architectures without human expertise. Though it achieves great success, a prohibitively high search cost is required to find a high-performance architecture, which blocks its practical adoption. A neural predictor can directly evaluate the performance of neural networks based on their architectures and thereby save much of the search budget. However, existing neural predictors require substantial annotated architectures trained from scratch, which still consume many computational resources. To solve this issue, we propose a Cross-Domain Predictor (CDP), which is trained on existing NAS benchmark datasets (e.g., NAS-Bench-101) but can be used to find high-performance architectures in large-scale search spaces. In particular, we propose a progressive subspace adaptation strategy to address the domain discrepancy between the source architecture space and the target space. Considering the large difference between the two architecture spaces, an assistant space is developed to smooth the transfer process. Compared with existing NAS methods, the proposed CDP is much more efficient. For example, CDP requires a search cost of only 0.1 GPU days to find architectures with 76.9% top-1 accuracy on ImageNet and 97.51% on CIFAR-10.
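The core idea of a neural predictor, as described above, can be sketched with a toy surrogate. This is an illustrative stand-in, not the paper's CDP: it scores unseen architectures by 1-nearest-neighbour lookup over a hypothetical one-hot operation encoding, with made-up benchmark annotations.

```python
# Illustrative sketch of a neural predictor (NOT the paper's CDP):
# rank candidate architectures without training them, using only a
# handful of annotated (encoding, accuracy) pairs. Encodings and
# accuracies below are invented for illustration.

def hamming(a, b):
    """Distance between two fixed-length architecture encodings."""
    return sum(x != y for x, y in zip(a, b))

def predict_accuracy(encoding, annotated):
    """Predict accuracy of `encoding` as that of its nearest annotated
    neighbour -- the cheapest possible surrogate for full training."""
    return min(annotated, key=lambda pair: hamming(encoding, pair[0]))[1]

# (encoding, measured accuracy) pairs, e.g. read from a NAS benchmark.
annotated = [
    ((0, 0, 1, 1), 0.921),
    ((1, 0, 1, 0), 0.899),
    ((1, 1, 0, 0), 0.874),
]

# Rank unseen candidates by predicted accuracy instead of training them.
candidates = [(0, 1, 1, 1), (1, 1, 1, 0)]
best = max(candidates, key=lambda e: predict_accuracy(e, annotated))
```

A real predictor would replace the nearest-neighbour lookup with a learned regressor, and CDP additionally adapts that regressor across search spaces.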




ESM: A Framework for Building Effective Surrogate Models for Hardware-Aware Neural Architecture Search

Nasir, Azaz-Ur-Rehman, Shoaib, Samroz Ahmad, Hanif, Muhammad Abdullah, Shafique, Muhammad

arXiv.org Artificial Intelligence

Hardware-aware Neural Architecture Search (NAS) is one of the most promising techniques for designing efficient Deep Neural Networks (DNNs) for resource-constrained devices. Surrogate models play a crucial role in hardware-aware NAS as they enable efficient prediction of performance characteristics (e.g., inference latency and energy consumption) of different candidate models on the target hardware device. In this paper, we focus on building hardware-aware latency prediction models. We study different types of surrogate models and highlight their strengths and weaknesses. We perform a systematic analysis to understand the impact of different factors that can influence the prediction accuracy of these models, aiming to assess the importance of each stage involved in the model designing process and identify methods and policies necessary for designing/training an effective estimation model, specifically for GPU-powered devices. Based on the insights gained from the analysis, we present a holistic framework that enables reliable dataset generation and efficient model generation, considering the overall costs of different stages of the model generation pipeline.
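The simplest latency surrogate of the kind surveyed above can be sketched as a one-feature linear model. This is a hedged toy example, not the paper's ESM framework: the FLOPs/latency numbers are invented, and a real model would use profiled measurements from the target GPU device and richer architecture features.

```python
# Toy latency surrogate (not the paper's framework): fit
# latency ~ slope * flops + intercept by closed-form least squares.
# All measurements below are made up for illustration.

def fit_linear(xs, ys):
    """Ordinary least squares for a single-feature linear model."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# (GFLOPs, measured latency in ms) pairs from a hypothetical device.
flops = [1.0, 2.0, 4.0]
latency_ms = [5.0, 9.0, 17.0]

slope, intercept = fit_linear(flops, latency_ms)
predicted = slope * 3.0 + intercept  # latency estimate for a 3-GFLOP model
```

The paper's point is precisely that such naive one-feature models are often insufficient, which motivates studying richer surrogate families and dataset-generation policies.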


Architecture-Aware Minimization (A$^2$M): How to Find Flat Minima in Neural Architecture Search

Gambella, Matteo, Pittorino, Fabrizio, Roveri, Manuel

arXiv.org Artificial Intelligence

Neural Architecture Search (NAS) has emerged as a powerful paradigm in machine learning, offering the potential to automatically identify optimal neural network (NN) architectures for a given task [1]. In recent years, NAS has gained broad attention due to its versatility and applicability in scenarios where computational or hardware constraints demand efficient and specialized models, such as mobile devices or edge computing environments [2, 3]. Fundamentally, NAS can be framed as a discrete optimization process over a vast space of neural architectures. Early approaches relied on methods like genetic algorithms [4] and reinforcement learning [5]. However, the high computational cost associated with these methods motivated the development of more efficient strategies, resulting in the introduction of differentiable relaxations of the problem, such as Differentiable Architecture Search (DARTS) [6] and its numerous variants [7, 8, 9, 10, 11, 12, 13], which offer a more tractable way to navigate large architecture spaces. These methods were also promising in terms of performance, making them increasingly popular in the field. While considerable research efforts have been devoted to understanding the geometry of neural network loss landscapes in weight space [14, 15, 16, 17, 18], the precise geometry of architecture spaces remains largely underexplored [19, 20]. A deeper understanding of architecture geometry is crucial for designing more effective NAS algorithms, and for gaining insights into both the nature of the neural architecture optimization problem and the fundamental question of why certain architectures generalize better than others. In this work, we shed light on these questions by focusing on two representative differentiable NAS search spaces: the NAS-Bench-201 benchmark dataset [21] and the DARTS search space [6].
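The DARTS-style continuous relaxation mentioned above can be sketched in a few lines: each edge's output becomes a softmax-weighted mixture of candidate operations, making the architecture parameters (alpha) differentiable. The operations and alpha values here are toy stand-ins, not the actual DARTS search space.

```python
# Minimal sketch of a DARTS-style continuous relaxation: the mixed
# operation o_bar(x) = sum_k softmax(alpha)_k * o_k(x). Operations and
# alphas below are illustrative placeholders.
import math

def softmax(alphas):
    exps = [math.exp(a) for a in alphas]
    total = sum(exps)
    return [e / total for e in exps]

def mixed_op(x, ops, alphas):
    """Softmax-weighted mixture of candidate operations on one edge."""
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, ops))

ops = [lambda x: x,         # identity ("skip connection")
       lambda x: 0.0,       # zero operation
       lambda x: 2.0 * x]   # stand-in for a learned transformation
alphas = [0.0, 0.0, 0.0]    # uniform mixture before any optimization

y = mixed_op(3.0, ops, alphas)
```

Because `mixed_op` is smooth in `alphas`, gradient descent can optimize the architecture weights jointly with network weights, which is what makes the search space geometry studied in this paper continuous.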



HM3: Hierarchical Multi-Objective Model Merging for Pretrained Models

Zhou, Yu, Wu, Xingyu, Wu, Jibin, Feng, Liang, Tan, Kay Chen

arXiv.org Artificial Intelligence

Model merging is a technique that combines multiple large pretrained models into a single model with enhanced performance and broader task adaptability. It has gained popularity in large pretrained model development due to its ability to bypass the need for original training data and further training processes. However, most existing model merging approaches focus solely on exploring the parameter space, merging models with identical architectures. Merging within the architecture space, despite its potential, remains in its early stages due to the vast search space and the challenges of layer compatibility. This paper marks a significant advance toward more flexible and comprehensive model merging techniques by modeling the architecture-space merging process as a reinforcement learning task. We train policy and value networks using offline sampling of weight vectors, which are then employed for the online optimization of merging strategies. Moreover, a multi-objective optimization paradigm is introduced to accommodate users' diverse task preferences, learning the Pareto front of optimal models to offer customized merging suggestions. Experimental results across multiple tasks, including text translation, mathematical reasoning, and code generation, validate the effectiveness and superiority of the proposed framework in model merging. The code will be made publicly available after the review process.
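The parameter-space merging that the paper contrasts with its architecture-space approach can be sketched as simple weight averaging of models sharing one architecture. The flat weight vectors below are toy stand-ins; real merging operates layer by layer on tensors.

```python
# Sketch of parameter-space model merging (the baseline setting the
# paper moves beyond): an elementwise convex combination of weight
# vectors from models with identical architectures. Weights are toy.

def average_merge(models, coeffs=None):
    """Convex combination of same-shape weight vectors."""
    if coeffs is None:
        coeffs = [1.0 / len(models)] * len(models)
    assert abs(sum(coeffs) - 1.0) < 1e-9, "coefficients must sum to 1"
    return [sum(c * m[i] for c, m in zip(coeffs, models))
            for i in range(len(models[0]))]

model_a = [0.2, 0.8, -1.0]
model_b = [0.6, 0.0, 1.0]
merged = average_merge([model_a, model_b])  # approximately [0.4, 0.4, 0.0]
```

Architecture-space merging, as framed in the paper, instead searches over which layers from which model to compose, which is why it is cast as a reinforcement learning problem over a much larger space.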


Efficient and Joint Hyperparameter and Architecture Search for Collaborative Filtering

Wen, Yan, Gao, Chen, Yi, Lingling, Qiu, Liwei, Wang, Yaqing, Li, Yong

arXiv.org Artificial Intelligence

Automated Machine Learning (AutoML) techniques have recently been introduced to design Collaborative Filtering (CF) models in a data-specific manner. However, existing works either search architectures or hyperparameters while ignoring the fact that they are intrinsically related and should be considered together. This motivates us to consider a joint hyperparameter and architecture search method to design CF models. However, this is not easy because of the large search space and high evaluation cost. To solve these challenges, we reduce the space by screening out useless hyperparameter choices through a comprehensive understanding of individual hyperparameters. Next, we propose a two-stage search algorithm to find proper configurations from the reduced space. In the first stage, we leverage knowledge from subsampled datasets to reduce evaluation costs; in the second stage, we efficiently fine-tune top candidate models on the whole dataset. Extensive experiments on real-world datasets show better performance can be achieved compared with both hand-designed and previously searched models. Besides, ablation and case studies demonstrate the effectiveness of our search framework.
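The two-stage idea described above can be sketched generically: score every configuration with a cheap proxy (e.g., evaluation on subsampled data), then re-evaluate only the shortlist with the expensive full evaluation. Both evaluators and all scores below are hypothetical stand-ins, not the paper's actual pipeline.

```python
# Hedged sketch of a generic two-stage search: cheap proxy scoring
# over all configurations, then full evaluation of the top candidates.
# Configuration names and scores are invented for illustration.

def two_stage_search(configs, cheap_eval, full_eval, top_k=2):
    """Stage 1: rank by cheap proxy; Stage 2: full-eval the top_k."""
    shortlist = sorted(configs, key=cheap_eval, reverse=True)[:top_k]
    return max(shortlist, key=full_eval)

configs = ["cfg_a", "cfg_b", "cfg_c", "cfg_d"]
# Proxy scores, e.g. validation metric on a subsampled dataset.
cheap_scores = {"cfg_a": 0.60, "cfg_b": 0.72, "cfg_c": 0.70, "cfg_d": 0.40}
# Expensive scores on the full dataset (only computed for the shortlist).
full_scores = {"cfg_a": 0.65, "cfg_b": 0.71, "cfg_c": 0.74, "cfg_d": 0.50}

best = two_stage_search(configs, cheap_scores.get, full_scores.get)
```

Note how the proxy ranking and the full ranking disagree (cfg_b leads the proxy, cfg_c wins the full evaluation): the shortlist exists precisely to let the expensive stage correct such proxy errors among the promising candidates.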


Visual Analysis of Neural Architecture Spaces for Summarizing Design Principles

Yuan, Jun, Liu, Mengchen, Tian, Fengyuan, Liu, Shixia

arXiv.org Artificial Intelligence

Recent advances in artificial intelligence largely benefit from better neural network architectures. These architectures are a product of a costly process of trial-and-error. To ease this process, we develop ArchExplorer, a visual analysis method for understanding a neural architecture space and summarizing design principles. The key idea behind our method is to make the architecture space explainable by exploiting structural distances between architectures. We formulate the pairwise distance calculation as solving an all-pairs shortest path problem. To improve efficiency, we decompose this problem into a set of single-source shortest path problems. The time complexity is reduced from O(kn^2N) to O(knN). Architectures are hierarchically clustered according to the distances between them. A circle-packing-based architecture visualization has been developed to convey both the global relationships between clusters and local neighborhoods of the architectures in each cluster. Two case studies and a post-analysis are presented to demonstrate the effectiveness of ArchExplorer in summarizing design principles and selecting better-performing architectures.
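The complexity reduction described above, from one all-pairs computation to independent single-source runs, can be illustrated on a toy unweighted graph. This is a generic sketch: ArchExplorer's real distance is a structural distance between architectures, not BFS hops on the made-up graph below.

```python
# Sketch of decomposing all-pairs shortest paths into per-source
# single-source runs, the trick behind the O(kn^2 N) -> O(knN)
# reduction above. The graph is a toy unweighted adjacency list.
from collections import deque

def sssp(graph, source):
    """Breadth-first single-source shortest paths (unit edge weights)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def all_pairs(graph):
    """All-pairs distances as independent SSSP runs, one per source."""
    return {s: sssp(graph, s) for s in graph}

graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
dists = all_pairs(graph)
```

Because each source's run is independent, the single-source problems can also be solved in parallel, which is part of what makes the decomposition attractive for large architecture spaces.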