Architecture Space



Bridge the Gap Between Architecture Spaces via A Cross-Domain Predictor

Neural Information Processing Systems

Neural Architecture Search (NAS) can automatically design promising neural architectures without human expertise. Though it achieves great success, a prohibitively high search cost is required to find a high-performance architecture, which blocks its practical adoption. A neural predictor can directly evaluate the performance of neural networks based on their architectures and thereby save much of the search budget. However, existing neural predictors require substantial annotated architectures trained from scratch, which still consume many computational resources. To solve this issue, we propose a Cross-Domain Predictor (CDP), which is trained on existing NAS benchmark datasets (e.g., NAS-Bench-101) but can be used to find high-performance architectures in large-scale search spaces. In particular, we propose a progressive subspace adaptation strategy to address the domain discrepancy between the source architecture space and the target space. Considering the large difference between the two architecture spaces, an assistant space is developed to smooth the transfer process. Compared with existing NAS methods, the proposed CDP is much more efficient. For example, CDP requires a search cost of only 0.1 GPU days to find architectures with 76.9% top-1 accuracy on ImageNet and 97.51% on CIFAR-10.
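The core idea of a neural predictor, as described above, can be sketched with a toy surrogate. This is an illustrative stand-in, not the paper's CDP: it scores unseen architectures by 1-nearest-neighbour lookup over a hypothetical one-hot operation encoding, with made-up benchmark annotations.

```python
# Illustrative sketch of a neural predictor (NOT the paper's CDP):
# rank candidate architectures without training them, using only a
# handful of annotated (encoding, accuracy) pairs. Encodings and
# accuracies below are invented for illustration.

def hamming(a, b):
    """Distance between two fixed-length architecture encodings."""
    return sum(x != y for x, y in zip(a, b))

def predict_accuracy(encoding, annotated):
    """Predict accuracy of `encoding` as that of its nearest annotated
    neighbour -- the cheapest possible surrogate for full training."""
    return min(annotated, key=lambda pair: hamming(encoding, pair[0]))[1]

# (encoding, measured accuracy) pairs, e.g. read from a NAS benchmark.
annotated = [
    ((0, 0, 1, 1), 0.921),
    ((1, 0, 1, 0), 0.899),
    ((1, 1, 0, 0), 0.874),
]

# Rank unseen candidates by predicted accuracy instead of training them.
candidates = [(0, 1, 1, 1), (1, 1, 1, 0)]
best = max(candidates, key=lambda e: predict_accuracy(e, annotated))
```

A real predictor would replace the nearest-neighbour lookup with a learned regressor, and CDP additionally adapts that regressor across search spaces.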




ESM: A Framework for Building Effective Surrogate Models for Hardware-Aware Neural Architecture Search

Nasir, Azaz-Ur-Rehman, Shoaib, Samroz Ahmad, Hanif, Muhammad Abdullah, Shafique, Muhammad

arXiv.org Artificial Intelligence

Hardware-aware Neural Architecture Search (NAS) is one of the most promising techniques for designing efficient Deep Neural Networks (DNNs) for resource-constrained devices. Surrogate models play a crucial role in hardware-aware NAS as they enable efficient prediction of performance characteristics (e.g., inference latency and energy consumption) of different candidate models on the target hardware device. In this paper, we focus on building hardware-aware latency prediction models. We study different types of surrogate models and highlight their strengths and weaknesses. We perform a systematic analysis to understand the impact of different factors that can influence the prediction accuracy of these models, aiming to assess the importance of each stage involved in the model designing process and identify methods and policies necessary for designing/training an effective estimation model, specifically for GPU-powered devices. Based on the insights gained from the analysis, we present a holistic framework that enables reliable dataset generation and efficient model generation, considering the overall costs of different stages of the model generation pipeline.
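The simplest latency surrogate of the kind surveyed above can be sketched as a one-feature linear model. This is a hedged toy example, not the paper's ESM framework: the FLOPs/latency numbers are invented, and a real model would use profiled measurements from the target GPU device and richer architecture features.

```python
# Toy latency surrogate (not the paper's framework): fit
# latency ~ slope * flops + intercept by closed-form least squares.
# All measurements below are made up for illustration.

def fit_linear(xs, ys):
    """Ordinary least squares for a single-feature linear model."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# (GFLOPs, measured latency in ms) pairs from a hypothetical device.
flops = [1.0, 2.0, 4.0]
latency_ms = [5.0, 9.0, 17.0]

slope, intercept = fit_linear(flops, latency_ms)
predicted = slope * 3.0 + intercept  # latency estimate for a 3-GFLOP model
```

The paper's point is precisely that such naive one-feature models are often insufficient, which motivates studying richer surrogate families and dataset-generation policies.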


Architecture-Aware Minimization (A$^2$M): How to Find Flat Minima in Neural Architecture Search

Gambella, Matteo, Pittorino, Fabrizio, Roveri, Manuel

arXiv.org Artificial Intelligence

Neural Architecture Search (NAS) has emerged as a powerful paradigm in machine learning, offering the potential to automatically identify optimal neural network (NN) architectures for a given task [1]. In recent years, NAS has gained broad attention due to its versatility and applicability in scenarios where computational or hardware constraints demand efficient and specialized models, such as mobile devices or edge computing environments [2, 3]. Fundamentally, NAS can be framed as a discrete optimization process over a vast space of neural architectures. Early approaches relied on methods like genetic algorithms [4] and reinforcement learning [5]. However, the high computational cost associated with these methods motivated the development of more efficient strategies, resulting in the introduction of differentiable relaxations of the problem, such as Differentiable Architecture Search (DARTS) [6] and its numerous variants [7, 8, 9, 10, 11, 12, 13], which offer a more tractable way to navigate large architecture spaces. These methods were also promising in terms of performance, making them increasingly popular in the field. While considerable research efforts have been devoted to understanding the geometry of neural network loss landscapes in weight space [14, 15, 16, 17, 18], the precise geometry of architecture spaces remains largely underexplored [19, 20]. A deeper understanding of architecture geometry is crucial for designing more effective NAS algorithms, and for gaining insights into both the nature of the neural architecture optimization problem and the fundamental question of why certain architectures generalize better than others. In this work, we shed light on these questions by focusing on two representative differentiable NAS search spaces: the NAS-Bench-201 benchmark dataset [21] and the DARTS search space [6].
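The DARTS-style continuous relaxation mentioned above can be sketched in a few lines: each edge's output becomes a softmax-weighted mixture of candidate operations, making the architecture parameters (alpha) differentiable. The operations and alpha values here are toy stand-ins, not the actual DARTS search space.

```python
# Minimal sketch of a DARTS-style continuous relaxation: the mixed
# operation o_bar(x) = sum_k softmax(alpha)_k * o_k(x). Operations and
# alphas below are illustrative placeholders.
import math

def softmax(alphas):
    exps = [math.exp(a) for a in alphas]
    total = sum(exps)
    return [e / total for e in exps]

def mixed_op(x, ops, alphas):
    """Softmax-weighted mixture of candidate operations on one edge."""
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, ops))

ops = [lambda x: x,         # identity ("skip connection")
       lambda x: 0.0,       # zero operation
       lambda x: 2.0 * x]   # stand-in for a learned transformation
alphas = [0.0, 0.0, 0.0]    # uniform mixture before any optimization

y = mixed_op(3.0, ops, alphas)
```

Because `mixed_op` is smooth in `alphas`, gradient descent can optimize the architecture weights jointly with network weights, which is what makes the search space geometry studied in this paper continuous.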



HM3: Hierarchical Multi-Objective Model Merging for Pretrained Models

Zhou, Yu, Wu, Xingyu, Wu, Jibin, Feng, Liang, Tan, Kay Chen

arXiv.org Artificial Intelligence

Model merging is a technique that combines multiple large pretrained models into a single model with enhanced performance and broader task adaptability. It has gained popularity in large pretrained model development due to its ability to bypass the need for original training data and further training processes. However, most existing model merging approaches focus solely on exploring the parameter space, merging models with identical architectures. Merging within the architecture space, despite its potential, remains in its early stages due to the vast search space and the challenges of layer compatibility. This paper marks a significant advance toward more flexible and comprehensive model merging techniques by modeling the architecture-space merging process as a reinforcement learning task. We train policy and value networks using offline sampling of weight vectors, which are then employed for the online optimization of merging strategies. Moreover, a multi-objective optimization paradigm is introduced to accommodate users' diverse task preferences, learning the Pareto front of optimal models to offer customized merging suggestions. Experimental results across multiple tasks, including text translation, mathematical reasoning, and code generation, validate the effectiveness and superiority of the proposed framework in model merging. The code will be made publicly available after the review process.
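The parameter-space merging that the paper contrasts with its architecture-space approach can be sketched as simple weight averaging of models sharing one architecture. The flat weight vectors below are toy stand-ins; real merging operates layer by layer on tensors.

```python
# Sketch of parameter-space model merging (the baseline setting the
# paper moves beyond): an elementwise convex combination of weight
# vectors from models with identical architectures. Weights are toy.

def average_merge(models, coeffs=None):
    """Convex combination of same-shape weight vectors."""
    if coeffs is None:
        coeffs = [1.0 / len(models)] * len(models)
    assert abs(sum(coeffs) - 1.0) < 1e-9, "coefficients must sum to 1"
    return [sum(c * m[i] for c, m in zip(coeffs, models))
            for i in range(len(models[0]))]

model_a = [0.2, 0.8, -1.0]
model_b = [0.6, 0.0, 1.0]
merged = average_merge([model_a, model_b])  # approximately [0.4, 0.4, 0.0]
```

Architecture-space merging, as framed in the paper, instead searches over which layers from which model to compose, which is why it is cast as a reinforcement learning problem over a much larger space.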


Efficient and Joint Hyperparameter and Architecture Search for Collaborative Filtering

Wen, Yan, Gao, Chen, Yi, Lingling, Qiu, Liwei, Wang, Yaqing, Li, Yong

arXiv.org Artificial Intelligence

Automated Machine Learning (AutoML) techniques have recently been introduced to design Collaborative Filtering (CF) models in a data-specific manner. However, existing works either search architectures or hyperparameters while ignoring the fact that they are intrinsically related and should be considered together. This motivates us to consider a joint hyperparameter and architecture search method to design CF models. However, this is not easy because of the large search space and high evaluation cost. To solve these challenges, we reduce the space by screening out useless hyperparameter choices through a comprehensive understanding of individual hyperparameters. Next, we propose a two-stage search algorithm to find proper configurations from the reduced space. In the first stage, we leverage knowledge from subsampled datasets to reduce evaluation costs; in the second stage, we efficiently fine-tune top candidate models on the whole dataset. Extensive experiments on real-world datasets show better performance can be achieved compared with both hand-designed and previously searched models. Besides, ablation and case studies demonstrate the effectiveness of our search framework.
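The two-stage idea described above can be sketched generically: score every configuration with a cheap proxy (e.g., evaluation on subsampled data), then re-evaluate only the shortlist with the expensive full evaluation. Both evaluators and all scores below are hypothetical stand-ins, not the paper's actual pipeline.

```python
# Hedged sketch of a generic two-stage search: cheap proxy scoring
# over all configurations, then full evaluation of the top candidates.
# Configuration names and scores are invented for illustration.

def two_stage_search(configs, cheap_eval, full_eval, top_k=2):
    """Stage 1: rank by cheap proxy; Stage 2: full-eval the top_k."""
    shortlist = sorted(configs, key=cheap_eval, reverse=True)[:top_k]
    return max(shortlist, key=full_eval)

configs = ["cfg_a", "cfg_b", "cfg_c", "cfg_d"]
# Proxy scores, e.g. validation metric on a subsampled dataset.
cheap_scores = {"cfg_a": 0.60, "cfg_b": 0.72, "cfg_c": 0.70, "cfg_d": 0.40}
# Expensive scores on the full dataset (only computed for the shortlist).
full_scores = {"cfg_a": 0.65, "cfg_b": 0.71, "cfg_c": 0.74, "cfg_d": 0.50}

best = two_stage_search(configs, cheap_scores.get, full_scores.get)
```

Note how the proxy ranking and the full ranking disagree (cfg_b leads the proxy, cfg_c wins the full evaluation): the shortlist exists precisely to let the expensive stage correct such proxy errors among the promising candidates.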


Visual Analysis of Neural Architecture Spaces for Summarizing Design Principles

Yuan, Jun, Liu, Mengchen, Tian, Fengyuan, Liu, Shixia

arXiv.org Artificial Intelligence

Recent advances in artificial intelligence largely benefit from better neural network architectures. These architectures are a product of a costly process of trial-and-error. To ease this process, we develop ArchExplorer, a visual analysis method for understanding a neural architecture space and summarizing design principles. The key idea behind our method is to make the architecture space explainable by exploiting structural distances between architectures. We formulate the pairwise distance calculation as solving an all-pairs shortest path problem. To improve efficiency, we decompose this problem into a set of single-source shortest path problems. The time complexity is reduced from O(kn^2N) to O(knN). Architectures are hierarchically clustered according to the distances between them. A circle-packing-based architecture visualization has been developed to convey both the global relationships between clusters and local neighborhoods of the architectures in each cluster. Two case studies and a post-analysis are presented to demonstrate the effectiveness of ArchExplorer in summarizing design principles and selecting better-performing architectures.
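The complexity reduction described above, from one all-pairs computation to independent single-source runs, can be illustrated on a toy unweighted graph. This is a generic sketch: ArchExplorer's real distance is a structural distance between architectures, not BFS hops on the made-up graph below.

```python
# Sketch of decomposing all-pairs shortest paths into per-source
# single-source runs, the trick behind the O(kn^2 N) -> O(knN)
# reduction above. The graph is a toy unweighted adjacency list.
from collections import deque

def sssp(graph, source):
    """Breadth-first single-source shortest paths (unit edge weights)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def all_pairs(graph):
    """All-pairs distances as independent SSSP runs, one per source."""
    return {s: sssp(graph, s) for s in graph}

graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
dists = all_pairs(graph)
```

Because each source's run is independent, the single-source problems can also be solved in parallel, which is part of what makes the decomposition attractive for large architecture spaces.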