Architecture-Aware Minimization (A$^2$M): How to Find Flat Minima in Neural Architecture Search

Gambella, Matteo, Pittorino, Fabrizio, Roveri, Manuel

arXiv.org Artificial Intelligence 

Neural Architecture Search (NAS) has emerged as a powerful paradigm in machine learning, offering the potential to automatically identify optimal neural network (NN) architectures for a given task [1]. In recent years, NAS has gained broad attention due to its versatility and applicability in scenarios where computational or hardware constraints demand efficient and specialized models, such as mobile devices or edge computing environments [2, 3].

Fundamentally, NAS can be framed as a discrete optimization problem over a vast space of neural architectures. Early approaches relied on methods such as genetic algorithms [4] and reinforcement learning [5]. However, the high computational cost of these methods motivated the development of more efficient strategies, leading to differentiable relaxations of the problem, such as Differentiable Architecture Search (DARTS) [6] and its numerous variants [7, 8, 9, 10, 11, 12, 13], which offer a more tractable way to navigate large architecture spaces. These methods also achieve competitive performance, which has made them increasingly popular in the field.

While considerable research effort has been devoted to understanding the geometry of neural network loss landscapes in weight space [14, 15, 16, 17, 18], the geometry of architecture spaces remains largely underexplored [19, 20]. A deeper understanding of architecture geometry is crucial for designing more effective NAS algorithms, and for gaining insight into both the nature of the neural architecture optimization problem and the fundamental question of why certain architectures generalize better than others. In this work, we shed light on these questions by focusing on two representative differentiable NAS search spaces: the NAS-Bench-201 benchmark [21] and the DARTS search space [6].
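To make the differentiable relaxation concrete, the following is a minimal sketch of the core idea behind DARTS: each edge of the architecture graph computes a softmax-weighted mixture of candidate operations, with the mixing weights (architecture parameters, usually denoted alpha) learned by gradient descent and later discretized by picking the highest-weighted operation. The toy 1-D operations and the parameter values below are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

def softmax(a):
    """Numerically stable softmax over architecture parameters."""
    e = np.exp(a - a.max())
    return e / e.sum()

# Toy stand-ins for an edge's candidate operations (e.g. conv, skip, zero)
ops = [
    lambda x: 0.5 * x,   # stand-in for a parametric op such as a convolution
    lambda x: x,         # identity / skip connection
    lambda x: 0.0 * x,   # "zero" op (removes the edge)
]

# Architecture parameters alpha, one per candidate op (illustrative values);
# in DARTS these are optimized jointly with the network weights.
alpha = np.array([1.0, 0.2, -0.5])

def mixed_op(x, alpha):
    """DARTS-style mixed operation: softmax(alpha)-weighted sum of ops."""
    w = softmax(alpha)
    return sum(wi * op(x) for wi, op in zip(w, ops))

x = np.array([1.0, 2.0])
y = mixed_op(x, alpha)          # continuous relaxation: blend of all ops
best = int(np.argmax(alpha))    # discretization: keep the strongest op
```

Because the mixture is differentiable in alpha, the discrete search over operations is replaced by continuous optimization, which is what makes gradient-based navigation of the architecture space tractable.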