Stronger NAS with Weaker Predictors: Appendix
We compare the effect of different architecture encodings in Table 2. We found that when WeakNAS is combined with the CATE embedding [3], its performance improves further over the WeakNAS baseline, which uses the adjacency matrix encoding of [4]. To compare fairly with BRP-NAS, we follow exactly the same setting for our WeakNAS predictor, e.g., the same graph convolutional network (GCN) based predictor and Top-40 evaluation. As shown in Table 4, with 100 training samples WeakNAS achieves performance comparable to BRP-NAS [5].
We use uniform sampling because a recent study [10] reveals that human-designed NAS search spaces usually contain a fair proportion of good models compared to random design spaces. For example, Figure 9 of [10] shows that in the NASNet/Amoeba/PNAS/ENAS/DARTS search spaces, the top 5% of models are within 1% of the performance of the global optimum.
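Uniform sampling and an adjacency matrix encoding can be sketched together as follows. This is a minimal sketch assuming a NAS-Bench-101-style cell fixed at 7 nodes, at most 9 edges, and three candidate operations; these constraints and the helper names (`is_valid`, `sample_uniform_cell`, `encode`) are illustrative assumptions, not the exact encoding used in [4]. Rejection sampling keeps the draw uniform over valid (adjacency matrix, operation list) pairs, though not over isomorphism classes of graphs, and for simplicity it rejects any cell with a node that is not on an input-to-output path.

```python
import random

# Assumed NAS-Bench-101-style constraints (illustrative only).
NUM_NODES = 7  # input, 5 intermediate nodes, output
MAX_EDGES = 9
OPS = ["conv3x3-bn-relu", "conv1x1-bn-relu", "maxpool3x3"]


def is_valid(adj):
    """True if every node lies on some input -> output path (edges go i -> j, i < j)."""
    n = len(adj)
    from_input = {0}
    for j in range(1, n):
        if any(adj[i][j] and i in from_input for i in range(j)):
            from_input.add(j)
    to_output = {n - 1}
    for i in range(n - 2, -1, -1):
        if any(adj[i][j] and j in to_output for j in range(i + 1, n)):
            to_output.add(i)
    return all(v in from_input and v in to_output for v in range(n))


def sample_uniform_cell(rng):
    """Rejection-sample a cell uniformly over valid (adjacency, ops) pairs."""
    while True:
        adj = [[0] * NUM_NODES for _ in range(NUM_NODES)]
        for i in range(NUM_NODES):
            for j in range(i + 1, NUM_NODES):
                adj[i][j] = rng.randint(0, 1)  # each upper-triangular edge w.p. 0.5
        if sum(map(sum, adj)) <= MAX_EDGES and is_valid(adj):
            ops = [rng.choice(OPS) for _ in range(NUM_NODES - 2)]
            return adj, ops


def encode(adj, ops):
    """Flatten the upper triangle of the adjacency matrix and one-hot the ops."""
    upper = [adj[i][j] for i in range(NUM_NODES) for j in range(i + 1, NUM_NODES)]
    one_hot = [1 if op == name else 0 for op in ops for name in OPS]
    return upper + one_hot


rng = random.Random(0)
adj, ops = sample_uniform_cell(rng)
features = encode(adj, ops)
print(len(features))  # 36: 21 upper-triangular entries + 5 nodes x 3 one-hot ops
```

A learned embedding such as CATE [3] would replace `encode` with a model-derived vector; the sampling step stays the same.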
Stronger NAS with Weaker Predictors
Neural Architecture Search (NAS) often trains and evaluates a large number of architectures. Recent predictor-based NAS approaches attempt to alleviate such heavy computation costs with two key steps: sampling some architecture-performance pairs and fitting a proxy accuracy predictor. Given limited samples, however, these predictors are far from accurate enough to locate top architectures, due to the difficulty of fitting the huge search space. This paper reflects on a simple yet crucial question: if our final goal is to find the best architecture, do we really need to model the whole space well? We propose a paradigm shift from fitting the whole architecture space using one strong predictor, to progressively fitting a search path towards the high-performance sub-space through a set of weaker predictors.
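The paradigm described above can be illustrated with a minimal sketch. It assumes a hypothetical toy search space of 10-bit architectures, a synthetic `true_accuracy` function standing in for training and evaluation, and a k-nearest-neighbour regressor as the weak predictor (a swapped-in stand-in; the appendix mentions MLP and GCN predictors). It is a simplified reading of "progressively fitting a search path", not the authors' implementation: each round refits the predictor on all evaluated architectures and evaluates only the top-ranked unevaluated ones.

```python
import itertools
import random

BITS = 10  # hypothetical toy space: 2**10 = 1024 architectures
_rng = random.Random(0)
WEIGHTS = [_rng.uniform(-1.0, 1.0) for _ in range(BITS)]


def true_accuracy(arch):
    """Hypothetical stand-in for training and evaluating an architecture."""
    linear = sum(w * b for w, b in zip(WEIGHTS, arch))
    # Pairwise interaction so the landscape is not purely additive.
    interaction = 0.5 * sum(arch[i] * arch[i + 1] for i in range(BITS - 1))
    return linear + interaction


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train, arch, k=3):
    """Weak predictor: mean accuracy of the k nearest evaluated architectures."""
    nearest = sorted(train, key=lambda item: hamming(item[0], arch))[:k]
    return sum(acc for _, acc in nearest) / len(nearest)


def weak_predictor_search(init=10, rounds=10, per_round=10, seed=1):
    space = list(itertools.product((0, 1), repeat=BITS))
    r = random.Random(seed)
    # Step 1: uniformly sample and evaluate an initial set.
    evaluated = {a: true_accuracy(a) for a in r.sample(space, init)}
    for _ in range(rounds):
        train = list(evaluated.items())
        pool = [a for a in space if a not in evaluated]
        # Step 2: refit the weak predictor on all samples so far and rank the pool.
        pool.sort(key=lambda a: knn_predict(train, a), reverse=True)
        # Step 3: evaluate only the top-ranked candidates, moving the
        # sample set toward the high-performance sub-space.
        for a in pool[:per_round]:
            evaluated[a] = true_accuracy(a)
    best = max(evaluated, key=evaluated.get)
    return best, evaluated[best], len(evaluated)


best, acc, queries = weak_predictor_search()
optimum = max(true_accuracy(a) for a in itertools.product((0, 1), repeat=BITS))
print(queries)  # 110 = 10 initial + 10 rounds x 10
print(acc <= optimum)  # True
```

The intent of the design is that each predictor only needs to rank well near the region currently being sampled, not across the whole space, so a weak model can suffice at every step.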
Stronger NAS with Weaker Predictors, Appendix A: Implementation details of baseline methods
We show the runtime comparison of WeakNAS and its BO variant in Table 1. … the Expected Improvement (EI) acquisition function [2] being extremely costly.
[Table 1: per-architecture runtime (s/arch) for training the proxy model and for deriving new samples, by method, predictor, and configuration; WeakNAS uses an MLP predictor with 4 layers of 1000 hidden units.]
… as shown in Table 3.
We conduct a controlled experiment on NAS-Bench-201 by varying the number of samples. … Evolution [1] in all three subsets, with better stability, as indicated by the confidence intervals.
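For reference, the Expected Improvement criterion mentioned above has a closed form when a candidate's predicted accuracy is modeled as Gaussian with mean mu and standard deviation sigma: for maximization with incumbent best f*, EI = (mu - f* - xi) * Phi(z) + sigma * phi(z), where z = (mu - f* - xi) / sigma and Phi, phi are the standard normal CDF and PDF. The sketch below implements that formula with the standard library. Estimating mu and sigma from an ensemble of predictors, the helper `gaussian_from_ensemble`, and the candidate names are illustrative assumptions, not the paper's setup. The formula itself is cheap to evaluate; one plausible source of cost in a BO variant is producing mu and sigma for every candidate.

```python
import math
import statistics


def gaussian_from_ensemble(predictions):
    """Hypothetical helper: mean and spread of an ensemble's predicted accuracies."""
    return statistics.mean(predictions), statistics.pstdev(predictions)


def expected_improvement(mu, sigma, best_so_far, xi=0.0):
    """Expected Improvement for maximization under N(mu, sigma^2)."""
    improvement = mu - best_so_far - xi
    if sigma <= 0.0:
        # Deterministic prediction: the improvement is known exactly.
        return max(improvement, 0.0)
    z = improvement / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # Phi(z)
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # phi(z)
    return improvement * cdf + sigma * pdf


# Usage: rank two hypothetical candidates by EI from ensemble predictions.
candidates = {
    "arch_a": [0.70, 0.72, 0.71],
    "arch_b": [0.60, 0.80, 0.70],
}
best_so_far = 0.71
scores = {
    name: expected_improvement(*gaussian_from_ensemble(preds), best_so_far)
    for name, preds in candidates.items()
}
print(max(scores, key=scores.get))  # arch_b: its larger spread outweighs a lower mean
```

EI trades off exploitation (a high mean) against exploration (high uncertainty), which is why the more uncertain `arch_b` is preferred here.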