Problem-Independent Architectures
Review for NeurIPS paper: Semi-Supervised Neural Architecture Search
The paper proposes an interesting semi-supervised approach to neural architecture search: using an architecture accuracy prediction function to train the controller (architecture generator). It shows that this approach yields efficiency improvements. Reviewers generally agree on the simplicity of the method and the good experimental evaluation. Reviewers 3 and 4 point out a number of missing comparisons; however, many of these are addressed in the rebuttal. It would also be good to understand why this method works, since, as a reviewer points out, no new information is added by the evaluation network, which on the other hand makes the experimental confirmation interesting. Overall, this is an interesting and simple method with good evaluation and results.
Review for NeurIPS paper: ISTA-NAS: Efficient and Consistent Neural Architecture Search by Sparse Coding
I could not see a strong motivation for explicitly enforcing sparsity on architecture parameters. This is because there are already many works trying to decouple the evaluation of sub-networks from the training of the supernet (i.e., making the correlation higher). This means that we have ways to explicitly decouple network evaluation from supernet training without adding a sparsity regularization. As far as I know, weight-sharing methods require the BN statistics to be re-calculated [1] to properly measure the Kendall correlation. Other works that can reduce the gap between supernet and sub-networks (e.g.
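To make the Kendall correlation mentioned above concrete, here is a minimal sketch of how rank agreement between supernet-estimated and stand-alone accuracies could be measured. The function name `kendall_tau` and all accuracy values are hypothetical and chosen for illustration only; none of them come from the paper or the review. The sketch uses the simple tau-a variant, in which tied pairs count as neither concordant nor discordant.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall tau-a: (concordant - discordant) / total number of pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical accuracies for four sub-networks (illustrative values only).
supernet_acc = [0.61, 0.58, 0.70, 0.65]    # estimated with shared supernet weights
standalone_acc = [0.93, 0.94, 0.96, 0.95]  # after training each sub-network from scratch

tau = kendall_tau(supernet_acc, standalone_acc)
print(f"Kendall tau = {tau:.3f}")
```

A value near 1 would mean the supernet ranks sub-networks almost exactly as stand-alone training does; a value near 0 would mean the supernet ranking carries little information about the final ranking.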
Review for NeurIPS paper: ISTA-NAS: Efficient and Consistent Neural Architecture Search by Sparse Coding
Four knowledgeable reviewers support acceptance for the contributions. Reviewers find that i) using sparse coding to solve the gap issue in NAS is novel and promising. The formulation and notations are neat. There is also a performance improvement in the one-stage framework. v) the paper is well-organized and easy to understand.
Efficient Neural Architecture Transformation Search in Channel-Level for Object Detection
Recently, Neural Architecture Search has achieved great success in large-scale image classification. In contrast, there have been limited works focusing on architecture search for object detection, mainly because costly ImageNet pretraining is always required for detectors. Training from scratch, as a substitute, demands more epochs to converge and brings no computation saving. To overcome this obstacle, we introduce a practical neural architecture transformation search (NATS) algorithm for object detection in this paper. Instead of searching for and constructing an entire network, NATS explores the architecture space on the basis of an existing network and reuses its weights.
Review for NeurIPS paper: Bridging the Gap between Sample-based and One-shot Neural Architecture Search with BONAS
Additional Feedback: Overall I think this paper is strong enough to recommend acceptance; the ideas are interesting and well motivated, and the evaluation across benchmarks is reasonably thorough. Misc questions:
- For the GCN, were alternatives to a global node considered? For example, it is common to see pooling across all nodes used to get a final embedding.
- How was 100 decided upon for the number of candidates to test at once? It would be interesting to see how changing this number changes the sampling efficiency/quality/runtime of the search.
- Were weights preserved across sampling rounds as in ENAS, or reinitialized each time? The trade-off/reliability of weight sharing in this case seems like it would be a bit different from the impact of weight sharing when considering a simultaneous pool of candidates.
- Is it possible to clarify the EA used to produce candidates? There wasn't much discussion of why it was used and the degree to which it helped over randomly sampling candidates.
- The correlations reported in Table 1 are good, but it seems like it would be useful to quantify the quality of the model's scoring estimates as the search progresses. That is, at initialization it is guiding the search having only seen a smaller pool of architectures; how good is the correlation at the beginning and how does it improve over the course of the search? If the search were run again from scratch, how consistent would it be?
Review for NeurIPS paper: Bridging the Gap between Sample-based and One-shot Neural Architecture Search with BONAS
Overall, the reviewers found the key ideas in the paper novel and well-motivated. I support the reviewers' request for ablation studies to better disentangle the relative contribution of different components and the impact of different hyperparameters. Finally, please include standard deviations in Table 2. They are readily available for the methods you are comparing against, and the differences between methods are sufficiently small that it would be good to have an idea of the variation across seeds.
Review for NeurIPS paper: Adapting Neural Architectures Between Domains
In AdaptNAS, a domain discriminator is used to approximate the domain discrepancy, which might introduce a certain amount of computation overhead. In particular, [23] performs consistently better than the proposed method while only searching on CIFAR-10. More importantly, there seems to be no ablation study comparing training with and without L_d (the domain adaptation loss). This makes it difficult to identify whether the performance is caused by using training data from both domains (L_S, L_T) or by the domain adaptation loss (L_d), which is the main contribution. However, it performs even better than most of the other settings where L_d is present (alpha 0).
Review for NeurIPS paper: Adapting Neural Architectures Between Domains
This paper focuses on the intersection between neural architecture search and domain adaptation. The proposal is to minimize the cross-domain generalization gap that generally exists in current neural architecture search (NAS) methods with proxy tasks. The philosophy behind it sounds quite interesting to me. Namely, instead of directly using the target dataset for searching, which suffers from a high computation cost, the authors propose to improve the generalizability of neural architectures by leveraging a small portion of target samples via a domain adaptation technique. This philosophy leads to a novel algorithm design I have never seen, i.e., AdaptNAS.
Reviews: XNAS: Neural Architecture Search with Expert Advice
I think the empirical evidence for exponentiated gradient descent (wipeout) in NAS is valuable for the community. I would suggest that the authors clearly state the limitations associated with the analysis. Summary: The proposed approach for NAS treats architecture choices, i.e., intra-cell node connections as in DARTS (Liu et al., 2018), as selection among "experts". The expert weights are found via EG descent (Kivinen & Warmuth, 1997), which somehow utilizes the standard "back-propagated loss gradient" (line 113) to perform the multiplicative weight updates. It is at this point that I had trouble following the theoretical analysis, given that the EG algorithm requires a loss function on the expert prediction to be specified for the development of a regret bound.
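For readers unfamiliar with the multiplicative update discussed above, the following is a minimal sketch of a generic exponentiated-gradient step over expert weights, followed by a simple pruning step. It is not the XNAS implementation: the function names `eg_step` and `wipeout`, the fixed-threshold pruning rule, the learning rate, and the gradient values are all assumptions made for illustration. Note that the sketch feeds a per-expert gradient directly into the update, which is exactly the step whose theoretical justification the review questions.

```python
import math


def eg_step(weights, grads, lr):
    """One exponentiated-gradient update.

    Each weight is multiplied by exp(-lr * grad) and the result is
    renormalized so that the weights stay on the probability simplex.
    """
    scaled = [w * math.exp(-lr * g) for w, g in zip(weights, grads)]
    total = sum(scaled)
    return [s / total for s in scaled]


def wipeout(weights, threshold):
    """Hypothetical pruning rule: drop experts whose weight is below a
    fixed threshold, then renormalize the survivors. The actual wipeout
    criterion in the paper may differ."""
    kept = [w if w >= threshold else 0.0 for w in weights]
    total = sum(kept)
    if total == 0.0:
        raise ValueError("threshold removed every expert")
    return [k / total for k in kept]


# Four candidate operations ("experts") with uniform initial weights.
weights = [0.25, 0.25, 0.25, 0.25]
# Hypothetical per-expert loss gradients, held constant for simplicity.
grads = [0.1, 0.5, 2.0, 0.2]

for _ in range(20):
    weights = eg_step(weights, grads, lr=0.5)
weights = wipeout(weights, threshold=0.01)

print([round(w, 4) for w in weights])
```

Because the update is multiplicative, an expert with a consistently larger gradient loses weight exponentially fast, so its weight becomes negligible after a few steps and a pruning rule can remove it.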
Reviews: XNAS: Neural Architecture Search with Expert Advice
The reviewers appreciated the good empirical results and theoretical analysis. This paper proposes to treat NAS problems as a selection problem among experts. Over time, it eliminates underperforming experts with a wipe-out step. As two of the reviewers pointed out, the theoretical analysis is interesting (and rare, in this type of paper), but it would be good to more explicitly spell out when and why the assumptions hold. Empirical performance seems good, but the authors should include error bars for at least the CIFAR-10 experiments and ideally the ImageNet ones as well.