Problem-Independent Architectures


Review for NeurIPS paper: Does Unsupervised Architecture Representation Learning Help Neural Architecture Search?

Neural Information Processing Systems

In this paper, the authors propose an unsupervised learning method that embeds architectures in a latent space, and show experimentally that the resulting representations improve downstream performance compared to training jointly with a supervised objective. The idea is important and the analysis is sound. The paper could be improved by analysing a more diverse space of architectures than ResNet-like blocks, and by addressing the other suggestions given by the reviewers.
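For intuition, here is a minimal sketch of the pretraining idea the summary describes: an encoder maps an architecture encoding to a latent vector and is trained with a reconstruction loss only, with no accuracy labels involved. The flat one-hot operation encoding, the layer sizes, and the class name are illustrative assumptions, not the paper's actual model.

```python
import torch
import torch.nn as nn

# Illustrative sketch: embed architectures (encoded here as flat
# operation one-hot vectors) into a latent space with a plain
# autoencoder trained on reconstruction alone -- no accuracy
# supervision is used at any point.
ENC_DIM = 7 * 5  # assumed: 7 nodes, 5 candidate ops per node

class ArchAutoencoder(nn.Module):
    def __init__(self, enc_dim=ENC_DIM, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(enc_dim, 64), nn.ReLU(),
                                     nn.Linear(64, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                     nn.Linear(64, enc_dim))

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

model = ArchAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
archs = torch.rand(256, ENC_DIM)  # stand-in for real architecture encodings
recon, z = model(archs)
loss = nn.functional.mse_loss(recon, archs)  # purely unsupervised objective
loss.backward()
opt.step()
```

The latent vectors `z` would then be handed to a downstream search procedure, which is where the claimed improvement over jointly supervised training is measured.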


Review for NeurIPS paper: Neural Architecture Generator Optimization

Neural Information Processing Systems

The authors showed results on a larger search space (with learnable stage ratios), which worked reasonably well (of course at the cost of much longer training time). While some other design choices could still be optimized, I do think this is an interesting and novel approach that could open up many directions for future research and advance the field of NAS. Thus I think this paper should be accepted, and I'm keeping my rating.


Review for NeurIPS paper: Semi-Supervised Neural Architecture Search

Neural Information Processing Systems

The paper proposes an interesting semi-supervised approach to neural architecture search: using an architecture accuracy prediction function to train the controller (architecture generator), and shows that this approach yields efficiency improvements. Reviewers generally agree on the simplicity of the method and its good experimental evaluation. Reviewers 3 and 4 point out a number of missing comparisons; however, many of these are addressed in the rebuttal. It would also be good to understand why this method works, since, as a reviewer points out, no new information is added by the evaluation network; on the other hand, this makes the experimental confirmation interesting. Overall this is an interesting and simple method with good evaluation and results.
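A minimal sketch of the semi-supervised mechanism described above, under assumed interfaces (the encoding dimension, MLP predictor, and data shapes are hypothetical, not the paper's actual components): a small predictor is fit on a handful of evaluated architectures and then used to pseudo-label many unevaluated encodings.

```python
import torch
import torch.nn as nn

# Hypothetical flat architecture encodings; in the actual method these
# would come from the controller's search space.
D = 32
predictor = nn.Sequential(nn.Linear(D, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(predictor.parameters(), lr=1e-3)

labeled_x = torch.rand(50, D)      # few architectures actually trained
labeled_y = torch.rand(50, 1)      # their measured accuracies
unlabeled_x = torch.rand(5000, D)  # many cheap, unevaluated encodings

# Step 1: fit the accuracy predictor on the small labeled set.
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(predictor(labeled_x), labeled_y)
    loss.backward()
    opt.step()

# Step 2: pseudo-label the unlabeled architectures; these predicted
# accuracies serve as extra supervision for training the controller.
with torch.no_grad():
    pseudo_y = predictor(unlabeled_x)
```

The reviewer's puzzle is visible in the sketch: step 2 adds no information beyond what the predictor learned in step 1, which is why the empirical gains are interesting.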


Supplementary Material of ISTA-NAS: Efficient and Consistent Neural Architecture Search by Sparse Coding, Shan You

Neural Information Processing Systems

The CIFAR-10 dataset has 60,000 colored images in 10 classes, with 50,000 images for training and 10,000 images for testing. The images are normalized by mean and standard deviation. Following convention, we perform data augmentation by padding each image with 4 zero-valued pixels on each side and then randomly cropping a 32×32 patch from each image or its horizontal flip. The ImageNet dataset contains 1.2 million training images, 50,000 validation images, and 100,000 test images in 1,000 classes. We adopt the standard data augmentation for training.
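A minimal sketch of the CIFAR-10 pipeline described above, using torchvision; the mean/std values are the commonly used CIFAR-10 statistics and are an assumption, not values quoted in this supplementary material.

```python
import torchvision.transforms as transforms
from torchvision.datasets import CIFAR10

CIFAR_MEAN = (0.4914, 0.4822, 0.4465)  # assumed standard CIFAR-10 statistics
CIFAR_STD = (0.2470, 0.2435, 0.2616)

train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),  # pad 4 zero pixels per side, crop 32x32
    transforms.RandomHorizontalFlip(),     # random horizontal flip
    transforms.ToTensor(),
    transforms.Normalize(CIFAR_MEAN, CIFAR_STD),  # normalize by mean and std
])

train_set = CIFAR10(root="./data", train=True, download=True,
                    transform=train_transform)
```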


Review for NeurIPS paper: ISTA-NAS: Efficient and Consistent Neural Architecture Search by Sparse Coding

Neural Information Processing Systems

I could not see a strong motivation for explicitly enforcing sparsity on the architecture parameters. There are already many works trying to decouple the evaluation of sub-networks from the training of the supernet (i.e., making the correlation higher), which means we have ways to explicitly decouple network evaluation from supernet training without adding a sparsity regularization. As far as I know, weight-sharing methods require the BN statistics to be re-calculated [1] to properly measure the Kendall correlation. Other works that can reduce the gap between supernet and sub-networks (e.g.
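For context on the BN recalculation the reviewer refers to, here is a minimal sketch assuming a PyTorch sub-network and a calibration data loader; `recalibrate_bn` is a hypothetical helper, not code from [1] or the paper. The running statistics inherited from the supernet are reset and re-estimated on calibration data before the sub-network is evaluated.

```python
import torch
import torch.nn as nn

def recalibrate_bn(subnet: nn.Module, loader, num_batches: int = 20):
    """Reset BN running stats and re-estimate them on calibration data."""
    for m in subnet.modules():
        if isinstance(m, nn.modules.batchnorm._BatchNorm):
            m.reset_running_stats()
            m.momentum = None  # use a cumulative moving average
    subnet.train()  # train mode so forward passes update running stats
    with torch.no_grad():  # no gradients needed, only statistics
        for i, (x, _) in enumerate(loader):
            if i >= num_batches:
                break
            subnet(x)  # forward pass updates running mean/var
    subnet.eval()
```

Only after this step does ranking sampled sub-networks (e.g., by Kendall correlation against stand-alone training) become meaningful, since stale supernet BN statistics can otherwise distort accuracies.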


Efficient Neural Architecture Transformation Search in Channel-Level for Object Detection

Neural Information Processing Systems

Recently, Neural Architecture Search has achieved great success in large-scale image classification. In contrast, there has been limited work on architecture search for object detection, mainly because costly ImageNet pretraining is always required for detectors. Training from scratch, as a substitute, demands more epochs to converge and brings no computation savings. To overcome this obstacle, we introduce a practical neural architecture transformation search (NATS) algorithm for object detection in this paper. Instead of searching for and constructing an entire network, NATS explores the architecture space on the basis of an existing network and reuses its weights.


Review for NeurIPS paper: Bridging the Gap between Sample-based and One-shot Neural Architecture Search with BONAS

Neural Information Processing Systems

Additional Feedback: Overall I think this paper is strong enough to recommend acceptance; the ideas are interesting and well motivated, and the evaluation across benchmarks is reasonably thorough. Misc questions:

- For the GCN, were alternatives to a global node considered? For example, it is common to see pooling across all nodes used to get a final embedding (see the sketch after this list).
- How was 100 decided upon for the number of candidates to test at once? It would be interesting to see how changing this number changes the sampling efficiency/quality/runtime of the search.
- Were weights preserved across sampling rounds as in ENAS, or reinitialized each time? The trade-off/reliability of weight sharing in this case seems like it would be a bit different from the impact of weight sharing when considering a simultaneous pool of candidates.
- Is it possible to clarify the EA used to produce candidates? There wasn't much discussion of why it was used or of the degree to which it helped over randomly sampling candidates.
- The correlations reported in Table 1 are good, but it seems like it would be useful to quantify the quality of the model's scoring estimates as the search progresses; that is, at initialization the model is guiding the search having only seen a small pool of architectures, so how good is the correlation at the beginning, and how does it improve over the course of the search? If the search were run again from scratch, how consistent would it be?
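To make the first question concrete, here is a hypothetical sketch of the two readout options (not BONAS's actual GCN; tensor shapes are assumptions): reading off a dedicated global node versus pooling over all node embeddings.

```python
import torch

# node_embs: [num_nodes, hidden_dim] output of the GCN layers for one graph.
node_embs = torch.rand(7, 64)  # hypothetical sizes

# Option A (global-node readout): take the embedding of a designated
# global node that is connected to every other node in the graph.
global_node_emb = node_embs[-1]

# Option B (the reviewer's suggestion): pool across all nodes.
graph_emb = node_embs.mean(dim=0)  # mean pooling; max or sum are also common
```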


Review for NeurIPS paper: Bridging the Gap between Sample-based and One-shot Neural Architecture Search with BONAS

Neural Information Processing Systems

Overall, the reviewers found the key ideas in the paper novel and well-motivated. I support the reviewers' request for ablation studies to better disentangle the relative contributions of the different components and the impact of different hyperparameters. Finally, please include standard deviations in Table 2. They are readily available for the methods you are comparing against, and the differences between methods are sufficiently small that it would be good to have an idea of the variation across seeds.


Adapting Neural Architectures Between Domains (Supplementary Material) Yanxi Li

Neural Information Processing Systems

This supplementary material consists of three parts: the proofs of all lemmas, theorems and corollaries (Section A), details of the experiment setting (Section B), and some additional experiment results (Section C).

A.1 Proof of Lemma 1

Lemma 1 [2]. Let R be a representation function R: X → Z, and D ...

A.2 Proof of Theorem 2

Theorem 2. Let m be the size of Ũ ... By taking a union bound of Eq. 7 over all h ∈ H ... By combining Theorem 2 and Lemma 3, we can derive the proof of Corollary 4. Let Ũ ... Finally, by applying the bound between the expected domain distance and the empirical domain distance according to [6], we obtain Eq. ...

B.1 NAS Search Space

Following many previous works [3, 5, 7, 9, 10], we use the NASNet search space [10]. There are two kinds of cells in the search space: normal cells and reduction cells. Normal cells use stride 1 and maintain the size of feature maps.
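The union-bound step invoked in A.2 is standard. As a generic reconstruction (Eq. 7 itself is not reproduced in this excerpt, so the per-hypothesis bound below is a placeholder form, not the paper's exact statement): if a concentration bound holds for each fixed hypothesis with failure probability δ_h, a union bound extends it uniformly over a finite class H by setting δ_h = δ/|H|.

```latex
% Generic union bound over a finite hypothesis class H: if, for each
% fixed h, a per-hypothesis bound (in the spirit of Eq. 7) holds with
% failure probability \delta_h, then choosing \delta_h = \delta / |\mathcal{H}|:
\Pr\Big[\exists\, h \in \mathcal{H}:\
  \big|\widehat{\mathrm{err}}(h) - \mathrm{err}(h)\big| > \varepsilon\Big]
\;\le\; \sum_{h \in \mathcal{H}}
  \Pr\Big[\big|\widehat{\mathrm{err}}(h) - \mathrm{err}(h)\big| > \varepsilon\Big]
\;\le\; |\mathcal{H}|\, \delta_h \;=\; \delta .
```

This is what introduces the usual log |H| dependence once the per-hypothesis bound is inverted for ε.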


Review for NeurIPS paper: Adapting Neural Architectures Between Domains

Neural Information Processing Systems

In AdaptNAS, a domain discriminator is used to approximate the domain discrepancy, which might introduce a certain amount of computation overhead. In particular, [23] performs consistently better than the proposed method while searching only on CIFAR-10. More importantly, there seems to be no ablation study on using the domain adaptation loss L_d versus not using it. This makes it difficult to identify whether the performance is due to using training data from both domains (L_S, L_T) or to the domain adaptation loss (L_d), which is the main contribution. However, the setting without L_d (alpha = 0) performs even better than most of the other settings where L_d is present (alpha > 0).
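To make the ablation the review asks about concrete, here is a hedged sketch of the loss structure being discussed. The names L_S, L_T, and L_d follow the review; the `model.features`/`model.head` interfaces, the discriminator, and the BCE formulation are assumptions, not AdaptNAS's actual implementation. Setting alpha = 0 removes the domain adaptation term while keeping training data from both domains.

```python
import torch
import torch.nn as nn

# Hypothetical combined objective: supervised losses on source and
# target data plus a discriminator-based domain term weighted by alpha.
# alpha = 0 recovers the "no L_d" ablation while both domains' data
# remain in use, which is exactly the comparison the review calls for.
def total_loss(model, discriminator, xs, ys, xt, yt, alpha=0.5):
    ce = nn.CrossEntropyLoss()
    bce = nn.BCEWithLogitsLoss()
    fs, ft = model.features(xs), model.features(xt)
    L_S = ce(model.head(fs), ys)  # supervised loss on source domain
    L_T = ce(model.head(ft), yt)  # supervised loss on target domain
    # The domain discriminator tries to tell source (0) from target (1);
    # its loss L_d serves as an approximation of the domain discrepancy.
    logits = torch.cat([discriminator(fs), discriminator(ft)])
    labels = torch.cat([torch.zeros(len(xs), 1), torch.ones(len(xt), 1)])
    L_d = bce(logits, labels)
    return L_S + L_T + alpha * L_d
```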