Review for NeurIPS paper: Evolving Normalization-Activation Layers

Neural Information Processing Systems 

The paper focuses on designing new neural architectures; it presents a new search space and new optimization criteria. The new search space consists of tensor-to-tensor operators that integrate activation and normalization functions; the criteria combine an early performance indicator (a classical choice) with a stability indicator (a new contribution). The rebuttal addressed nearly all of the reviewers' concerns about:
* the significance of the performance gains;
* the generality of the approach when applied to other architectures;
* the fairness of the evaluation (with a hold-out set);
* the impact of the stability indicator (lesion study).
The AC would like the computational cost of the evolution to be spelled out in the revised paper, beyond "a relatively large number of CPUs": for instance, how many tournaments were run? As a suggestion, it might be interesting to see whether (and how) scale insensitivity (E.2) could be used as a third rejection criterion.