Reviews: XNAS: Neural Architecture Search with Expert Advice

Neural Information Processing Systems 

I think the empirical evidence for exponentiated gradient (EG) descent (with wipeout) in NAS is valuable for the community. I would suggest that the authors clearly state the limitations associated with the analysis.

Summary: The proposed approach for NAS treats architecture choices, i.e., intra-cell node connections as in DARTS (Liu et al., 2018), as a selection among "experts". The expert weights are found via EG descent (Kivinen & Warmuth, 1997), which somehow utilizes the standard "back-propagated loss gradient" (line 113) to perform the multiplicative weight updates. It is at this point that I had trouble following the theoretical analysis, given that the EG algorithm requires a loss function on the expert predictions to be specified in order to derive a regret bound.
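
For reference, the generic multiplicative update behind EG can be sketched as below. This is a minimal illustration of the textbook rule (scale each weight by the exponential of its negative gradient, then renormalize), not the authors' implementation: the function name `eg_update`, the learning rate `lr`, and the toy gradient values are assumptions made for the example, and the wipeout step is omitted because its details are not given here.

```python
import math

def eg_update(weights, grads, lr):
    """One exponentiated-gradient step over a distribution of expert weights.

    `weights`, `grads`, and `lr` are hypothetical names used for illustration.
    """
    # Multiplicative update: scale each expert weight by exp(-lr * gradient).
    scaled = [w * math.exp(-lr * g) for w, g in zip(weights, grads)]
    total = sum(scaled)
    # Renormalize so the weights remain a probability distribution.
    return [s / total for s in scaled]

# Toy example: three experts starting from uniform weights.
weights = [1 / 3, 1 / 3, 1 / 3]
grads = [0.5, 0.0, -0.5]  # assumed per-expert gradients, not taken from the paper
new_weights = eg_update(weights, grads, lr=1.0)
print(new_weights)
```

In this sketch the expert with the most negative gradient gains weight and the one with the most positive gradient loses weight. The point raised above concerns what plays the role of `grads`: the classical regret analysis assumes a loss defined on each expert's prediction, whereas a back-propagated gradient is taken with respect to the weights themselves.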