Stochastic Architectures

Neural Information Processing Systems 

We take 1000 training images from CIFAR-10 as a fixed batch, randomly sample the neural architecture for inference, and compute var(µ) of the last BN layer of an NSA and an NSA-i trained given S = 5000 architectures.

In this section, we calculate the test accuracy of 200 randomly sampled architectures based on the vanilla NSA models trained under various spaces. Half of these architectures are seen during training, while the other half are not.
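The var(µ) measurement described above can be illustrated with a minimal sketch. It assumes that µ is the per-channel batch mean at the input of the BN layer, and that var(µ) is the variance of µ across sampled architectures, averaged over channels; the text does not state the exact aggregation, so both are assumptions. The function names and the synthetic feature maps are hypothetical stand-ins for activations from a real network.

```python
import numpy as np

def bn_batch_mean(features):
    """Per-channel batch mean of a BN input with shape (N, C, H, W)."""
    return features.mean(axis=(0, 2, 3))

def variance_of_bn_mean(features_per_arch):
    """Variance of the batch mean across architectures, averaged over channels.

    Averaging over channels is an assumption made for this sketch.
    """
    mus = np.stack([bn_batch_mean(f) for f in features_per_arch])  # (A, C)
    return mus.var(axis=0).mean()

# Synthetic stand-in: the same fixed batch of 1000 inputs, passed through
# 8 hypothetical sampled architectures, each shifting the feature statistics.
rng = np.random.default_rng(0)
num_archs, batch, channels, size = 8, 1000, 16, 4
features_per_arch = [
    rng.normal(loc=rng.normal(scale=0.5), scale=1.0,
               size=(batch, channels, size, size))
    for _ in range(num_archs)
]
print(variance_of_bn_mean(features_per_arch))
```

A value near zero means the batch statistics barely depend on which architecture is sampled. A larger value indicates that the BN statistics shift from one architecture to another.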