A Model, training, and dataset details

All models are trained end-to-end with the Gumbel-Softmax [