Review for NeurIPS paper: Why are Adaptive Methods Good for Attention Models?
