Learning Versatile Optimizers on a Compute Diet