ELMO: Efficiency via Low-precision and Peak Memory Optimization in Large Output Spaces