Large Scale Language Modeling: Converging on 40GB of Text in Four Hours