Exploring Quantization for Efficient Pre-Training of Transformer Language Models