Investigating Low-Rank Training in Transformer Language Models: Efficiency and Scaling Analysis