Scaling Embedding Layers in Language Models