Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies

Neural Information Processing Systems 

Research on scaling large language models (LLMs) has primarily focused on model parameters and training data size, overlooking the role of vocabulary size.
