Not All Tokens Are What You Need for Pretraining

Apr-26-2026, 17:18:33 GMT, Neural Information Processing Systems

Previous language model pre-training methods have uniformly applied a next-token prediction loss to all training tokens.

artificial intelligence, machine learning, natural language

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (0.53)
  - Machine Learning (0.42)