L²M: Mutual Information Scaling Law for Long-Context Language Modeling
