Implicit Geometry of Next-token Prediction: From Language Sparsity Patterns to Model Representations

Open in new window