Why can neural language models solve next-word prediction? A mathematical perspective
