What Makes Looped Transformers Perform Better Than Non-Recursive Ones (Provably)
