Optimal Memorization Capacity of Transformers
