Mixture of Tokens: Efficient LLMs through Cross-Example Aggregation
