Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation
