Adaptive Computation Pruning for the Forgetting Transformer
