Modular Transformers: Compressing Transformers into Modularized Layers for Flexible Efficient Inference
