Efficient Transformer-based Large Scale Language Representations using Hardware-friendly Block Structured Pruning
