DACT-BERT: Differentiable Adaptive Computation Time for an Efficient BERT Inference
