Cheaply Estimating Inference Efficiency Metrics for Autoregressive Transformer Models
