Cheaply Evaluating Inference Efficiency Metrics for Autoregressive Transformer APIs
