Cheaply Evaluating Inference Efficiency Metrics for Autoregressive Transformer APIs