Inference acceleration for large language models using "stairs" assisted greedy generation
