Inference acceleration for large language models using "stairs"-assisted greedy generation