Plato: Plan to Efficiently Decode for Large Language Model Inference
