BATON: Enhancing Batch-wise Inference Efficiency for Large Language Models via Dynamic Re-batching