FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
