Helix: Distributed Serving of Large Language Models via Max-Flow on Heterogeneous GPUs

Open in new window