Helix Parallelism: Rethinking Sharding Strategies for Interactive Multi-Million-Token LLM Decoding
