APB: Accelerating Distributed Long-Context Inference by Passing Compressed Context Blocks across GPUs
