Speculative Decoding in Decentralized LLM Inference: Turning Communication Latency into Computation Throughput
