Pipeline Parallelism is All You Need for Optimized Early-Exit Based Self-Speculative Decoding
