Collaborative Large Language Model Inference via Resource-Aware Parallel Speculative Decoding