SLED: A Speculative LLM Decoding Framework for Efficient Edge Serving
