Optimizing Speculative Decoding for Serving Large Language Models Using Goodput
