S2D: Sorted Speculative Decoding For More Efficient Deployment of Nested Large Language Models
