Think Big, Generate Quick: LLM-to-SLM for Fast Autoregressive Decoding