Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding
