Lossless Acceleration of Large Language Model via Adaptive N-gram Parallel Decoding
