Tandem Transformers for Inference Efficient LLMs
