Efficient Large Language Model Inference with Neural Block Linearization
