SHARP: Accelerating Language Model Inference by SHaring Adjacent layers with Recovery Parameters
