Position-Aware Depth Decay Decoding ($D^3$): Boosting Large Language Model Inference Efficiency
