Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models