Improving Knowledge Distillation for BERT Models: Loss Functions, Mapping Methods, and Weight Tuning