Gradient Sparsification For Masked Fine-Tuning of Transformers