Token Masking Improves Transformer-Based Text Classification