Are Large Language Models Really Robust to Word-Level Perturbations?