NEO-BENCH: Evaluating Robustness of Large Language Models with Neologisms