NEO-BENCH: Evaluating Robustness of Large Language Models with Neologisms
