VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation
