RO-Bench: Large-scale robustness evaluation of MLLMs with text-driven counterfactual videos
