NOVA: A Benchmark for Rare Anomaly Localization and Clinical Reasoning in Brain MRI
–Neural Information Processing Systems
In many real-world applications, deployed models encounter inputs that differ from the data seen during training. Open-world recognition ensures that such systems remain robust as ever-emerging, previously unknown categories appear and must be addressed without retraining. Foundation and vision-language models are pretrained on large and diverse datasets with the expectation of broad generalization across domains, including medical imaging. However, benchmarking these models on test sets with only a few common outlier types silently collapses the evaluation back to a closed-set problem, masking failures on rare or truly novel conditions encountered in clinical use. We therefore present NOVA, a challenging, real-life evaluation-only benchmark of 900 brain MRI scans that span 281 rare pathologies and heterogeneous acquisition protocols. Each case includes rich clinical narratives and double-blinded expert bounding-box annotations. Together, these enable joint assessment of anomaly localisation, visual captioning, and diagnostic reasoning. Because NOVA is never used for training, it serves as an extreme stress-test of out-of-distribution generalisation: models must bridge a distribution gap both in sample appearance and in semantic space.
Jun-20-2026, 01:59:25 GMT
- Country:
- North America > United States (0.67)
- Europe > Germany (0.46)
- Genre:
- Research Report > Experimental Study (1.00)
- Industry:
- Health & Medicine
- Therapeutic Area > Neurology (1.00)
- Health Care Technology (1.00)
- Diagnostic Medicine > Imaging (1.00)
- Technology:
- Information Technology
- Sensing and Signal Processing > Image Processing (0.93)
- Artificial Intelligence
- Vision (1.00)
- Natural Language > Large Language Model (1.00)
- Cognitive Science (0.88)
- Representation & Reasoning > Diagnosis (0.67)
- Machine Learning
- Neural Networks > Deep Learning (1.00)
- Performance Analysis > Accuracy (0.93)