Touchstone Benchmark: Are We on the Right Way for Evaluating AI Algorithms for Medical Segmentation?

Mar-18-2026, 17:02:44 GMT–Neural Information Processing Systems

How can we test AI performance? This question seems trivial, but it isn't. Standard benchmarks often suffer from small, in-distribution test sets, oversimplified metrics, unfair comparisons, and short-term outcome pressure. As a consequence, good performance on standard benchmarks does not guarantee success in real-world scenarios. To address these problems, we present Touchstone, a large-scale collaborative segmentation benchmark covering 9 types of abdominal organs.

algorithm, artificial intelligence, proceedings

Technology:
- Information Technology > Artificial Intelligence (1.00)