Intrinsic Self-Supervision for Data Quality Audits
Fabian Gröger, Alvaro Gonzalez-Jimenez
Neural Information Processing Systems
Benchmark datasets in computer vision often contain off-topic images, near duplicates, and label errors, leading to inaccurate estimates of model performance. In this paper, we revisit the task of data cleaning and formalize it as either a ranking problem, which significantly reduces human inspection effort, or a scoring problem, which allows for automated decisions based on score distributions. We find that a specific combination of context-aware self-supervised representation learning and distance-based indicators is effective in finding issues without annotation biases.
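The ranking formulation for near-duplicates can be illustrated with a minimal sketch. It is not the authors' implementation: the embeddings below are toy values standing in for self-supervised representations, and `cosine_distance` and `rank_near_duplicates` are hypothetical helper names. The sketch only shows how a distance-based indicator turns into an ordering that a human can inspect from the top down.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Cosine distance between two non-zero vectors (0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def rank_near_duplicates(embeddings):
    """Return (distance, i, j) for every pair, most similar pair first."""
    pairs = [
        (cosine_distance(embeddings[i], embeddings[j]), i, j)
        for i, j in combinations(range(len(embeddings)), 2)
    ]
    pairs.sort()  # smallest distance = strongest near-duplicate candidate
    return pairs


# Toy embeddings (assumed values); in practice these would come from an encoder.
embeddings = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.99, 0.05],  # nearly the same direction as sample 0
    [-1.0, 0.2],
]

ranked = rank_near_duplicates(embeddings)
for distance, i, j in ranked[:3]:
    print(f"samples {i} and {j}: cosine distance {distance:.4f}")
```

An annotator would review pairs in this order and stop once the candidates no longer look like duplicates. For the scoring formulation, the same distances would instead be thresholded automatically. Comparing all pairs is quadratic in dataset size, so this sketch only suits small collections.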
Mar-26-2025, 16:26:09 GMT
- Country:
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Genre:
- Research Report
- Experimental Study (1.00)
- New Finding (0.67)
- Industry:
- Health & Medicine
- Diagnostic Medicine > Imaging (1.00)
- Nuclear Medicine (0.67)
- Therapeutic Area
- Dermatology (1.00)
- Oncology (0.67)
- Information Technology (0.67)
- Technology: