KAIROS: Scalable Model-Agnostic Data Valuation

Jun-18-2026, 22:22:58 GMT–Neural Information Processing Systems

Data valuation techniques quantify each training example's contribution to model performance, providing a principled basis for data cleaning, acquisition, and selection. Existing valuation methods remain inadequate: model-based techniques depend on a single fitted model and inherit its biases, while algorithm-based approaches like Data Shapley scale poorly due to their need to train multiple models. Recent work has proposed model-agnostic alternatives based on Wasserstein distance between the training set and a clean reference set, but exact computation is expensive and approximations often misrank examples. We introduce KAIROS, a model-agnostic framework that values examples by their contribution to the Maximum Mean Discrepancy (MMD) between the training set and a clean reference distribution. Unlike Wasserstein methods, MMD admits a closed-form solution that requires no approximations and is scalable to large datasets. Additionally, KAIROS enables efficient online valuation: adding a new batch of m examples requires only O(mN) computation to update all scores, compared to O(N^2) in prior work, where N is the training set size. Empirical evaluations on noise, mislabeling, and poisoning benchmarks show that KAIROS consistently outperforms state-of-the-art baselines in both accuracy and runtime. On ImageNet, KAIROS achieves up to 15× speedup over the fastest baseline while maintaining superior data valuation quality. Our results demonstrate that model-agnostic methods can match or exceed model-based approaches in performance while scaling to large datasets.
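
To make the closed-form and online-update claims concrete, here is a minimal sketch in Python with NumPy. It is not the KAIROS scoring rule, which the abstract does not spell out. As an assumption, it scores each training point by the empirical MMD witness function: mean kernel similarity to the training set minus mean kernel similarity to the reference set. The RBF kernel, the `gamma` parameter, and the names `rbf_kernel` and `OnlineMMDScores` are all hypothetical choices for illustration.

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dists = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


class OnlineMMDScores:
    """Illustrative witness-function scores, updated batch by batch.

    Assumed score (not taken from the paper):
        score(x_i) = mean_j k(x_i, x_j) - mean_l k(x_i, y_l)
    where x_j are training points and y_l are reference points.
    """

    def __init__(self, reference, gamma=1.0):
        self.ref = np.asarray(reference, dtype=float)
        self.gamma = gamma
        self.X = np.empty((0, self.ref.shape[1]))
        self.train_sums = np.empty(0)  # per point: sum_j k(x_i, x_j)
        self.ref_means = np.empty(0)   # per point: mean_l k(x_i, y_l)

    def add_batch(self, batch):
        batch = np.asarray(batch, dtype=float)
        # Only an N x m block and an m x m block of the kernel matrix are
        # computed; the existing N x N part is never revisited.
        K_old_new = rbf_kernel(self.X, batch, self.gamma)
        K_new_new = rbf_kernel(batch, batch, self.gamma)
        self.train_sums = np.concatenate([
            self.train_sums + K_old_new.sum(axis=1),
            K_old_new.sum(axis=0) + K_new_new.sum(axis=1),
        ])
        # The reference set is fixed, so old points need no update here.
        new_ref_means = rbf_kernel(batch, self.ref, self.gamma).mean(axis=1)
        self.ref_means = np.concatenate([self.ref_means, new_ref_means])
        self.X = np.vstack([self.X, batch])

    def scores(self):
        """Higher means the point lies where training data outweighs the reference."""
        return self.train_sums / len(self.X) - self.ref_means
```

Because the sketch caches per-point kernel sums, each call to `add_batch` with m new points costs O(mN) kernel evaluations against the N existing points (plus the new points against themselves and against the fixed reference set), rather than recomputing all pairwise terms. Whether high or low scores should be treated as harmful depends on the valuation rule actually used, so the sign convention here is only illustrative.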

artificial intelligence, data quality, machine learning

Country:
- Europe (0.67)
- North America > United States
  - California (0.28)

Genre:
- Research Report
  - New Finding (1.00)
  - Experimental Study (1.00)

Industry:
- Information Technology (0.46)
- Law (0.46)
- Government (0.46)

Technology:
- Information Technology
  - Data Science > Data Quality
    - Data Cleaning (0.48)
  - Artificial Intelligence
    - Representation & Reasoning > Model-Based Reasoning (0.54)
    - Machine Learning
      - Neural Networks (0.68)
      - Inductive Learning (0.54)
      - Statistical Learning (0.46)
