imbalance
The Mass Agreement Score: A Point-centric Measure of Cluster Size Consistency
In clustering, strong dominance in the size of a particular cluster is often undesirable, motivating a measure of cluster size uniformity that can be used to filter such partitions. A basic requirement of such a measure is stability: partitions that differ only slightly in their point assignments should receive similar uniformity scores. A difficulty arises because cluster labels are not fixed objects; algorithms may produce different numbers of labels even when the underlying point distribution changes very little. Measures defined directly over labels can therefore become unstable under label-count perturbations. I introduce the Mass Agreement Score (MAS), a point-centric metric bounded in [0, 1] that evaluates the consistency of expected cluster size as measured from the perspective of points in each cluster. Its construction yields fragment robustness by design, assigning similar scores to partitions with similar bulk structure while remaining sensitive to genuine redistribution of cluster mass.
- North America > United States > New York (0.04)
- Europe > United Kingdom (0.04)
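The Mass Agreement Score abstract above claims that measures defined over labels become unstable when the label count changes, while a point-centric view does not. The abstract gives no formula for MAS, so the sketch below does not implement it. It only contrasts two standard quantities on a toy partition, and the function names `normalized_entropy` and `expected_cluster_share` are hypothetical choices made for this illustration.

```python
import math
from collections import Counter


def normalized_entropy(labels):
    """Label-centric uniformity: Shannon entropy divided by log(k).

    NOTE: illustrative baseline only, not the Mass Agreement Score.
    """
    counts = list(Counter(labels).values())
    n = sum(counts)
    k = len(counts)
    if k < 2:
        return 1.0  # a single cluster is treated as trivially uniform here
    h = -sum((c / n) * math.log(c / n) for c in counts)
    return h / math.log(k)


def expected_cluster_share(labels):
    """Point-centric quantity: pick a point uniformly at random and report
    the fraction of all points that share its cluster, averaged over points.

    NOTE: an assumed stand-in for a "point's-eye view" of cluster size,
    not the Mass Agreement Score.
    """
    counts = list(Counter(labels).values())
    n = sum(counts)
    return sum(c * c for c in counts) / (n * n)


# Two partitions of 100 points with the same bulk structure.
balanced = [0] * 50 + [1] * 50
# Two points split off from cluster 1 into singleton fragments.
fragmented = [0] * 50 + [1] * 48 + [2] + [3]

for name, part in [("balanced", balanced), ("fragmented", fragmented)]:
    print(name, normalized_entropy(part), expected_cluster_share(part))
```

In this toy case the label count doubles from two to four although only two points moved. The entropy normalized by log(k) drops sharply, while the point-weighted share moves only from 0.5 to 0.4806. That is the kind of fragment sensitivity the abstract attributes to label-based measures.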
SELECT: A Large-Scale Benchmark of Data Curation Strategies for Image Classification
Our findings show interesting trends, particularly pertaining to recent methods for data curation such as synthetic data generation and lookup based on CLIP embeddings. We show that although these strategies are highly competitive for certain tasks, the curation strategy used to assemble the original ImageNet-1K dataset remains the gold standard. We anticipate that our benchmark can illuminate the path for new methods to further reduce the gap.
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.94)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)