Most Influential Subset Selection: Challenges, Promises, and Beyond
Han Zhao
Neural Information Processing Systems
How can we attribute the behaviors of machine learning models to their training data? While the classic influence function sheds light on the impact of individual samples, it often fails to capture the more complex and pronounced collective influence of a set of samples. To tackle this challenge, we study the Most Influential Subset Selection (MISS) problem, which aims to identify a subset of training samples with the greatest collective influence. We conduct a comprehensive analysis of the prevailing approaches in MISS, elucidating their strengths and weaknesses. Our findings reveal that influence-based greedy heuristics, a dominant class of algorithms in MISS, can provably fail even in linear regression.
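To make the idea of an influence-based greedy heuristic concrete, here is a minimal sketch for ordinary least squares. It is an illustration under stated assumptions, not the paper's algorithm: it scores each training sample by the first-order influence-function estimate of how removing it changes the prediction at a single test point, picks the k highest-scoring samples, and compares the summed estimate with the effect measured by refitting. The function names (`fit_ols`, `influence_scores`, `greedy_top_k`, `actual_effect`) and the synthetic data are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y ~ X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def influence_scores(X, y, x_test):
    """First-order estimate of the change in the prediction x_test @ theta
    when each training sample is removed on its own."""
    theta = fit_ols(X, y)
    residuals = y - X @ theta
    hessian_inv = np.linalg.inv(X.T @ X)
    # Removing sample i shifts theta by about -H^{-1} x_i r_i
    # (the leverage correction 1 / (1 - h_ii) is ignored here).
    return -(X @ hessian_inv @ x_test) * residuals

def greedy_top_k(X, y, x_test, k):
    """Pick the k samples with the largest individual influence scores."""
    scores = influence_scores(X, y, x_test)
    return np.argsort(scores)[-k:][::-1]

def actual_effect(X, y, x_test, subset):
    """Measured change in the test prediction after removing `subset` and refitting."""
    keep = np.setdiff1d(np.arange(len(y)), subset)
    return x_test @ fit_ols(X[keep], y[keep]) - x_test @ fit_ols(X, y)

# Synthetic data, purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=50)
x_test = rng.normal(size=3)

subset = greedy_top_k(X, y, x_test, k=5)
estimated = influence_scores(X, y, x_test)[subset].sum()
measured = actual_effect(X, y, x_test, subset)
print("selected indices:", subset)
print("estimated collective effect:", estimated)
print("measured collective effect:", measured)
```

The estimated and measured effects generally differ, because summing individual scores ignores both leverage and interactions among the removed samples. That additivity assumption is the kind of simplification that can make such heuristics select a subset that is not the most influential one.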
Mar-27-2025, 10:38:28 GMT