Banzhaf value
Robust Data Valuation with Weighted Banzhaf Values
Data valuation, a principled way to rank the importance of each training datum, has become increasingly important. However, existing value-based approaches (e.g., Shapley) are known to suffer from the stochasticity inherent in utility functions, which renders consistent and reliable rankings difficult. Recently, Wang and Jia (2023) proposed a noise-structure-agnostic framework advocating the Banzhaf value for its robustness against such stochasticity, as it achieves the largest safety margin among many alternatives. Surprisingly, our empirical study shows that the Banzhaf value is not always the most robust when compared with a broader family: weighted Banzhaf values. To analyze this phenomenon, we introduce the concept of Kronecker noise to parameterize stochasticity, through which we prove that the uniquely robust semi-value, which can be analytically derived from the underlying Kronecker noise, lies in the family of weighted Banzhaf values while minimizing the worst-case entropy. In addition, we adopt the maximum-sample-reuse principle to design an estimator that efficiently approximates weighted Banzhaf values, and show that it enjoys the best time complexity for achieving an $(\epsilon, \delta)$-approximation. Our theory is verified under both synthetic and real noise. For the latter, we fit a Kronecker noise to the inherent stochasticity, which is then plugged in to generate the predicted most robust semi-value. Our study suggests that weighted Banzhaf values are promising when facing noise in data valuation.
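For concreteness, the weighted Banzhaf value with parameter $w \in (0, 1)$ weights each coalition $S \subseteq N \setminus \{i\}$ by $w^{|S|}(1-w)^{n-1-|S|}$, with $w = 1/2$ recovering the ordinary Banzhaf value. A minimal brute-force sketch (exponential in $n$, for intuition only; all names are ours, not the paper's):

```python
from itertools import combinations

def weighted_banzhaf(n, v, w):
    """Exact weighted Banzhaf values of an n-player game.

    v: utility function mapping a frozenset of players {0..n-1} to a float.
    w: weight in (0, 1); w = 0.5 gives the ordinary Banzhaf value.
    """
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            for S in combinations(others, k):
                S = frozenset(S)
                coef = w ** len(S) * (1 - w) ** (n - 1 - len(S))
                phi += coef * (v(S | {i}) - v(S))
        values.append(phi)
    return values
```

Because the coalition weights sum to one, an additive game returns each player's own contribution for every $w$, which makes a convenient sanity check.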
From Decision Trees to Boolean Logic: A Fast and Unified SHAP Algorithm
Nadel, Alexander, Wettenstein, Ron
SHapley Additive exPlanations (SHAP) is a key tool for interpreting decision tree ensembles by assigning contribution values to features. It is widely used in finance, advertising, medicine, and other domains. Two main approaches to SHAP calculation exist: Path-Dependent SHAP, which leverages the tree structure for efficiency, and Background SHAP, which uses a background dataset to estimate feature distributions. We introduce WOODELF, a SHAP algorithm that integrates decision trees, game theory, and Boolean logic into a unified framework. For each consumer, WOODELF constructs a pseudo-Boolean formula that captures their feature values, the structure of the decision tree ensemble, and the entire background dataset. It then leverages this representation to compute Background SHAP in linear time. WOODELF can also compute Path-Dependent SHAP, Shapley interaction values, Banzhaf values, and Banzhaf interaction values. WOODELF is designed to run efficiently on CPU and GPU hardware alike. Available via the WOODELF Python package, it is implemented using NumPy, SciPy, and CuPy without relying on custom C++ or CUDA code. This design enables fast performance and seamless integration into existing frameworks, supporting large-scale computation of SHAP and other game-theoretic values in practice. For example, on a dataset with 3,000,000 rows, 5,000,000 background samples, and 127 features, WOODELF computed all Background Shapley values in 162 seconds on CPU and 16 seconds on GPU, compared with the 44 minutes required by the best existing method on any hardware platform, representing 16x and 165x speedups, respectively.
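As a point of reference for what such tree-specific algorithms avoid: computing Shapley values of a generic set function by definition costs $O(n! \cdot n)$ utility calls. A brute-force sketch for tiny games (our own illustration, unrelated to the WOODELF codebase):

```python
from itertools import permutations

def shapley_values(n, v):
    """Exact Shapley values by averaging the marginal contribution of each
    player over all n! orderings; O(n! * n), for reference on tiny games."""
    totals = [0.0] * n
    count = 0
    for order in permutations(range(n)):
        S = frozenset()
        for i in order:
            totals[i] += v(S | {i}) - v(S)  # marginal contribution of i
            S = S | {i}
        count += 1
    return [t / count for t in totals]
```

By construction the values satisfy efficiency (they sum to $v(N)$) and symmetry, two properties that are handy for checking any faster estimator against this baseline.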
When is the Computation of a Feature Attribution Method Tractable?
Barceló, P., Cominetti, R., Morgado, M.
Feature attribution methods have become essential for explaining machine learning models. Many popular approaches, such as SHAP and Banzhaf values, are grounded in power indices from cooperative game theory, which measure the contribution of features to model predictions. This work studies the computational complexity of power indices beyond SHAP, addressing the conditions under which they can be computed efficiently. We identify a simple condition on power indices that ensures their computation is polynomially equivalent to evaluating expected values, extending known results for SHAP. We also introduce Bernoulli power indices, showing that their computation can be simplified to a constant number of expected value evaluations. Furthermore, we explore interaction power indices that quantify the importance of feature subsets, proving that their computational complexity mirrors that of individual features.
BANGS: Game-Theoretic Node Selection for Graph Self-Training
Wang, Fangxin, Liu, Kay, Medya, Sourav, Yu, Philip S.
Graph self-training is a semi-supervised learning method that iteratively selects a set of unlabeled data to retrain the underlying graph neural network (GNN) model and improve its prediction performance. While selecting highly confident nodes has proven effective for self-training, this pseudo-labeling strategy ignores the combinatorial dependencies between nodes and suffers from a local view of the distribution. To overcome these issues, we propose BANGS, a novel framework that unifies the labeling strategy with conditional mutual information as the objective of node selection. Our approach, grounded in game theory, selects nodes in a combinatorial fashion and provides theoretical guarantees of robustness under a noisy objective. More specifically, unlike traditional methods that rank and select nodes independently, BANGS considers nodes as a collective set in the self-training process. Our method demonstrates superior performance and robustness across various datasets, base models, and hyperparameter settings, outperforming existing techniques. The codebase is available at https://github.com/fangxin-wang/BANGS.
Kernel Banzhaf: A Fast and Robust Estimator for Banzhaf Values
Liu, Yurong, Witter, R. Teal, Korn, Flip, Alrashed, Tarfah, Paparas, Dimitris, Freire, Juliana
Banzhaf values offer a simple and interpretable alternative to the widely-used Shapley values. We introduce Kernel Banzhaf, a novel algorithm inspired by KernelSHAP, that leverages an elegant connection between Banzhaf values and linear regression. Through extensive experiments on feature attribution tasks, we demonstrate that Kernel Banzhaf substantially outperforms other algorithms for estimating Banzhaf values in both sample efficiency and robustness to noise. Furthermore, we prove theoretical guarantees on the algorithm's performance, establishing Kernel Banzhaf as a valuable tool for interpretable machine learning. The increasing complexity of AI models has intensified the challenges associated with model interpretability. Modern machine learning models, such as deep neural networks and complex ensemble methods, often operate as "opaque boxes." This opacity makes it difficult for users to understand and trust model predictions, especially in decision-making scenarios like healthcare, finance, and legal applications, which require rigorous justifications. Thus, there is a pressing need for reliable explainability tools to bridge the gap between complex model behaviors and human understanding. Among the various methods employed within explainable AI, game-theoretic approaches have gained prominence for quantifying the contribution of features in predictive modeling and enhancing model interpretability. While primarily associated with feature attribution (Lundberg & Lee, 2017; Karczmarz et al., 2022), these methods also contribute to broader machine learning tasks such as feature selection (Covert et al., 2020) and data valuation (Ghorbani & Zou, 2019; Wang & Jia, 2023). Such applications extend the utility of explainable AI, fostering greater trust in AI systems by providing insights beyond traditional explanations.
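The connection in question is plausibly the classical Hammer-Holzman result: the slope coefficients of the ordinary least-squares linear fit of $v$ over all $2^n$ subsets, uniformly weighted, are exactly the Banzhaf values. A small exact-enumeration sketch of that identity (our illustration; Kernel Banzhaf itself works from sampled subsets):

```python
from itertools import product

import numpy as np

def banzhaf_direct(n, v):
    """Banzhaf values by definition: the average marginal contribution of
    player i over all 2^(n-1) coalitions of the remaining players."""
    vals = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for bits in product([0, 1], repeat=n - 1):
            S = frozenset(j for j, b in zip(others, bits) if b)
            total += v(S | {i}) - v(S)
        vals.append(total / 2 ** (n - 1))
    return vals

def banzhaf_regression(n, v):
    """Banzhaf values as the slopes of the OLS linear fit of v on subset
    indicator vectors, uniformly over all subsets (Hammer & Holzman, 1992)."""
    X = np.array(list(product([0, 1], repeat=n)), dtype=float)
    y = np.array([v(frozenset(int(j) for j in np.flatnonzero(row))) for row in X])
    A = np.hstack([np.ones((len(X), 1)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[1:]  # drop the intercept; slopes are the Banzhaf values
```

Replacing the full enumeration with a uniform sample of subsets turns the second function into a KernelSHAP-style estimator, which is the regime the paper's guarantees address.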
On the influence of dependent features in classification problems: a game-theoretic perspective
Davila-Pena, Laura, Saavedra-Nieves, Alejandro, Casas-Méndez, Balbina
Within this framework, we consider a sample of individuals characterized by specific features, each feature taking values in a finite range, and classified according to a binary response variable. The measure we study turns out to be an influence measure explored in the existing literature and related to cooperative game theory. We provide an axiomatic characterization of our proposed influence measure by tailoring properties from cooperative game theory to our specific context. Furthermore, we demonstrate that our influence measure provides a general characterization of the well-known Banzhaf-Owen value for games with a priori unions, from the perspective of classification problems. The definitions and results presented here are illustrated through numerical examples and various applications, offering practical insight into our methodology.
Game-theoretic Counterfactual Explanation for Graph Neural Networks
Chhablani, Chirag, Jain, Sarthak, Channesh, Akshay, Kash, Ian A., Medya, Sourav
Graph Neural Networks (GNNs) have been a powerful tool for node classification tasks in complex networks. However, their decision-making processes remain a black box to users, making it challenging to understand the reasoning behind their predictions. Counterfactual explanations (CFE) have shown promise in enhancing the interpretability of machine learning models. Prior approaches to computing CFE for GNNs are often learning-based and require additional training. In this paper, we propose a semivalue-based, non-learning approach to generate CFE for node classification tasks, eliminating the need for any additional training. Our results reveal that computing Banzhaf values requires lower sample complexity for identifying counterfactual explanations than popular alternatives such as Shapley values. Our empirical evidence indicates that computing Banzhaf values can achieve up to a fourfold speedup compared to Shapley values. We also design a thresholding method for computing Banzhaf values and show theoretical and empirical results on its robustness in noisy environments, making it superior to Shapley values. Furthermore, the thresholded Banzhaf values are shown to enhance efficiency without compromising the quality (i.e., fidelity) of the explanations on three popular graph datasets.
Data Banzhaf: A Robust Data Valuation Framework for Machine Learning
Data valuation has a wide range of use cases in machine learning, including improving data quality and creating economic incentives for data sharing. This paper studies the robustness of data valuation to noisy model performance scores. In particular, we find that the inherent randomness of the widely used stochastic gradient descent can cause existing data value notions (e.g., the Shapley value and the Leave-one-out error) to produce inconsistent data value rankings across different runs. To address this challenge, we introduce the concept of safety margin, which measures the robustness of a data value notion. We show that the Banzhaf value, a well-known value notion originating in the cooperative game theory literature, achieves the largest safety margin among all semivalues (a class of value notions that satisfy crucial properties entailed by ML applications and that includes the Shapley value and Leave-one-out error). We propose an algorithm to efficiently estimate the Banzhaf value based on the Maximum Sample Reuse (MSR) principle. Our evaluation demonstrates that the Banzhaf value outperforms the existing semivalue-based data value notions on several ML tasks such as learning with weighted samples and noisy label detection. Overall, our study suggests that when the underlying ML algorithm is stochastic, the Banzhaf value is a promising alternative to the other semivalue-based data value schemes given its computational advantage and ability to robustly differentiate data quality.
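The MSR idea can be sketched in a few lines: draw coalitions uniformly at random, and reuse every (expensive) utility evaluation to update the running estimates of all $n$ players at once, since under uniform sampling the Banzhaf value of player $i$ equals $E[U(S) \mid i \in S] - E[U(S) \mid i \notin S]$. A minimal sketch, with names and structure that are ours rather than the paper's:

```python
import random

def msr_banzhaf(n, utility, num_samples, seed=0):
    """Monte Carlo Banzhaf estimates via Maximum Sample Reuse: every
    sampled coalition contributes to the estimate of every player."""
    rng = random.Random(seed)
    in_sum, in_cnt = [0.0] * n, [0] * n
    out_sum, out_cnt = [0.0] * n, [0] * n
    for _ in range(num_samples):
        S = frozenset(i for i in range(n) if rng.random() < 0.5)
        u = utility(S)  # a single utility call updates all n estimates
        for i in range(n):
            if i in S:
                in_sum[i] += u; in_cnt[i] += 1
            else:
                out_sum[i] += u; out_cnt[i] += 1
    return [in_sum[i] / max(in_cnt[i], 1) - out_sum[i] / max(out_cnt[i], 1)
            for i in range(n)]
```

For an additive utility, the Banzhaf value of each player is exactly its own contribution, so the estimates should land near those contributions as the sample count grows.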