feature rank
Explainable but Vulnerable: Adversarial Attacks on XAI Explanation in Cybersecurity Applications
Mia, Maraz, Pritom, Mir Mehedi A.
Explainable Artificial Intelligence (XAI) has aided machine learning (ML) researchers with the power of scrutinizing the decisions of the black-box models. XAI methods enable looking deep inside the models' behavior, eventually generating explanations along with a perceived trust and transparency. However, depending on any specific XAI method, the level of trust can vary. It is evident that XAI methods can themselves be a victim of post-adversarial attacks that manipulate the expected outcome from the explanation module. Among such attack tactics, fairwashing explanation (FE), manipulation explanation (ME), and backdoor-enabled manipulation attacks (BD) are the notable ones. In this paper, we try to understand these adversarial attack techniques, tactics, and procedures (TTPs) on explanation alteration and thus the effect on the model's decisions. We have explored a total of six different individual attack procedures on post-hoc explanation methods such as SHAP (SHapley Additive exPlanations), LIME (Local Interpretable Model-agnostic Explanation), and IG (Integrated Gradients), and investigated those adversarial attacks in cybersecurity applications scenarios such as phishing, malware, intrusion, and fraudulent website detection. Our experimental study reveals the actual effectiveness of these attacks, thus providing an urgency for immediate attention to enhance the resiliency of XAI methods and their applications.
Validity of Feature Importance in Low-Performing Machine Learning for Tabular Biomedical Data
Lee, Youngro, Baruzzo, Giacomo, Kim, Jeonghwan, Seo, Jongmo, Di Camillo, Barbara
In tabular biomedical data analysis, tuning models to high accuracy is considered a prerequisite for discussing feature importance, as medical practitioners expect the validity of feature importance to correlate with performance. In this work, we challenge the prevailing belief, showing that low-performing models may also be used for feature importance. We propose experiments to observe changes in feature rank as performance degrades sequentially. Using three synthetic datasets and six real biomedical datasets, we compare the rank of features from full datasets to those with reduced sample sizes (data cutting) or fewer features (feature cutting). In synthetic datasets, feature cutting does not change feature rank, while data cutting shows higher discrepancies with lower performance. In real datasets, feature cutting shows similar or smaller changes than data cutting, though some datasets exhibit the opposite. When feature interactions are controlled by removing correlations, feature cutting consistently shows better stability. By analyzing the distribution of feature importance values and theoretically examining the probability that the model cannot distinguish feature importance between features, we reveal that models can still distinguish feature importance despite performance degradation through feature cutting, but not through data cutting. We conclude that the validity of feature importance can be maintained even at low performance levels if the data size is adequate, which is a significant factor contributing to suboptimal performance in tabular medical data analysis. This paper demonstrates the potential for utilizing feature importance analysis alongside statistical analysis to compare features relatively, even when classifier performance is not satisfactory.
Deep Grokking: Would Deep Neural Networks Generalize Better?
Fan, Simin, Pascanu, Razvan, Jaggi, Martin
Recent research on the grokking phenomenon has illuminated the intricacies of neural networks' training dynamics and their generalization behaviors. Grokking refers to a sharp rise of the network's generalization accuracy on the test set, which occurs long after an extended overfitting phase, during which the network perfectly fits the training set. While the existing research primarily focus on shallow networks such as 2-layer MLP and 1-layer Transformer, we explore grokking on deep networks (e.g. 12-layer MLP). We empirically replicate the phenomenon and find that deep neural networks can be more susceptible to grokking than its shallower counterparts. Meanwhile, we observe an intriguing multi-stage generalization phenomenon when increase the depth of the MLP model where the test accuracy exhibits a secondary surge, which is scarcely seen on shallow models. We further uncover compelling correspondences between the decreasing of feature ranks and the phase transition from overfitting to the generalization stage during grokking. Additionally, we find that the multi-stage generalization phenomenon often aligns with a double-descent pattern in feature ranks. These observations suggest that internal feature rank could serve as a more promising indicator of the model's generalization behavior compared to the weight-norm. We believe our work is the first one to dive into grokking in deep neural networks, and investigate the relationship of feature rank and generalization performance.
Sharpness-Aware Minimization Leads to Low-Rank Features
Andriushchenko, Maksym, Bahri, Dara, Mobahi, Hossein, Flammarion, Nicolas
Sharpness-aware minimization (SAM) is a recently proposed method that minimizes the sharpness of the training loss of a neural network. While its generalization improvement is well-known and is the primary motivation, we uncover an additional intriguing effect of SAM: reduction of the feature rank which happens at different layers of a neural network. We show that this low-rank effect occurs very broadly: for different architectures such as fully-connected networks, convolutional networks, vision transformers and for different objectives such as regression, classification, language-image contrastive training. To better understand this phenomenon, we provide a mechanistic understanding of how low-rank features arise in a simple two-layer network. We observe that a significant number of activations gets entirely pruned by SAM which directly contributes to the rank reduction. We confirm this effect theoretically and check that it can also occur in deep networks, although the overall rank reduction mechanism can be more complex, especially for deep networks with pre-activation skip connections and self-attention layers. We make our code available at https://github.com/tml-epfl/sam-low-rank-features.
MDI+: A Flexible Random Forest-Based Feature Importance Framework
Agarwal, Abhineet, Kenney, Ana M., Tan, Yan Shuo, Tang, Tiffany M., Yu, Bin
Mean decrease in impurity (MDI) is a popular feature importance measure for random forests (RFs). We show that the MDI for a feature $X_k$ in each tree in an RF is equivalent to the unnormalized $R^2$ value in a linear regression of the response on the collection of decision stumps that split on $X_k$. We use this interpretation to propose a flexible feature importance framework called MDI+. Specifically, MDI+ generalizes MDI by allowing the analyst to replace the linear regression model and $R^2$ metric with regularized generalized linear models (GLMs) and metrics better suited for the given data structure. Moreover, MDI+ incorporates additional features to mitigate known biases of decision trees against additive or smooth models. We further provide guidance on how practitioners can choose an appropriate GLM and metric based upon the Predictability, Computability, Stability framework for veridical data science. Extensive data-inspired simulations show that MDI+ significantly outperforms popular feature importance measures in identifying signal features. We also apply MDI+ to two real-world case studies on drug response prediction and breast cancer subtype classification. We show that MDI+ extracts well-established predictive genes with significantly greater stability compared to existing feature importance measures. All code and models are released in a full-fledged python package on Github.
On the Importance of Feature Decorrelation for Unsupervised Representation Learning in Reinforcement Learning
Lee, Hojoon, Lee, Koanho, Hwang, Dongyoon, Lee, Hyunho, Lee, Byungkun, Choo, Jaegul
Recently, unsupervised representation learning (URL) has improved the sample efficiency of Reinforcement Learning (RL) by pretraining a model from a large unlabeled dataset. The underlying principle of these methods is to learn temporally predictive representations by predicting future states in the latent space. However, an important challenge of this approach is the representational collapse, where the subspace of the latent representations collapses into a low-dimensional manifold. To address this issue, we propose a novel URL framework that causally predicts future states while increasing the dimension of the latent manifold by decorrelating the features in the latent space. Through extensive empirical studies, we demonstrate that our framework effectively learns predictive representations without collapse, which significantly improves the sample efficiency of state-of-the-art URL methods on the Atari 100k benchmark. The code is available at https://github.com/dojeon-ai/SimTPR.
Interpretable Machine Learning for Privacy-Preserving Pervasive Systems
Baron, Benjamin, Musolesi, Mirco
With the emergence of connected devices (e.g., smartphones and smartmeters), pervasive systems generate growing amounts of digital traces as users undergo their everyday activities. These traces are crucial to service providers to understand their customers, to increase the degree of personalization, and enhance the quality of their services. For instance, personal digital traces stemming from public transit smartcards help transportation providers understand the commuting patterns of users; the usage statistics of home appliances can be used to improve energy efficiency; on-street cameras provide police officers with new ways of investigating crimes; content generated through mobile and wearables (such as posts in online social media or GPS running routes in specialized websites such as those for fitness) can be used to provide tailored content to individuals; bank transaction logs can be used to spot unusual activity in accounts. However, sharing these digital traces generated by pervasive systems with service providers might raise concerns with regards to privacy. Indeed, the processing and analysis of these digital traces can surface latent information about the behavior of the users. While service providers have to store the usergenerated data in large databases that guarantee a certain level of privacy (e.g., from storing the traces in an anonymized manner using randomly-generated identifiers instead of the real user's name and surname to using more sophisticated privacy-preserving techniques such as differential privacy), third parties such as advertisers that have access to the traces can leverage machine learning techniques to reveal personal information about the users and expose their privacy [1]. This includes inferring personal information about users and identifying a single individual from a collection of user-generated traces. Moreover, these traces might reveal information about the significant places routinely visited by the user, enabling the service provider to infer a wide range of personal information, including the user's place of residence and work and their future locations. To a further extent, presence traces can also be used to identify a specific individual in a population.