feature correlation
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
- North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
- Europe (0.14)
- (5 more...)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.46)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Maryland > Baltimore (0.04)
- (16 more...)
- Overview (0.46)
- Research Report (0.46)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Maryland > Baltimore (0.04)
- (15 more...)
- Overview (0.47)
- Research Report (0.46)
- North America > United States > Illinois (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Asia > South Korea > Daejeon > Daejeon (0.04)
- North America > United States > California (0.04)
- North America > Nicaragua (0.04)
- North America > Montserrat (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Therapeutic Area (1.00)
- Information Technology (0.92)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
- Europe > Germany > Baden-Württemberg > Freiburg (0.04)
- Oceania > New Zealand > North Island > Waikato (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.04)
- North America > United States > California > Los Angeles County > Los Angeles (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Data Science > Data Mining (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Distance-Based Network Recovery under Feature Correlation
We present an inference method for Gaussian graphical models when only pairwise distances of n objects are observed. Formally, this is a problem of estimating an n x n covariance matrix from the Mahalanobis distances dMH(xi, xj), where object xi lives in a latent feature space. We solve the problem in fully Bayesian fashion by integrating over the Matrix-Normal likelihood and a Matrix-Gamma prior; the resulting Matrix-T posterior enables network recovery even under strongly correlated features. Hereby, we generalize TiWnet, which assumes Euclidean distances with strict feature independence. In spite of the greatly increased flexibility, our model neither loses statistical power nor entails more computational cost. We argue that the extension is highly relevant as it yields significantly better results in both synthetic and real-world experiments, which is successfully demonstrated for a network of biological pathways in cancer patients.
One Permutation Is All You Need: Fast, Reliable Variable Importance and Model Stress-Testing
Reliable estimation of feature contributions in machine learning models is essential for trust, transparency and regulatory compliance, especially when models are proprietary or otherwise operate as black boxes. While permutation-based methods are a standard tool for this task, classical implementations rely on repeated random permutations, introducing computational overhead and stochastic instability. In this paper, we show that by replacing multiple random permutations with a single, deterministic, and optimal permutation, we achieve a method that retains the core principles of permutation-based importance while being non-random, faster, and more stable. We validate this approach across nearly 200 scenarios, including real-world household finance and credit risk applications, demonstrating improved bias-variance tradeoffs and accuracy in challenging regimes such as small sample sizes, high dimensionality, and low signal-to-noise ratios. Finally, we introduce Systemic Variable Importance, a natural extension designed for model stress-testing that explicitly accounts for feature correlations. This framework provides a transparent way to quantify how shocks or perturbations propagate through correlated inputs, revealing dependencies that standard variable importance measures miss. Two real-world case studies demonstrate how this metric can be used to audit models for hidden reliance on protected attributes (e.g., gender or race), enabling regulators and practitioners to assess fairness and systemic risk in a principled and computationally efficient manner.
- North America > United States (0.14)
- Europe > Spain (0.04)
- South America (0.04)
- (5 more...)
- Banking & Finance > Credit (1.00)
- Government (0.87)
- North America > United States > Illinois (0.04)
- North America > Canada > Quebec > Montreal (0.04)
An empirical study of task and feature correlations in the reuse of pre-trained models
Mohamud, Jama Hussein, Brink, Willie
Pre-trained neural networks are commonly used and reused in the machine learning community. Alice trains a model for a particular task, and a part of her neural network is reused by Bob for a different task, often to great effect. To what can we ascribe Bob's success? This paper introduces an experimental setup through which factors contributing to Bob's empirical success could be studied in silico. As a result, we demonstrate that Bob might just be lucky: his task accuracy increases monotonically with the correlation between his task and Alice's. Even when Bob has provably uncorrelated tasks and input features from Alice's pre-trained network, he can achieve significantly better than random performance due to Alice's choice of network and optimizer. When there is little correlation between tasks, only reusing lower pre-trained layers is preferable, and we hypothesize the converse: that the optimal number of retrained layers is indicative of task and feature correlation. Finally, we show in controlled real-world scenarios that Bob can effectively reuse Alice's pre-trained network if there are semantic correlations between his and Alice's task.