MinShap: A Modified Shapley Value Approach for Feature Selection
Chenghui Zheng, Garvesh Raskutti
Feature selection is a classical problem in statistics and machine learning, and it remains extremely challenging, especially in the context of unknown non-linear relationships with dependent features. Shapley values, on the other hand, are a classic solution concept from cooperative game theory that is widely used for feature attribution in general non-linear models with highly dependent features. However, Shapley values are not naturally suited to feature selection, since they tend to capture both direct effects from each feature on the response and indirect effects through other features. In this paper, we combine the advantages of Shapley values and adapt them to feature selection by proposing \emph{MinShap}, a modification of the Shapley value framework, along with a suite of related algorithms. In particular, instead of taking the average marginal contribution over permutations of features, MinShap considers the minimum marginal contribution across permutations. We provide a theoretical foundation motivated by the faithfulness assumption in directed acyclic graphical (DAG) models, a guarantee on the Type I error of MinShap, and show through numerical simulations and real-data experiments that MinShap tends to outperform state-of-the-art feature selection algorithms such as LOCO, GCM and the Lasso in terms of both accuracy and stability. We also introduce a suite of algorithms related to MinShap from a multiple testing/p-value perspective that improves performance in low-sample settings, and provide supporting theoretical guarantees.
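The core modification described in the abstract — replacing the Shapley average with a minimum over permutations — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the value function `v` and the exhaustive enumeration of permutations (feasible only for small feature counts) are assumptions made for clarity.

```python
import itertools
import numpy as np

def minshap(value_fn, n_features):
    """For each feature, return the minimum marginal contribution
    across all permutations of the feature set. (The Shapley value
    would instead average these contributions over permutations.)"""
    features = list(range(n_features))
    min_contrib = [np.inf] * n_features
    for perm in itertools.permutations(features):
        coalition = set()
        for f in perm:
            before = value_fn(frozenset(coalition))
            coalition.add(f)
            after = value_fn(frozenset(coalition))
            # Keep the smallest marginal contribution seen so far
            min_contrib[f] = min(min_contrib[f], after - before)
    return min_contrib

# Toy value function (hypothetical): v(S) = |S|, plus a bonus when
# feature 0 is in the coalition, so feature 0 has a direct effect.
def v(S):
    return len(S) + (2.0 if 0 in S else 0.0)

print(minshap(v, 3))  # feature 0 carries the bonus in every ordering
```

Under this toy value function, feature 0 always contributes 3.0 and the others 1.0, regardless of ordering, so the minimum and the Shapley average coincide; the two diverge precisely when a feature's contribution vanishes in some orderings, which is what makes the minimum useful for screening out indirect effects.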
Opening the Black Box: Visualising Machine Learning Algorithms
These days machine learning is all the hype. Unfortunately, these algorithms are usually considered rather hard to interpret, leaving business stakeholders feeling queasy. I've seen analytics teams use these powerful tools to build exceptionally good models only to have them thrown in the scrap heap. People just didn't get them. And if they don't get them, they don't trust them.
Keras Tutorial: Deep Learning in Python
However, just like a biological neuron only fires when a certain threshold is exceeded, the artificial neuron will also only fire when the sum of the inputs exceeds a threshold, let's say for example 0.

For this tutorial, you'll use the wine quality data set that you can find in the UCI Machine Learning Repository. You might already know this data set, as it's one of the most popular data sets for getting started with machine learning problems.

One of the first things that you'll probably want to do is get a quick view of both of your DataFrames. Now is the time to check whether your import was successful: double-check whether the data contains all the variables that the data description file of the UCI Machine Learning Repository promised you.
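The threshold behaviour described above can be sketched as a tiny function; the weights and inputs here are made-up values for illustration, with the threshold set to 0 as in the text.

```python
import numpy as np

# A threshold unit: outputs 1 ("fires") only when the weighted sum of
# its inputs exceeds the threshold, otherwise outputs 0.
def neuron(inputs, weights, threshold=0.0):
    return 1 if np.dot(inputs, weights) > threshold else 0

print(neuron([1.0, -2.0], [0.5, 0.1]))  # 0.5 - 0.2 = 0.3 > 0, so it fires: 1
print(neuron([1.0, -2.0], [0.1, 0.5]))  # 0.1 - 1.0 = -0.9 <= 0: 0
```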
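The import check might look like the sketch below. To keep it self-contained, a tiny stand-in DataFrame replaces the real download, and the expected column names are a subset of those listed in the UCI data description; when you load the real files, note that the wine quality CSVs are semicolon-separated.

```python
import pandas as pd

# Stand-in for pd.read_csv("winequality-red.csv", sep=";") — a tiny
# frame with a few of the columns the data description promises.
red = pd.DataFrame({
    "fixed acidity": [7.4, 7.8],
    "volatile acidity": [0.70, 0.88],
    "quality": [5, 5],
})

# Variables we expect per the data description (subset for this sketch)
expected = {"fixed acidity", "volatile acidity", "quality"}

print(red.shape)   # (rows, columns) — a quick sanity check
print(red.head())  # eyeball the first rows
assert expected.issubset(red.columns), "import missing expected variables"
```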