The Sensitivity of Variational Bayesian Neural Network Performance to Hyperparameters
Koermer, Scott, Klein, Natalie
In scientific applications, predictive modeling is often of limited use without accurate uncertainty quantification (UQ) to indicate when a model may be extrapolating or when more data needs to be collected. Bayesian Neural Networks (BNNs) produce predictive uncertainty by propagating uncertainty in neural network (NN) weights and offer the promise of obtaining not only an accurate predictive model but also accurate UQ. However, in practice, obtaining accurate UQ with BNNs is difficult due in part to the approximations used for practical model training and in part to the need to choose a suitable set of hyperparameters; these hyperparameters outnumber those needed for traditional NNs and often have opaque effects on the results. We aim to shed light on the effects of hyperparameter choices for BNNs by performing a global sensitivity analysis of BNN performance under varying hyperparameter settings. Our results indicate that many of the hyperparameters interact with each other to affect both predictive accuracy and UQ. For improved usage of BNNs in real-world applications, we suggest that global sensitivity analysis, or related methods such as Bayesian optimization, should be used to aid in dimensionality reduction and selection of hyperparameters to ensure accurate UQ in BNNs.
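The abstract does not give implementation details, but the core tool it names, global sensitivity analysis, has a standard variance-based form. Below is a minimal, numpy-only sketch of a Saltelli-style estimator of first-order Sobol indices; the toy two-hyperparameter "performance metric" and the sample size are illustrative assumptions, not the paper's actual BNN setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def metric(x):
    # Toy stand-in for a BNN performance metric as a function of two
    # hyperparameters scaled to [0, 1]; for f = x1 + 2*x2 the analytic
    # first-order Sobol indices are S1 = 0.2 and S2 = 0.8.
    return x[:, 0] + 2.0 * x[:, 1]

def first_order_sobol(f, d, n):
    """Saltelli-type Monte Carlo estimator of first-order Sobol indices."""
    A = rng.random((n, d))
    B = rng.random((n, d))
    yA, yB = f(A), f(B)
    var = np.var(np.concatenate([yA, yB]))
    S = np.empty(d)
    for i in range(d):
        ABi = A.copy()
        ABi[:, i] = B[:, i]          # A with column i taken from B
        S[i] = np.mean(yB * (f(ABi) - yA)) / var
    return S

S = first_order_sobol(metric, d=2, n=1 << 14)
```

In a real study each evaluation of `metric` would be a full BNN training run at the sampled hyperparameter setting, which is why such analyses are expensive.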
Review for NeurIPS paper: Generalised Bayesian Filtering via Sequential Monte Carlo
Weaknesses:
- The authors choose to select \beta based on predictive accuracy. This is sensible, but what other approaches could also be used? And does it make sense to consider predictive accuracy on a separate training dataset? In the SMC community, people usually care more about the efficiency with which the likelihood function can be calculated (in order to estimate the parameters with particle MCMC), the accuracy of the filtered distribution, or the ESS. Predictive accuracy is usually not a primary criterion, so does it make sense to select \beta with this metric? From the simulation study, using predictive accuracy appears to work well, but also seems to be consistently sub-optimal.
Manifold Restricted Interventional Shapley Values
Taufiq, Muhammad Faaiz, Blöbaum, Patrick, Minorics, Lenon
Shapley values are model-agnostic methods for explaining model predictions. Many commonly used methods of computing Shapley values, known as off-manifold methods, rely on model evaluations on out-of-distribution input samples. Consequently, explanations obtained are sensitive to model behaviour outside the data distribution, which may be irrelevant for all practical purposes. While on-manifold methods have been proposed which do not suffer from this problem, we show that such methods are overly dependent on the input data distribution, and therefore result in unintuitive and misleading explanations. To circumvent these problems, we propose ManifoldShap, which respects the model's domain of validity by restricting model evaluations to the data manifold. We show, theoretically and empirically, that ManifoldShap is robust to off-manifold perturbations of the model and leads to more accurate and intuitive explanations than existing state-of-the-art Shapley methods.
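For context on the off-manifold construction the abstract critiques, here is a hedged sketch of exact interventional Shapley values, where a coalition's value fixes the coalition features at the explained point and draws the remaining features from background data; this is the baseline method, not ManifoldShap itself, and the function and data names are illustrative.

```python
import itertools
import math
import numpy as np

def interventional_shap(f, x, background):
    """Exact interventional Shapley values for one point x.

    v(S) = E_b[f(x_S, b_{~S})]: features in S are fixed at x, the rest
    come from the background data. Note the mixed points f is evaluated
    on may lie off the data manifold, which is the problem the paper's
    ManifoldShap addresses by restricting evaluations to the manifold.
    """
    d = x.shape[0]

    def v(S):
        Z = background.copy()
        if S:
            Z[:, list(S)] = x[list(S)]
        return f(Z).mean()

    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for r in range(d):
            for S in itertools.combinations(others, r):
                w = math.factorial(r) * math.factorial(d - r - 1) / math.factorial(d)
                phi[i] += w * (v(S + (i,)) - v(S))
    return phi
```

A useful sanity check is the efficiency property: the attributions sum to f(x) minus the background mean prediction. The exact enumeration above is exponential in the number of features, so practical implementations approximate it by sampling coalitions.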
Causal discovery for observational sciences using supervised machine learning
Petersen, Anne Helby, Ramsey, Joseph, Ekstrøm, Claus Thorn, Spirtes, Peter
Causal inference can estimate causal effects, but unless data are collected experimentally, statistical analyses must rely on pre-specified causal models. Causal discovery algorithms are empirical methods for constructing such causal models from data. Several asymptotically correct methods already exist, but they generally struggle on smaller samples. Moreover, most methods focus on very sparse causal models, which may not always be a realistic representation of real-life data generating mechanisms. Finally, while causal relationships suggested by the methods often hold true, their claims about causal non-relatedness have high error rates. This non-conservative error tradeoff is not ideal for observational sciences, where the resulting model is directly used to inform causal inference: a causal model with many missing causal relations entails overly strong assumptions and may lead to biased effect estimates. We propose a new causal discovery method that addresses these three shortcomings: Supervised learning discovery (SLdisco). SLdisco uses supervised machine learning to obtain a mapping from observational data to equivalence classes of causal models. We evaluate SLdisco in a large simulation study based on Gaussian data, considering several choices of model size and sample size. We find that SLdisco is more conservative, only moderately less informative, and less sensitive to sample size than existing procedures. We furthermore provide a real epidemiological data application. We use random subsampling to investigate real-data performance on small samples and again find that SLdisco is less sensitive to sample size and hence seems to better utilize the information available in small datasets.
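The key idea, learning a supervised mapping from observational data to causal structure, can be illustrated in a deliberately tiny setting: two variables, linear Gaussian mechanisms, and sample correlation as the only feature. This is a hedged toy sketch of the principle, not SLdisco's actual architecture or feature set, and all names below are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def simulate(n, edge):
    # Linear Gaussian mechanism on two variables: X -> Y when edge == 1,
    # otherwise X and Y are independent. Returns the sample correlation,
    # a (very coarse) summary statistic of the observational data.
    X = rng.normal(size=n)
    beta = rng.uniform(0.5, 1.5) if edge else 0.0
    Y = beta * X + rng.normal(size=n)
    return np.corrcoef(X, Y)[0, 1]

# Training set: simulate many datasets with known structure, then learn
# a classifier from the data summary to the structure label.
labels = rng.integers(0, 2, size=2000)
feats = np.abs(np.array([[simulate(100, e)] for e in labels]))
clf = LogisticRegression().fit(feats, labels)
```

In the paper's setting, the input summaries and the output space (equivalence classes of causal models over many variables) are far richer, but the supervised-learning framing is the same.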
How to Improve Deep Learning Forecasts for Time Series -- Part 2
In the prior post, we explained how clustering of time series data works. In this post, we're going to do a deep dive into the code itself. Everything will be written in Python, but most of the libraries have an R version. We will try to stay relatively high level, but the code includes some useful resources if you're looking for more. Without further ado, let's dive in.
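As a warm-up before the full walkthrough, here is a minimal sketch of shape-based time series clustering with scikit-learn's `KMeans`. The synthetic sine and trend series are stand-ins for whatever data you want to cluster; the z-normalization step is the part worth keeping, since it makes the clustering key on shape rather than scale.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Two synthetic groups of series: sine-shaped and linear-trend, each
# with a little Gaussian noise, 20 series of length 50 per group.
t = np.linspace(0, 2 * np.pi, 50)
sines = np.sin(t) + 0.1 * rng.normal(size=(20, 50))
trends = np.linspace(0, 1, 50) + 0.1 * rng.normal(size=(20, 50))
series = np.vstack([sines, trends])

# z-normalize each series so clustering compares shapes, not levels.
z = (series - series.mean(axis=1, keepdims=True)) / series.std(axis=1, keepdims=True)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(z)
```

Plain Euclidean k-means works here because the series are aligned in time; for series with phase shifts you would reach for a warping-aware distance instead, which we'll touch on later.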