linear term
Hypothesis-free discovery from epidemiological data by automatic detection and local inference for tree-based nonlinearities and interactions
Spadaccini, Giorgio, Fokkema, Marjolein, van de Wiel, Mark A.
In epidemiological settings, Machine Learning (ML) is gaining popularity for hypothesis-free discovery of risk (or protective) factors. Although ML is strong at discovering non-linearities and interactions, this power is currently compromised by a lack of reliable inference. Although local measures of feature effect can be combined with tree ensembles, uncertainty quantifications for these measures remain only partially available and oftentimes unsatisfactory. We propose RuleSHAP, a framework for using rule-based, hypothesis-free discovery that combines sparse Bayesian regression, tree ensembles and Shapley values in a one-step procedure that both detects and tests complex patterns at the individual level. To ease computation, we derive a formula that computes marginal Shapley values more efficiently for our setting. We demonstrate the validity of our framework on simulated data. To illustrate, we apply our machinery to data from an epidemiological cohort to detect and infer several effects for high cholesterol and blood pressure, such as nonlinear interaction effects between features like age, sex, ethnicity, BMI and glucose level.
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Europe > Netherlands > South Holland > Leiden (0.04)
- Europe > United Kingdom (0.04)
- Asia > Middle East > Jordan (0.04)
- Health & Medicine > Epidemiology (1.00)
- Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.48)
Causal rule ensemble approach for multi-arm data
Wan, Ke, Tanioka, Kensuke, Shimokawa, Toshio
Heterogeneous treatment effect (HTE) estimation is critical in medical research. It provides insights into how treatment effects vary among individuals, which can provide statistical evidence for precision medicine. While most existing methods focus on binary treatment situations, real-world applications often involve multiple interventions. However, current HTE estimation methods are primarily designed for binary comparisons and often rely on black-box models, which limit their applicability and interpretability in multi-arm settings. To address these challenges, we propose an interpretable machine learning framework for HTE estimation in multi-arm trials. Our method employs a rule-based ensemble approach consisting of rule generation, rule ensemble, and HTE estimation, ensuring both predictive accuracy and interpretability. Through extensive simulation studies and real data applications, the performance of our method was evaluated against state-of-the-art multi-arm HTE estimation approaches. The results indicate that our approach achieved lower bias and higher estimation accuracy compared with those of existing methods. Furthermore, the interpretability of our framework allows clearer insights into how covariates influence treatment effects, facilitating clinical decision making. By bridging the gap between accuracy and interpretability, our study contributes a valuable tool for multi-arm HTE estimation, supporting precision medicine.
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.93)
- Research Report > Strength High (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)
Stabilized Neural Ordinary Differential Equations for Long-Time Forecasting of Dynamical Systems
Linot, Alec J., Burby, Joshua W., Tang, Qi, Balaprakash, Prasanna, Graham, Michael D., Maulik, Romit
In data-driven modeling of spatiotemporal phenomena careful consideration often needs to be made in capturing the dynamics of the high wavenumbers. This problem becomes especially challenging when the system of interest exhibits shocks or chaotic dynamics. We present a data-driven modeling method that accurately captures shocks and chaotic dynamics by proposing a novel architecture, stabilized neural ordinary differential equation (ODE). In our proposed architecture, we learn the right-hand-side (RHS) of an ODE by adding the outputs of two NN together where one learns a linear term and the other a nonlinear term. Specifically, we implement this by training a sparse linear convolutional NN to learn the linear term and a dense fully-connected nonlinear NN to learn the nonlinear term. This is in contrast with the standard neural ODE which involves training only a single NN for learning the RHS. We apply this setup to the viscous Burgers equation, which exhibits shocked behavior, and show better short-time tracking and prediction of the energy spectrum at high wavenumbers than a standard neural ODE. We also find that the stabilized neural ODE models are much more robust to noisy initial conditions than the standard neural ODE approach. We also apply this method to chaotic trajectories of the Kuramoto-Sivashinsky equation. In this case, stabilized neural ODEs keep long-time trajectories on the attractor, and are highly robust to noisy initial conditions, while standard neural ODEs fail at achieving either of these results. We conclude by demonstrating how stabilizing neural ODEs provide a natural extension for use in reduced-order modeling by projecting the dynamics onto the eigenvectors of the learned linear term.
- North America > United States > Wisconsin > Dane County > Madison (0.14)
- North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- (2 more...)
- Energy (0.93)
- Government > Regional Government (0.46)
Rethinking Exponential Averaging of the Fisher
In optimization for Machine learning (ML), it is typical that curvature-matrix (CM) estimates rely on an exponential average (EA) of local estimates (giving EA-CM algorithms). This approach has little principled justification, but is very often used in practice. In this paper, we draw a connection between EA-CM algorithms and what we call a "Wake of Quadratic regularized models". The outlined connection allows us to understand what EA-CM algorithms are doing from an optimization perspective. Generalizing from the established connection, we propose a new family of algorithms, "KL-Divergence Wake-Regularized Models" (KLD-WRM). We give three different practical instantiations of KLD-WRM, and show numerically that these outperform K-FAC on MNIST.
A Uniform Treatment of Aggregates and Constraints in Hybrid ASP
Cabalar, Pedro, Fandinno, Jorge, Schaub, Torsten, Wanko, Philipp
Characterizing hybrid ASP solving in a generic way is difficult since one needs to abstract from specific theories. Inspired by lazy SMT solving, this is usually addressed by treating theory atoms as opaque. Unlike this, we propose a slightly more transparent approach that includes an abstract notion of a term. Rather than imposing a syntax on terms, we keep them abstract by stipulating only some basic properties. With this, we further develop a semantic framework for hybrid ASP solving and provide aggregate functions for theory variables that adhere to different semantic principles, show that they generalize existing aggregate semantics in ASP and how we can rely on off-the-shelf hybrid solvers for implementation.
Learning Two layer Networks with Multinomial Activation and High Thresholds
Giving provable guarantees for learning neural networks is a core challenge of machine learning theory. Most prior work gives parameter recovery guarantees for one hidden layer networks. In this work we study a two layer network where the top node instead of a sum (one layer) is a well-behaved multivariate polynomial in all its inputs. We show that if the thresholds (biases) of the first layer neurons are higher than $\Omega(\sqrt{\log d})$ for $d$ being the input dimension, then the weights are learnable under the gaussian input. Furthermore even for lower thresholds, we can learn the lowest layer using polynomial sample complexity although exponential time. As an application of our results, we give a polynomial time algorithm for learning an intersection of halfspaces that are $\Omega(\sqrt{\log d})$ far from the origin for gaussian input distribution. Finally for deep networks with depth larger than two, assuming the layers two onwards can be expressed as a polynomial by simply using the taylor series, we can learn the lowest layer under the conditions required by our assumptions.
- Asia > Middle East > Jordan (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- Asia > China (0.04)
Tree Ensembles with Rule Structured Horseshoe Regularization
Nalenz, Malte, Villani, Mattias
We propose a new Bayesian model for flexible nonlinear regression and classification using tree ensembles. The model is based on the RuleFit approach in Friedman and Popescu (2008) where rules from decision trees and linear terms are used in a L1-regularized regression. We modify RuleFit by replacing the L1-regularization by a horseshoe prior, which is well known to give aggressive shrinkage of noise predictor while leaving the important signal essentially untouched. This is especially important when a large number of rules are used as predictors as many of them only contribute noise. Our horseshoe prior has an additional hierarchical layer that applies more shrinkage a priori to rules with a large number of splits, and to rules that are only satisfied by a few observations. The aggressive noise shrinkage of our prior also makes it possible to complement the rules from boosting in Friedman and Popescu (2008) with an additional set of trees from random forest, which brings a desirable diversity to the ensemble. We sample from the posterior distribution using a very efficient and easily implemented Gibbs sampler. The new model is shown to outperform state-of-the-art methods like RuleFit, BART and random forest on 16 datasets. The model and its interpretation is demonstrated on the well known Boston housing data, and on gene expression data for cancer classification. The posterior sampling, prediction and graphical tools for interpreting the model results are implemented in a publicly available R package.
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.89)
- (2 more...)
Sparse Quadratic Logistic Regression in Sub-quadratic Time
Shanmugam, Karthikeyan, Kocaoglu, Murat, Dimakis, Alexandros G., Sanghavi, Sujay
We consider support recovery in the quadratic logistic regression setting - where the target depends on both p linear terms $x_i$ and up to $p^2$ quadratic terms $x_i x_j$. Quadratic terms enable prediction/modeling of higher-order effects between features and the target, but when incorporated naively may involve solving a very large regression problem. We consider the sparse case, where at most $s$ terms (linear or quadratic) are non-zero, and provide a new faster algorithm. It involves (a) identifying the weak support (i.e. all relevant variables) and (b) standard logistic regression optimization only on these chosen variables. The first step relies on a novel insight about correlation tests in the presence of non-linearity, and takes $O(pn)$ time for $n$ samples - giving potentially huge computational gains over the naive approach. Motivated by insights from the boolean case, we propose a non-linear correlation test for non-binary finite support case that involves hashing a variable and then correlating with the output variable. We also provide experimental results to demonstrate the effectiveness of our methods.
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > California > San Mateo County > Menlo Park (0.04)
Social Relations Model for Collaborative Filtering
Li, Wu-Jun (Shanghai Jiao Tong University) | Yeung, Dit-Yan (Hong Kong University of Science and Technology)
We propose a novel probabilistic model for collaborative filtering (CF), called SRMCoFi, which seamlessly integrates both linear and bilinear random effects into a principled framework. The formulation of SRMCoFi is supported by both social psychological experiments and statistical theories. Not only can many existing CF methods be seen as special cases of SRMCoFi, but it also integrates their advantages while simultaneously overcoming their disadvantages. The solid theoretical foundation of SRMCoFi is further supported by promising empirical results obtained in extensive experiments using real CF data sets on movie ratings.
- Asia > China > Shanghai > Shanghai (0.05)
- Asia > China > Hong Kong (0.05)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > New York (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.85)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.66)