Semi-parametric Bayesian Additive Regression Trees

Prado, Estevão B., Parnell, Andrew C., McJames, Nathan, O'Shea, Ann, Moral, Rafael A.

arXiv.org Machine Learning 

Generalised Linear Models (GLMs; McCullagh & Nelder 1989; Nelder & Wedderburn 1972) are frequently used across applications to predict a univariate response, owing to the ease of interpreting their parameter estimates and the wide availability of software that makes simple analyses straightforward. A common assumption in GLMs is that the specified covariates (including potential interaction terms) have a linear relationship with the mean of the response after transformation through the link function. Extensions such as Generalised Additive Models (GAMs; Hastie & Tibshirani 1990; Wood 2017) require the main and interaction effects to be specified via a sum of (potentially non-linear) predictors. In GAMs, the non-linear relationship is usually captured via basis expansions of the covariates and constrained by a smoothing parameter. However, in problems where the number of covariates and/or observations is large, the linearity assumption may not hold and, more importantly, it may not be simple to specify which covariates and interactions have the greatest impact on the response.
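
For concreteness, the two model classes described above can be sketched in generic notation. The symbols below ($g$ for the link function, $\boldsymbol{\beta}$ for the linear coefficients, $f_j$ for the smooth terms, $b_{jk}$ for the basis functions, $\gamma_{jk}$ for their coefficients, and $\lambda_j$ for the smoothing parameters) are assumptions introduced here for illustration and are not necessarily the notation used elsewhere in this paper. The second-derivative penalty is one common choice for a GAM, not the only one.

```latex
% GLM: the transformed mean is linear in the covariates
g\big(\mathbb{E}[y_i \mid \mathbf{x}_i]\big) = \mathbf{x}_i^{\top}\boldsymbol{\beta}

% GAM: the transformed mean is a sum of smooth functions,
% each represented through a basis expansion
g\big(\mathbb{E}[y_i \mid \mathbf{x}_i]\big) = \beta_0 + \sum_{j=1}^{p} f_j(x_{ij}),
\qquad
f_j(x) = \sum_{k=1}^{K_j} \gamma_{jk}\, b_{jk}(x)

% Illustrative roughness penalty controlled by the smoothing parameters \lambda_j
\sum_{j=1}^{p} \lambda_j \int \big(f_j''(x)\big)^2 \, dx
```

In both cases the analyst must decide in advance which covariates and interactions enter the predictor, which is the difficulty noted above when the number of covariates is large.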