Factor-augmented tree ensembles

Pellegrino, Filippo

arXiv.org Machine Learning 

This manuscript proposes to extend the information set of time-series regression trees with latent stationary factors extracted via state-space methods. First, this approach makes it possible to handle predictors that exhibit measurement error, non-stationary trends, seasonality and/or irregularities such as missing observations. Second, it gives a transparent way of using domain-specific theory to inform time-series regression trees. As a byproduct, this technique lays the foundations for structuring powerful ensembles. Their real-world applicability is studied through the lens of empirical macro-finance.

Keywords: Ensemble learning, Factor models, State-space models, Time series, Unobserved components.

Introduction

In time series, the simplicity of regression trees (Morgan and Sonquist, 1963; Breiman et al., 1984; Quinlan, 1986) comes at a cost: irregularities, complicated periodic patterns and non-stationary trends cannot be explicitly modelled, which is unfortunate given that many real-world datasets are subject to them. Following, in spirit, Harvey et al. (1998), this paper proposes to pre-process problematic predictors using state-space representations general enough to deal with all these complexities at once. This operation can be thought of as an automated feature engineering process that extracts stationary patterns hidden across multiple predictors, while handling problematic data characteristics. Moreover, when the state-space representation is compatible with domain-specific theory, this becomes a transparent way of extracting signals with a structural interpretation. The resulting stationary common components, referred to below as stationary dynamic factors, are then employed as regular predictors for standard time-series regression trees. This manuscript calls the resulting models factor-augmented regression trees to stress their dependence on latent components.
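To make the two-step idea concrete, the following minimal Python sketch may help. It is an illustration under simplifying assumptions, not the implementation used in this manuscript. It simulates a single predictor made of a random-walk trend, a stationary AR(1) cycle and measurement error, with some observations missing. It then extracts the cycle with a small hand-written Kalman filter and uses it as the predictor of a one-split regression tree. The two-component model, the parameter values (treated as known rather than estimated) and the names `extract_stationary_factor` and `fit_stump` are all hypothetical choices made for this example.

```python
import math
import random


def extract_stationary_factor(y, phi, q_trend, q_cycle, r):
    """Kalman filter for y_t = trend_t + cycle_t + noise_t (None = missing).

    Assumed model: the trend is a random walk (innovation variance q_trend),
    the cycle a stationary AR(1) with coefficient phi (innovation variance
    q_cycle), and r is the measurement-error variance. Returns the filtered
    cycle, which plays the role of the stationary factor.
    """
    a_trend, a_cycle = 0.0, 0.0            # state means
    p_tt, p_tc = 1e6, 0.0                  # near-diffuse prior on the trend
    p_cc = q_cycle / (1.0 - phi ** 2)      # stationary prior on the cycle
    factor = []
    for obs in y:
        # Prediction step: propagate means and covariances one period ahead.
        a_cycle *= phi
        p_tt += q_trend
        p_tc *= phi
        p_cc = phi ** 2 * p_cc + q_cycle
        # Update step: skipped when the observation is missing.
        if obs is not None:
            h_t, h_c = p_tt + p_tc, p_tc + p_cc   # cov(state, y_t)
            s = h_t + h_c + r                     # innovation variance
            v = obs - a_trend - a_cycle           # innovation
            a_trend += h_t / s * v
            a_cycle += h_c / s * v
            p_tt -= h_t * h_t / s
            p_tc -= h_t * h_c / s
            p_cc -= h_c * h_c / s
        factor.append(a_cycle)
    return factor


def fit_stump(x, z):
    """One-split regression tree: pick the threshold minimising squared error."""
    best = None
    for c in sorted(set(x))[:-1]:          # candidate thresholds
        left = [zi for xi, zi in zip(x, z) if xi <= c]
        right = [zi for xi, zi in zip(x, z) if xi > c]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, c, ml, mr)
    return best[1:]                        # (threshold, left mean, right mean)


# Simulated data (all values are illustrative assumptions).
rng = random.Random(0)
n, phi = 300, 0.8
trend, cycle, cycles, y = 0.0, 0.0, [], []
for _ in range(n):
    trend += rng.gauss(0.0, 0.05)
    cycle = phi * cycle + rng.gauss(0.0, 1.0)
    cycles.append(cycle)
    # About 10% of the observations are dropped to mimic missing data.
    y.append(None if rng.random() < 0.1 else trend + cycle + rng.gauss(0.0, 0.1))

factor = extract_stationary_factor(y, phi, q_trend=0.05 ** 2, q_cycle=1.0, r=0.1 ** 2)

# The target at t+1 depends non-linearly on the latent cycle at t.
z = [(1.0 if c > 0 else -1.0) + rng.gauss(0.0, 0.3) for c in cycles[:-1]]
threshold, left_mean, right_mean = fit_stump(factor[:-1], z)
```

The tree never sees the raw predictor `y`, only the filtered factor, so the trend and the gaps in the data are handled entirely by the state-space step. A single split stands in here for a full regression tree or ensemble purely to keep the example short.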
I thank Matteo Barigozzi and Kostas Kalogeropoulos for their valuable suggestions and supervision, and Serena Lariccia and Qiwei Yao for their helpful comments on a preliminary draft of this article.