Machine Learning for Multi-Output Regression: When should a holistic multivariate approach be preferred over separate univariate ones?

Schmid, Lena, Gerharz, Alexander, Groll, Andreas, Pauly, Markus

arXiv.org Machine Learning 

The hope behind such multivariate analyses is that accounting for possible dependencies between the outcomes may lead to procedures with better power (in the case of inference) or accuracy (in the case of prediction) than separate univariate analyses. While the need to develop and use valid, distributionally robust or nonparametric multivariate methods has been recognized and addressed in inferential statistics (Dobler et al., 2020; Friedrich et al., 2019; Konietschke et al., 2015; Smaga, 2017; Vallejo and Ato, 2012; Zimmermann et al., 2020), there are no exhaustive studies that explore the potential of multivariate regression methods for prediction. Focusing on tree-based ensemble methods such as the Random Forest, this manuscript aims to close this gap. In particular, we want to answer our research-motivating question: When should a holistic multivariate regression approach be preferred over separate univariate predictions?

Corresponding author. Email address: lena.schmid@tu-dortmund.de (Lena Schmid)