Striaukas, Jonas
High-dimensional censored MIDAS logistic regression for corporate survival forecasting
Miao, Wei, Beyhum, Jad, Striaukas, Jonas, Van Keilegom, Ingrid
This paper addresses the challenge of forecasting corporate distress, a problem marked by three key statistical hurdles: (i) right censoring, (ii) high-dimensional predictors, and (iii) mixed-frequency data. To overcome these complexities, we introduce a novel high-dimensional censored MIDAS (Mixed Data Sampling) logistic regression. Our approach handles censoring through inverse probability weighting and achieves accurate estimation with numerous mixed-frequency predictors by employing a sparse-group penalty. We establish finite-sample bounds for the estimation error, accounting for censoring, the MIDAS approximation error, and heavy tails. The superior performance of the method is demonstrated through Monte Carlo simulations. Finally, we present an extensive application of our methodology to predict the financial distress of Chinese-listed firms. Our novel procedure is implemented in the R package 'Survivalml'.
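As an illustration of the estimation step, the following minimal Python sketch (not the Survivalml implementation) fits a logistic regression with a sparse-group penalty by proximal gradient descent. It assumes the inverse-probability-of-censoring weights w (for instance, the event indicator divided by an estimate of the censoring survival function at the observed time) and an integer vector groups assigning each coefficient to its high-frequency predictor have been computed beforehand; lam and alpha are illustrative tuning parameters, and the function names are hypothetical.

    import numpy as np

    def prox_sparse_group(beta, groups, lam, alpha, step):
        # proximal operator of step * [alpha*lam*||b||_1 + (1-alpha)*lam*sum_g ||b_g||_2]
        b = np.sign(beta) * np.maximum(np.abs(beta) - step * alpha * lam, 0.0)  # soft-threshold (L1 part)
        for g in np.unique(groups):
            idx = groups == g
            norm = np.linalg.norm(b[idx])
            if norm > 0.0:
                b[idx] *= max(0.0, 1.0 - step * (1.0 - alpha) * lam / norm)      # group shrinkage (L2 part)
        return b

    def ipw_sglasso_logit(X, y, w, groups, lam=0.1, alpha=0.5, n_iter=2000):
        # proximal gradient descent on the IPW-weighted logistic loss
        X, y, w, groups = map(np.asarray, (X, y, w, groups))
        n, p = X.shape
        step = 4.0 * n / (np.max(w) * np.linalg.norm(X, 2) ** 2)  # 1 / (Lipschitz bound of the gradient)
        beta = np.zeros(p)
        for _ in range(n_iter):
            mu = 1.0 / (1.0 + np.exp(-(X @ beta)))   # fitted distress probabilities
            grad = X.T @ (w * (mu - y)) / n          # gradient of the weighted logistic loss
            beta = prox_sparse_group(beta - step * grad, groups, lam, alpha, step)
        return beta

In practice each group would collect the MIDAS lag-polynomial coefficients of one mixed-frequency predictor, so the penalty selects predictors at the group level and individual terms within groups.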
Panel Data Nowcasting: The Case of Price-Earnings Ratios
Babii, Andrii, Ball, Ryan T., Ghysels, Eric, Striaukas, Jonas
The paper uses structured machine learning regressions for nowcasting with panel data consisting of series sampled at different frequencies. Motivated by the problem of predicting corporate earnings for a large cross-section of firms with macroeconomic, financial, and news time series sampled at different frequencies, we focus on the sparse-group LASSO regularization, which can take advantage of mixed-frequency time series panel data structures. Our empirical results show the superior performance of our machine learning panel data regression models over analysts' predictions, forecast combinations, firm-specific time series regression models, and standard machine learning methods.
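A common device in this line of work, sketched below in Python under the assumption that the MIDAS weights are approximated by a low-degree Legendre polynomial dictionary, is to map the many high-frequency lags of each predictor into a small group of low-frequency regressors; the sparse-group LASSO then acts on those groups. The function names and the default degree are illustrative, not taken from the paper's code.

    import numpy as np

    def legendre_dictionary(n_lags, degree):
        # Legendre polynomials evaluated on the lag grid, rescaled from [0, 1] to [-1, 1]
        grid = np.linspace(0.0, 1.0, n_lags)
        return np.polynomial.legendre.legvander(2.0 * grid - 1.0, degree)  # (n_lags, degree + 1)

    def midas_aggregate(hf_lags, degree=3):
        # hf_lags: (n_obs, n_lags) block of high-frequency lags of a single predictor;
        # returns (n_obs, degree + 1) low-frequency regressors, one per polynomial in the dictionary
        hf_lags = np.asarray(hf_lags)
        n_obs, n_lags = hf_lags.shape
        W = legendre_dictionary(n_lags, degree)
        return hf_lags @ W / n_lags

Each predictor thus contributes degree + 1 coefficients that form one group in the sparse-group LASSO penalty.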
Machine Learning Panel Data Regressions with an Application to Nowcasting Price Earnings Ratios
Babii, Andrii, Ball, Ryan T., Ghysels, Eric, Striaukas, Jonas
This paper introduces structured machine learning regressions for prediction and nowcasting with panel data consisting of series sampled at different frequencies. Motivated by the empirical problem of predicting corporate earnings for a large cross-section of firms with macroeconomic, financial, and news time series sampled at different frequencies, we focus on the sparse-group LASSO regularization. This type of regularization can take advantage of mixed-frequency time series panel data structures, and we find that it empirically outperforms unstructured machine learning methods. We obtain oracle inequalities for the pooled and fixed-effects sparse-group LASSO panel data estimators, recognizing that financial and economic data exhibit heavier-than-Gaussian tails. To that end, we leverage a novel Fuk-Nagaev concentration inequality for panel data consisting of heavy-tailed $\tau$-mixing processes, which may be of independent interest in other high-dimensional panel data settings.
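In generic notation (the paper's exact scaling, penalty weights, and fixed-effects variant differ in details), the pooled sparse-group LASSO estimator solves

    $$\hat\beta = \arg\min_{\beta}\ \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\bigl(y_{i,t} - x_{i,t}^{\top}\beta\bigr)^{2} + \lambda\,\Omega(\beta), \qquad \Omega(\beta) = \gamma\lVert\beta\rVert_{1} + (1-\gamma)\sum_{G\in\mathcal{G}}\lVert\beta_{G}\rVert_{2},$$

where $\mathcal{G}$ collects the groups of MIDAS lag-polynomial coefficients belonging to the same predictor and $\gamma\in[0,1]$ interpolates between the LASSO ($\gamma=1$) and the group LASSO ($\gamma=0$); the fixed-effects version additionally includes firm-specific intercepts.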
Machine Learning Time Series Regressions with an Application to Nowcasting
Babii, Andrii, Ghysels, Eric, Striaukas, Jonas
The statistical imprecision of quarterly gross domestic product (GDP) estimates, along with the fact that the first estimate is available with a delay of nearly a month, poses a significant challenge to policy makers, market participants, and other observers with an interest in monitoring the state of the economy in real time; see, e.g., Ghysels, Horan, and Moench (2018) for a recent discussion of macroeconomic data revisions and publication delays. A term that originated in meteorology, nowcasting refers to the prediction of the present and the very near future. Nowcasting is intrinsically a mixed-frequency data problem, as the object of interest is a low-frequency series (e.g., quarterly GDP), whereas the real-time information (e.g., daily, weekly, or monthly) can be used to update the state or, put differently, to nowcast the low-frequency series of interest. Traditional nowcasting methods rely on dynamic factor models that treat the underlying low-frequency series of interest as a latent process observed with noise through high-frequency data. These models are naturally cast in state-space form, and inference can be performed using likelihood-based methods and Kalman filtering techniques; see Bańbura, Giannone, Modugno, and Reichlin (2013) for a recent survey.
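In generic linear-Gaussian form (specific nowcasting implementations differ), such a dynamic factor model stacks the observed indicators $y_t$ on a low-dimensional latent factor $f_t$,

    $$y_{t} = \Lambda f_{t} + \varepsilon_{t}, \qquad f_{t} = A f_{t-1} + u_{t},$$

so that the low-frequency target and missing high-frequency observations can be handled within the same Kalman filtering and smoothing recursion.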