cran
FoReco and FoRecoML: A Unified Toolbox for Forecast Reconciliation in R
Girolimetto, Daniele, Rombouts, Jeroen, Wilms, Ines, Yang, Yangzhuoran Fin
In this paper, we introduce the forecast reconciliation packages FoReco and FoRecoML for R (R Core Team 2026). Forecast reconciliation adjusts forecasts for linearly constrained multiple time series (such as hierarchical or grouped series, or series observed at different temporal frequencies) so that they are coherent with respect to the underlying constraints, improving both accuracy and consistency for informed decision making. The contributions of the packages are threefold. First, FoReco and FoRecoML are the first to offer functionality for forecast reconciliation methods across cross-sectional, temporal and cross-temporal frameworks. Second, the packages provide a comprehensive set of forecast reconciliation approaches, including classical (e.g., top-down, bottom-up and middle-out) and regression-based reconciliation methods (in FoReco), as well as non-linear reconciliation methods using machine learning (in FoRecoML). Third, their unified design enables easy-to-use forecast reconciliation functions built on the same philosophy, regardless of the reconciliation framework or method.
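To make the idea of coherence concrete, here is a minimal Python sketch of bottom-up reconciliation, one of the classical approaches named above, on a toy hierarchy Total = A + B. It does not use FoReco or FoRecoML; the summing matrix `S`, the function `bottom_up`, and the forecast values are all hypothetical and chosen only for illustration.

```python
# Toy hierarchy: Total = A + B.
# Each row of the summing matrix S expresses one series as a
# combination of the bottom-level series (A, B).
S = [
    [1, 1],  # Total
    [1, 0],  # A
    [0, 1],  # B
]


def bottom_up(base_forecasts, n_bottom):
    """Rebuild every series from the last n_bottom (bottom-level) forecasts."""
    bottom = base_forecasts[-n_bottom:]
    return [sum(s * b for s, b in zip(row, bottom)) for row in S]


# Hypothetical base forecasts ordered as (Total, A, B).
# They are incoherent: 40 + 55 != 100.
base = [100.0, 40.0, 55.0]

reconciled = bottom_up(base, n_bottom=2)
print(reconciled)  # [95.0, 40.0, 55.0]
```

Bottom-up discards the upper-level base forecasts entirely, which is why the reconciled total becomes 95 rather than 100. Regression-based methods instead combine information from all levels.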
mlr3torch: A Deep Learning Framework in R based on mlr3 and torch
Fischer, Sebastian, Burk, Lukas, Zhang, Carson, Bischl, Bernd, Binder, Martin
Deep learning (DL) has become a cornerstone of modern machine learning (ML) praxis. We introduce the R package mlr3torch, which is an extensible DL framework for the mlr3 ecosystem. It is built upon the torch package, and simplifies the definition, training, and evaluation of neural networks for both tabular data and generic tensors (e.g., images) for classification and regression. The package implements predefined architectures, and torch models can easily be converted to mlr3 learners. It also allows users to define neural networks as graphs. This representation is based on the graph language defined in mlr3pipelines and allows users to define the entire modeling workflow, including preprocessing, data augmentation, and network architecture, in a single graph. Through its integration into the mlr3 ecosystem, the package allows for convenient resampling, benchmarking, preprocessing, and more. We explain the package's design and features and show how to customize and extend it to new problems. Furthermore, we demonstrate the package's capabilities using three use cases, namely hyperparameter tuning, fine-tuning, and defining architectures for multimodal data. Finally, we present some runtime benchmarks.
TorchSurv: A Lightweight Package for Deep Survival Analysis
Monod, Mélodie, Krusche, Peter, Cao, Qian, Sahiner, Berkman, Petrick, Nicholas, Ohlssen, David, Coroller, Thibaud
TorchSurv is a Python package that serves as a companion tool to perform deep survival modeling within the PyTorch environment. Unlike existing libraries that impose specific parametric forms, TorchSurv enables the use of custom PyTorch-based deep survival models. With its lightweight design, minimal input requirements, full PyTorch backend, and freedom from restrictive survival model parameterizations, TorchSurv facilitates efficient deep survival model implementation and is particularly beneficial for high-dimensional and complex input data scenarios.
Ensemble learning for blending gridded satellite and gauge-measured precipitation data
Papacharalampous, Georgia, Tyralis, Hristos, Doulamis, Nikolaos, Doulamis, Anastasios
Regression algorithms are regularly used for improving the accuracy of satellite precipitation products. In this context, satellite precipitation and topography data are the predictor variables, and gauge-measured precipitation data are the dependent variables. Alongside this, it is increasingly recognised in many fields that combinations of algorithms through ensemble learning can lead to substantial predictive performance improvements. Still, a sufficient number of ensemble learners for improving the accuracy of satellite precipitation products and their large-scale comparison are currently missing from the literature. In this study, we work towards filling this specific gap by proposing 11 new ensemble learners in the field and by extensively comparing them. We apply the ensemble learners to monthly data from the PERSIANN (Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks) and IMERG (Integrated Multi-satellitE Retrievals for GPM) gridded datasets that span a 15-year period and the entire contiguous United States (CONUS). We also use gauge-measured precipitation data from the Global Historical Climatology Network monthly database, version 2 (GHCNm). The ensemble learners combine the predictions of six machine learning regression algorithms (base learners), namely the multivariate adaptive regression splines (MARS), multivariate adaptive polynomial splines (poly-MARS), random forests (RF), gradient boosting machines (GBM), extreme gradient boosting (XGBoost) and Bayesian regularized neural networks (BRNN), and each of them is based on a different combiner. The combiners include the equal-weight combiner, the median combiner, two best learners and seven variants of a sophisticated stacking method. The latter stacks a regression algorithm on top of the base learners to combine their independent predictions...
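The two simplest combiners mentioned above can be sketched in a few lines of Python. This is only an illustration of the equal-weight and median combiners, not the study's code: the function names and the prediction values are hypothetical, and three base learners stand in for the six used in the study.

```python
from statistics import mean, median


def equal_weight_combiner(predictions):
    """Average the base learners' predictions point by point."""
    return [mean(column) for column in zip(*predictions)]


def median_combiner(predictions):
    """Take the point-wise median of the base learners' predictions."""
    return [median(column) for column in zip(*predictions)]


# Hypothetical monthly precipitation predictions (one row per base
# learner, one column per location).
base_predictions = [
    [10.0, 20.0, 0.0, 5.0],
    [12.0, 18.0, 3.0, 5.0],
    [14.0, 40.0, 0.0, 8.0],
]

print(equal_weight_combiner(base_predictions))  # [12.0, 26.0, 1.0, 6.0]
print(median_combiner(base_predictions))        # [12.0, 20.0, 0.0, 5.0]
```

In the second column, one learner predicts 40 while the others predict about 20. The median combiner ignores that outlying value, whereas the equal-weight combiner is pulled towards it.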
Exploring Local Explanations of Nonlinear Models Using Animated Linear Projections
Spyrison, Nicholas, Cook, Dianne, Biecek, Przemyslaw
The increased predictive power of machine learning models comes at the cost of increased complexity and loss of interpretability, particularly in comparison to parametric statistical models. This trade-off has led to the emergence of eXplainable AI (XAI), which provides methods, such as local explanations (LEs) and local variable attributions (LVAs), to shed light on how a model uses predictors to arrive at a prediction. These provide a point estimate of the linear variable importance in the vicinity of a single observation. However, LVAs tend not to effectively handle association between predictors. To understand how the interaction between predictors affects the variable importance estimate, we can convert LVAs into linear projections and use the radial tour. This is also useful for learning how a model has made a mistake, or the effect of outliers, or the clustering of observations. The approach is illustrated with examples from categorical (penguin species, chocolate types) and quantitative (soccer/football salaries, house prices) response models. The methods are implemented in the R package cheem, available on CRAN.
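One plausible reading of "converting LVAs into linear projections" is to rescale an attribution vector to unit length and use it as a one-dimensional projection basis. The Python sketch below works under that assumption. It is not the cheem implementation, and the function names and numbers are hypothetical.

```python
import math


def to_projection_basis(attribution):
    """Rescale an attribution vector to unit length (a 1D projection basis)."""
    norm = math.sqrt(sum(a * a for a in attribution))
    if norm == 0:
        raise ValueError("attribution vector must be non-zero")
    return [a / norm for a in attribution]


def project(observation, basis):
    """Project one observation onto the 1D basis (a dot product)."""
    return sum(x * b for x, b in zip(observation, basis))


# Hypothetical local variable attribution for three predictors.
attribution = [3.0, 0.0, -4.0]
basis = to_projection_basis(attribution)

# Hypothetical (already scaled) observation.
observation = [1.0, 2.0, 0.5]
print(basis)
print(project(observation, basis))
```

A radial tour would then vary the contribution of one predictor in this basis and animate how the projected data change. That step is omitted here.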
Tidy Modeling with R
Welcome to Tidy Modeling with R! This book is a guide to using a collection of software in the R programming language for model building called tidymodels, and it has two main goals: First and foremost, this book provides a practical introduction to how to use these specific R packages to create models. We focus on a dialect of R called the tidyverse that is designed with a consistent, human-centered philosophy, and demonstrate how the tidyverse and the tidymodels packages can be used to produce high quality statistical and machine learning models. Second, this book will show you how to develop good methodology and statistical practices. Whenever possible, our software, documentation, and other materials attempt to prevent common pitfalls. In Chapter 1, we outline a taxonomy for models and highlight what good software for modeling is like.
Assessment of convolutional recurrent autoencoder network for learning wave propagation
Mallik, Wrik, Jaiman, Rajeev K., Jelovica, Jasmin
It is challenging to construct generalized physical models of wave propagation in nature owing to their complex physics as well as widely varying environmental parameters and dynamical scales. In this article, we present the convolutional recurrent autoencoder network (CRAN) as a data-driven model for learning wave propagation phenomena. The CRAN consists of a convolutional autoencoder for learning low-dimensional system representation and a long short-term memory recurrent neural network for the system evolution in low dimension. We show that the convolutional autoencoder significantly outperforms the dimension-reduction of complex wave propagation phenomena via projection-based methods as it can directly learn subspaces resembling wave characteristics. On the other hand, the projection-based modes are restricted to the Fourier subspace. Geometric priors of the convolutional autoencoder enabling selective scale separation of complex wave dynamics further enhance its dimension-reduction capability. We also demonstrate that geometric priors such as translation equivariance and translational invariance of the convolutional autoencoder enable generalized learning of low-dimensional maps. Thus, the composite CRAN model connecting the convolutional autoencoder with a long short-term memory network specially designed for autoregressive modeling can perform generalized wave propagation prediction over the desired time horizon. Numerical experiments display 90% mean structural similarity index measure of CRAN predictions compared to true solutions for out-of-training cases, and less than 10% pointwise $L_1$ error for most cases, verifying such generalization claims. Finally, the CRAN predictions offer similar wave characteristic patterns to the target solutions, indicating not only their generalization but also their kinematical consistency.
SurvSet: An open-source time-to-event dataset repository
Time-to-event (T2E) analysis is a branch of statistics that models the duration of time it takes for an event to occur. Such events can include outcomes like death, unemployment, or product failure. Most modern machine learning (ML) algorithms, like decision trees and kernel methods, are supported for T2E modelling with data science software (Python and R). To complement these developments, SurvSet is the first open-source T2E dataset repository designed for rapid benchmarking of ML algorithms and statistical methods. The data in SurvSet have been consistently formatted so that a single preprocessing method will work for all datasets. SurvSet currently has 76 datasets which vary in dimensionality, time dependency, and background (the majority of which come from biomedicine). SurvSet is available on PyPI and can be installed with pip install SurvSet. R users can download the data directly from the corresponding git repository.
nanonext for Cross-language Data Exchange
Designed for performance and reliability, the NNG library is written in C and {nanonext} is a lightweight wrapper depending on no other packages. It provides a fast and reliable data interface between different programming languages where NNG has a binding, including C, C++, Java, Python, Go, Rust etc. The following example demonstrates the exchange of numerical data between R and Python (NumPy), two of the most commonly-used languages for data science and machine learning. Using a messaging interface provides a clean and robust approach that is light on resources and offers limited and identifiable points of failure. This is especially relevant when, for example, processing real-time data.
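As a minimal sketch of the idea behind such an exchange (not the package's own example), the Python snippet below treats a numeric vector as a raw byte message and decodes it on the receiving side. It assumes the payload is a contiguous run of 8-byte little-endian doubles, which the text above does not state. It uses no nanonext or NNG calls, leaves out the transport entirely, and the helper names are hypothetical.

```python
import struct


def encode_doubles(values):
    """Pack floats as consecutive 8-byte little-endian doubles."""
    return struct.pack(f"<{len(values)}d", *values)


def decode_doubles(payload):
    """Unpack a byte message produced by encode_doubles."""
    count, remainder = divmod(len(payload), 8)
    if remainder:
        raise ValueError("payload length is not a multiple of 8 bytes")
    return list(struct.unpack(f"<{count}d", payload))


# Sender side: the numeric vector becomes a plain byte message.
message = encode_doubles([1.1, 2.2, 3.3, 4.4, 5.5])

# Receiver side: the bytes are decoded back into numbers.
received = decode_doubles(message)
print(len(message), received)
```

Because the message is just bytes with an agreed layout, either end can be written in any language that can read that layout. This is what lets a messaging interface serve as a cross-language data interface.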