 lstm


TiRex: Zero-Shot Forecasting Across Long and Short Horizons with Enhanced In-Context Learning

Neural Information Processing Systems

In-context learning, the ability of large language models to perform tasks using only examples provided in the prompt, has recently been adapted for time series forecasting. This paradigm enables zero-shot prediction, where past values serve as context for forecasting future values, making powerful forecasting tools accessible to non-experts and improving performance when training data are scarce. Most existing zero-shot forecasting approaches rely on transformer architectures, which, despite their success in language, often fall short of expectations in time series forecasting, where recurrent models like LSTMs frequently have the edge. Conversely, while LSTMs are well-suited for time series modeling due to their state-tracking capabilities, they lack strong in-context learning abilities. We introduce TiRex, which closes this gap by leveraging xLSTM, an enhanced LSTM with competitive in-context learning skills. Unlike transformers, state-space models, or parallelizable RNNs such as RWKV, TiRex retains state tracking, a critical property for long-horizon forecasting. To further facilitate its state-tracking ability, we propose a training-time masking strategy called CPM. TiRex sets a new state of the art in zero-shot time series forecasting on the Hugging Face benchmarks, outperforming significantly larger models from Prior Labs, Amazon, Google, and Salesforce across both short- and long-term forecasts.


Neural-Actuarial Longevity Forecasting: Anchoring LSTMs for Explainable Risk Management

arXiv.org Machine Learning

Traditional multi-population models, such as the Li-Lee framework, rely on the assumption of mean-reverting country-specific deviations. However, recent data from high-longevity clusters suggest a systemic break in this paradigm. We identify a stationarity paradox where mortality residuals in countries like Sweden and West Germany exhibit persistent unit roots, leading to a systematic mispricing of longevity risk in linear models. To address these non-linearities, we propose Hybrid-Lift, a neural-actuarial framework that combines Hierarchical LSTM networks with a Mean-Bias Correction (MBC) anchoring mechanism. Positioned as a governance-friendly model challenger rather than a replacement of classical approaches, the framework exhibits selective superiority on out-of-sample validation (2012-2020): it outperforms Li-Lee by 17.40% in Sweden and 12.57% in West Germany, while remaining comparable for near-linear regimes such as Switzerland and Japan. We complement the predictive model with an integrated governance suite comprising SHAP-based cross-country influence mapping, a dual uncertainty framework for regulatory capital calibration (Swiss ES 99.0% of +1.153 years), and a reverse stress test identifying the critical shock threshold for solvency buffer exhaustion. This research provides evidence that neural networks, when properly anchored by actuarial principles, can serve as effective model challengers for longevity risk management under the SST and Solvency II standards.



Details and Ablation Studies for Language Modelling

Neural Information Processing Systems

A.1 Experimental Settings

All language models in Table 1 have the same Transformer configuration: a 16-layer model with a hidden size of 128, 8 heads, and a feed-forward dimension of 2048. We use a dropout [75, 76, 77] rate of 0.1. The batch size is 96 and we train for about 120 epochs with the Adam optimiser [78], using an initial learning rate of 0.00025 and 2000 learning rate warm-up steps. All models are trained with a back-propagation span of 256 tokens. During training, these segments are treated independently, except for the "+ full context" cases in Table 1, where the states (both recurrent states and fast weight states) from a segment are used as initialisation for the subsequent segment. The models in the "+ full context" cases are also evaluated in the same way, by carrying over the context throughout the evaluation text with a batch size of one. For all other cases, evaluation is done by going through the text with a sliding window of size 256 and a batch size of one. Transformer states are computed for all positions in each window, but only the last position is used to compute perplexity (except in the first segment, where all positions are used for evaluation) [2].
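The sliding-window evaluation described above can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function name `scored_positions` is hypothetical, and a stride of one token between windows is an assumption inferred from the statement that only the last position of each window is scored.

```python
def scored_positions(n_tokens, window=256):
    """Yield (start, end, scored) for each evaluation window.

    `start` and `end` bound the window (end exclusive) and `scored` lists the
    token indices whose predictions count towards perplexity.
    Assumption: consecutive windows advance by exactly one token.
    """
    if n_tokens <= 0:
        return
    # First segment: every position in the window is scored.
    first_end = min(window, n_tokens)
    yield 0, first_end, list(range(first_end))
    # Later windows: full context of `window` tokens, only the last one scored.
    for end in range(first_end + 1, n_tokens + 1):
        yield end - window, end, [end - 1]


if __name__ == "__main__":
    steps = list(scored_positions(600))
    print(len(steps), steps[0][:2], steps[-1])
```

Under this scheme each token is scored exactly once, and every token after the first segment is predicted with a full 256-token window behind it.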





Supplementary Material for 'Causality Preserving Chaotic Transformation and Classification using Neurochaos Learning'

Neural Information Processing Systems

This is the supplementary information pertaining to the main manuscript. In this supplementary material, we provide the comparative performance of Neurochaos Learning with a Deep Neural Network, a 1D Convolutional Neural Network (1D CNN), and Long Short-Term Memory (LSTM) for cause-effect classification of time series data generated from a coupled chaotic master-slave system and from autoregressive (AR) processes. We also check whether each of these architectures is able to preserve the cause-effect relationship between the corresponding features extracted from the original cause and effect time series. To evaluate the efficacy of Neurochaos Learning (NL: ChaosNet) and deep learning algorithms for the classification of cause-effect, we used simulated datasets from (a) coupled autoregressive (AR) processes, and (b) coupled 1D chaotic skew tent maps in master-slave configuration. The governing equations for the coupled AR processes are the following:

$$M(t) = a_1 M(t-1) + \gamma r(t), \tag{1}$$

$$S(t) = a_2 S(t-1) + \eta M(t-1) + \gamma r(t), \tag{2}$$

where $M(t)$ and $S(t)$ are the independent and the dependent (or the cause and effect) time series respectively; $a_1 = 0.8$, $a_2 = 0.9$, the noise intensity $\gamma = 0.03$, and $r(t)$ is independent and identically distributed additive Gaussian noise drawn from a standard normal distribution.
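A short simulation may help make equations (1) and (2) concrete. The sketch below is illustrative only and rests on several assumptions not stated in the excerpt: the coupling coefficient value passed as `eta` is hypothetical, both series start at zero, and the noise term r(t) is drawn independently for the master and the slave series. The function name `simulate_coupled_ar` is also hypothetical.

```python
import random


def simulate_coupled_ar(n_steps, eta, a1=0.8, a2=0.9, gamma=0.03, seed=0):
    """Simulate the master series M(t) and slave series S(t).

    a1, a2 and gamma default to the values quoted in the text.
    Assumptions: M(0) = S(0) = 0, and independent noise draws per series.
    """
    rng = random.Random(seed)
    m = [0.0] * n_steps
    s = [0.0] * n_steps
    for t in range(1, n_steps):
        # Equation (1): the master depends only on its own past.
        m[t] = a1 * m[t - 1] + gamma * rng.gauss(0.0, 1.0)
        # Equation (2): the slave is driven by the master's previous value.
        s[t] = a2 * s[t - 1] + eta * m[t - 1] + gamma * rng.gauss(0.0, 1.0)
    return m, s


if __name__ == "__main__":
    # eta = 0.5 is an arbitrary illustrative coupling strength.
    master, slave = simulate_coupled_ar(n_steps=1000, eta=0.5)
    print(master[:3], slave[:3])
```

Setting `eta` to zero decouples the two series, which gives a simple baseline against which a cause-effect classifier can be checked.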



Time-Warping Recurrent Neural Networks for Transfer Learning

arXiv.org Machine Learning

Dynamical systems describe how a physical system evolves over time. Physical processes can evolve faster or slower under different environmental conditions. We define time-warping as rescaling time in a model of a physical system. This thesis proposes a new method of transfer learning for Recurrent Neural Networks (RNNs) based on time-warping. We prove that for a class of linear, first-order differential equations known as time lag models, an LSTM can approximate these systems to any desired accuracy, and that the model can be time-warped while maintaining the approximation accuracy. The Time-Warping method of transfer learning is then evaluated on an applied problem: predicting fuel moisture content (FMC), an important quantity in wildfire modeling. An RNN with LSTM recurrent layers is pretrained on fuels with a characteristic time scale of 10 hours, for which large quantities of training data are available. The RNN is then modified with transfer learning to generate predictions for fuels with characteristic time scales of 1 hour, 100 hours, and 1000 hours. The Time-Warping method is evaluated against several known methods of transfer learning. It produces predictions with an accuracy comparable to the established methods, despite modifying only a small fraction of the parameters that the other methods modify.
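The idea of rescaling time can be illustrated on the differential equation itself, independently of any neural network. The sketch below assumes a time lag model of the form dm/dt = (E - m) / T with a constant equilibrium E, which is one common form and not necessarily the exact formulation used in the thesis. The function names and numeric values are hypothetical, and the code does not reproduce the thesis's LSTM parameter modification.

```python
import math


def time_lag_response(m0, equilibrium, time_scale, t):
    """Exact solution at time t of dm/dt = (equilibrium - m) / time_scale."""
    return equilibrium + (m0 - equilibrium) * math.exp(-t / time_scale)


def warp_time(t, source_scale, target_scale):
    """Rescale t so that a model with `source_scale` matches `target_scale`."""
    return t * source_scale / target_scale


if __name__ == "__main__":
    m0, equilibrium = 30.0, 10.0  # illustrative moisture values (percent)
    for t in (0.0, 0.5, 2.0, 7.0):
        # Direct evaluation for a fuel with a 1-hour time scale.
        direct = time_lag_response(m0, equilibrium, 1.0, t)
        # The same value from the 10-hour model, evaluated at a warped time.
        warped = time_lag_response(m0, equilibrium, 10.0, warp_time(t, 10.0, 1.0))
        print(t, direct, warped)
```

For this linear model the two evaluations agree up to floating-point rounding, which is why a model fitted at one characteristic time scale can, in principle, be reused at another by changing only how time enters it.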