AITopics | lstm

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Neural-Actuarial Longevity Forecasting: Anchoring LSTMs for Explainable Risk Management

Rindori, Davide

arXiv.org Machine Learning, May-8-2026

Traditional multi-population models, such as the Li-Lee framework, rely on the assumption of mean-reverting country-specific deviations. However, recent data from high-longevity clusters suggest a systemic break in this paradigm. We identify a stationarity paradox where mortality residuals in countries like Sweden and West Germany exhibit persistent unit roots, leading to a systematic mispricing of longevity risk in linear models. To address these non-linearities, we propose Hybrid-Lift, a neural-actuarial framework that combines Hierarchical LSTM networks with a Mean-Bias Correction (MBC) anchoring mechanism. Positioned as a governance-friendly model challenger rather than a replacement of classical approaches, the framework exhibits selective superiority on out-of-sample validation (2012-2020): it outperforms Li-Lee by 17.40% in Sweden and 12.57% in West Germany, while remaining comparable for near-linear regimes such as Switzerland and Japan. We complement the predictive model with an integrated governance suite comprising SHAP-based cross-country influence mapping, a dual uncertainty framework for regulatory capital calibration (Swiss ES 99.0% of +1.153 years), and a reverse stress test identifying the critical shock threshold for solvency buffer exhaustion. This research provides evidence that neural networks, when properly anchored by actuarial principles, can serve as effective model challengers for longevity risk management under the SST and Solvency II standards.
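The abstract quotes an Expected Shortfall (ES) at the 99.0% level. As a minimal sketch of how such a tail measure is commonly estimated from simulated outcomes, the snippet below averages the worst 1% of a sample. The function name, the synthetic Gaussian deviations, and their scale are illustrative assumptions; none of it reproduces the paper's dual uncertainty framework or its reported figure.

```python
import random


def expected_shortfall(samples, level=0.99):
    """Empirical ES: mean of the worst (1 - level) share of outcomes.

    "Worst" is taken to mean the largest values, e.g. the largest upward
    deviations in life expectancy for a longevity-risk carrier.
    """
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples, reverse=True)  # most adverse outcomes first
    # Number of observations in the tail; keep at least one.
    tail_size = max(1, int(round(len(ordered) * (1.0 - level))))
    tail = ordered[:tail_size]
    return sum(tail) / len(tail)


# Synthetic deviations in years (hypothetical data, for illustration only).
rng = random.Random(0)
simulated = [rng.gauss(0.0, 0.4) for _ in range(10_000)]
es_99 = expected_shortfall(simulated, level=0.99)
print(f"ES 99%: {es_99:+.3f} years")
```

Sorting the whole sample keeps the sketch short; for large simulations a partial selection of the tail would avoid the full sort.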

artificial intelligence, machine learning, neural-actuarial longevity forecasting rindori, (15 more...)

arXiv.org Machine Learning

2605.06438

Country:

Europe > Germany (0.46)
Europe > Sweden (0.46)
North America > United States (0.46)
Europe > Switzerland (0.36)

Genre: Research Report > Experimental Study (0.88)

Industry:

Health & Medicine (0.68)
Information Technology > Security & Privacy (0.60)
Banking & Finance (0.46)
Government (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Details and Ablation Studies for Language Modelling

Neural Information Processing Systems, Apr-25-2026, 14:23:28 GMT

A.1 Experimental Settings

All language models in Table 1 have the same Transformer configuration: a 16-layer model with a hidden size of 128, 8 heads, and a feed-forward dimension of 2048. We use a dropout [75, 76, 77] rate of 0.1. The batch size is 96 and we train for about 120 epochs with the Adam optimiser [78], an initial learning rate of 0.00025, and 2000 learning rate warm-up steps. All models are trained with a back-propagation span of 256 tokens. During training, these segments are treated independently, except for the "+ full context" cases in Table 1, where the states (both recurrent states and fast weight states) from a segment are used as initialisation for the subsequent segment. The models in the "+ full context" cases are also evaluated in the same way, by carrying over the context throughout the evaluation text with a batch size of one. For all other cases, the evaluation is done by going through the text with a sliding window of size 256 with a batch size of one. Transformer states are computed for all positions in each window, but only the last position is used to compute perplexity (except in the first segment, where all positions are used for evaluation) [2].
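The sliding-window evaluation described above can be sketched as an index plan: the first window scores every position, and each later window scores only its final position. The sketch assumes the window advances one token at a time, which the text implies but does not state. The function names are hypothetical, and the per-token negative log-likelihoods would come from a model that is not shown here.

```python
import math


def sliding_window_plan(num_tokens, window=256):
    """Yield (start, end, scored_positions) for each evaluation window.

    Windows are half-open ranges [start, end). The first window scores all
    of its positions; every later window slides by one token (an assumption)
    and scores only its last position, so each token is scored exactly once.
    """
    if num_tokens <= 0 or window <= 0:
        return
    first_end = min(window, num_tokens)
    yield 0, first_end, list(range(first_end))
    for last in range(first_end, num_tokens):
        start = last - window + 1
        yield start, last + 1, [last]


def perplexity(neg_log_likelihoods):
    """Perplexity is the exponential of the mean per-token NLL (in nats)."""
    return math.exp(sum(neg_log_likelihoods) / len(neg_log_likelihoods))


# Small example: 10 tokens, window of 4.
for start, end, scored in sliding_window_plan(10, window=4):
    print(start, end, scored)
```

Scoring only the last position gives every token after the first window the full 255 tokens of left context, at the cost of one forward pass per token.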

artificial intelligence, delta rnn, machine learning, (17 more...)

Neural Information Processing Systems

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.42)

3f9e3767ef3b10a0de4c256d7ef9805d-Paper.pdf

Neural Information Processing Systems, Apr-25-2026, 14:23:25 GMT

artificial intelligence, machine learning, natural language, (15 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
North America > United States > California > Los Angeles County (0.28)

Industry:

Education (0.46)
Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

324bb74b6d557428e21528379eeb7a0c-Supplemental-Conference.pdf

Neural Information Processing Systems, Apr-25-2026, 09:39:05 GMT

artificial intelligence, hyperparameter, machine learning, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.32)

165a59f7cf3b5c4396ba65953d679f17-Supplemental.pdf

Neural Information Processing Systems, Apr-24-2026, 20:50:27 GMT

artificial intelligence, machine learning, manipulation task, (18 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Supplementary Material for 'Causality Preserving Chaotic Transformation and Classification using Neurochaos Learning'

Neural Information Processing Systems, Apr-24-2026, 13:53:40 GMT

This is the supplementary information pertaining to the main manuscript. In this supplementary material, we provide the comparative performance of Neurochaos Learning with a Deep Neural Network, a 1D Convolutional Neural Network (1D CNN), and Long Short-Term Memory (LSTM) for cause-effect classification of time series data generated from a coupled chaotic master-slave system and from autoregressive (AR) processes. We also check whether each of these architectures is able to preserve the cause-effect relationship between the corresponding features extracted from the original cause and effect time series. To evaluate the efficacy of Neurochaos Learning (NL: ChaosNet) and deep learning algorithms for the classification of cause-effect, we used simulated datasets from (a) coupled autoregressive (AR) processes, and (b) coupled 1D chaotic skew tent-maps in master-slave configuration. The governing equations for the coupled AR processes are the following:

$$M(t) = a_1 M(t-1) + \gamma r(t), \tag{1}$$
$$S(t) = a_2 S(t-1) + \eta M(t-1) + \gamma r(t), \tag{2}$$

where $M(t)$ and $S(t)$ are the independent and the dependent (or the cause and effect) time series respectively; $a_1 = 0.8$, $a_2 = 0.9$, the noise intensity $\gamma = 0.03$, and $r(t)$ is independent and identically distributed additive Gaussian noise drawn from a standard normal distribution.
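The coupled AR processes in equations (1) and (2) can be simulated in a few lines. This is a minimal sketch using the stated values of a1, a2, and the noise intensity. The coupling coefficient eta is not given in this excerpt, so the default below is a placeholder, and drawing separate noise samples for the two series and starting both from a single noise draw are likewise assumptions.

```python
import random


def simulate_coupled_ar(n_steps, a1=0.8, a2=0.9, eta=0.5, gamma=0.03, seed=0):
    """Simulate the master series M(t) and the slave series S(t).

    eta is a placeholder coupling strength (not specified in the excerpt).
    Noise is drawn independently for each series at every step (assumption).
    """
    rng = random.Random(seed)
    # Initial values: one scaled noise draw each (assumption).
    m = [gamma * rng.gauss(0.0, 1.0)]
    s = [gamma * rng.gauss(0.0, 1.0)]
    for _ in range(1, n_steps):
        m_prev, s_prev = m[-1], s[-1]
        # Equation (1): the master depends only on its own past.
        m.append(a1 * m_prev + gamma * rng.gauss(0.0, 1.0))
        # Equation (2): the slave depends on its own past and the master's past.
        s.append(a2 * s_prev + eta * m_prev + gamma * rng.gauss(0.0, 1.0))
    return m, s


master, slave = simulate_coupled_ar(1000)
print(len(master), len(slave))
```

Setting eta to zero decouples the two series, which gives a simple negative control for any cause-effect classifier trained on this data.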

artificial intelligence, deep learning, machine learning, (13 more...)

Neural Information Processing Systems

Country: Asia > India > Karnataka (0.15)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Full-Capacity Unitary Recurrent Neural Networks

Scott Wisdom, Thomas Powers, John Hershey, Jonathan Le Roux, Les Atlas

Neural Information Processing Systems, Apr-22-2026, 08:43:40 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, matrix, (18 more...)

Neural Information Processing Systems

Country:

North America > United States (0.28)
Europe (0.28)

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Time-Warping Recurrent Neural Networks for Transfer Learning

Hirschi, Jonathon

arXiv.org Machine Learning, Apr-6-2026

Dynamical systems describe how a physical system evolves over time. Physical processes can evolve faster or slower in different environmental conditions. We define time-warping as rescaling time in a model of a physical system. This thesis proposes a new method of transfer learning for Recurrent Neural Networks (RNNs) based on time-warping. We prove that for a class of linear, first-order differential equations known as time lag models, an LSTM can approximate these systems with any desired accuracy, and the model can be time-warped while maintaining the approximation accuracy. The Time-Warping method of transfer learning is then evaluated in an applied problem on predicting fuel moisture content (FMC), an important concept in wildfire modeling. An RNN with LSTM recurrent layers is pretrained on fuels with a characteristic time scale of 10 hours, where there are large quantities of data available for training. The RNN is then modified with transfer learning to generate predictions for fuels with characteristic time scales of 1 hour, 100 hours, and 1000 hours. The Time-Warping method is evaluated against several known methods of transfer learning. The Time-Warping method produces predictions with an accuracy level comparable to the established methods, despite modifying only a small fraction of the parameters that the other methods modify.
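To make the idea of time-warping concrete, the sketch below uses a generic first-order time lag equation, dm/dt = (E - m) / tau, with a constant equilibrium E. It is an assumed textbook form, not necessarily the exact formulation in the thesis, and it says nothing about how the warping is applied to LSTM parameters. Under that assumption, a model with a 10-hour time scale reproduces the 1-hour response when its time axis is rescaled by the ratio of the two time scales. All function names are hypothetical.

```python
import math


def time_lag_response(t_hours, tau_hours, m0, equilibrium):
    """Exact solution of dm/dt = (equilibrium - m) / tau for constant equilibrium."""
    return equilibrium + (m0 - equilibrium) * math.exp(-t_hours / tau_hours)


def time_warp(t_hours, tau_source, tau_target):
    """Rescale time so a tau_source model reproduces a tau_target response."""
    return t_hours * tau_source / tau_target


# A 10-hour model queried at warped times matches a 1-hour model.
m0, equilibrium = 0.30, 0.10  # illustrative moisture fractions
for t in (0.0, 0.5, 1.0, 3.0):
    warped = time_lag_response(time_warp(t, 10.0, 1.0), 10.0, m0, equilibrium)
    direct = time_lag_response(t, 1.0, m0, equilibrium)
    print(f"t={t:>4} h  warped={warped:.6f}  direct={direct:.6f}")
```

The equivalence holds because the solution depends on time only through the ratio t / tau, so changing tau and rescaling t are interchangeable.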

artificial intelligence, machine learning, prediction, (20 more...)

arXiv.org Machine Learning

2604.02474

Country:

North America > United States > Colorado > Denver County > Denver (0.14)
North America > United States > Oklahoma (0.06)
North America > United States > Rocky Mountains (0.04)
(15 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Government > Regional Government > North America Government > United States Government (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

xLSTM: Extended Long Short-Term Memory

Neural Information Processing Systems, Mar-22-2026, 09:35:40 GMT

In the 1990s, the constant error carousel and gating were introduced as the central ideas of the Long Short-Term Memory (LSTM). Since then, LSTMs have stood the test of time and contributed to numerous deep learning success stories; in particular, they constituted the first Large Language Models (LLMs). However, the advent of the Transformer technology with parallelizable self-attention at its core marked the dawn of a new era, outpacing LSTMs at scale. We now raise a simple question: How far do we get in language modeling when scaling LSTMs to billions of parameters, leveraging the latest techniques from modern LLMs, but mitigating known limitations of LSTMs? Firstly, we introduce exponential gating with appropriate normalization and stabilization techniques. Secondly, we modify the LSTM memory structure, obtaining: (i) sLSTM with a scalar memory, a scalar update, and new memory mixing, (ii) mLSTM that is fully parallelizable with a matrix memory and a covariance update rule. Integrating these LSTM extensions into residual block backbones yields xLSTM blocks that are then residually stacked into xLSTM architectures. Exponential gating and modified memory structures boost xLSTM capabilities to perform favorably when compared to state-of-the-art Transformers and State Space Models, both in performance and scaling.
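Exponential gating with normalization and stabilization can be illustrated with a single scalar memory cell. This is a minimal sketch under stated assumptions: both the input and forget gates are exponential, a normalizer state accumulates the gate mass, and a running maximum in log space keeps the exponentials from overflowing. The function and argument names are hypothetical, pre-activations are passed in directly instead of being computed from learned weights, and memory mixing across cells is omitted.

```python
import math


def slstm_step(c_prev, n_prev, m_prev, z_pre, i_pre, f_pre, o_pre):
    """One scalar-memory update with stabilized exponential gates.

    c: cell state, n: normalizer state, m: stabilizer (running log-space max).
    i_pre and f_pre are the logs of the exponential input and forget gates.
    """
    z = math.tanh(z_pre)                  # candidate cell input
    o = 1.0 / (1.0 + math.exp(-o_pre))    # sigmoid output gate
    # Stabilizer: largest log-magnitude that enters this update.
    m = max(f_pre + m_prev, i_pre)
    i_stab = math.exp(i_pre - m)          # stabilized input gate, at most 1
    f_stab = math.exp(f_pre + m_prev - m) # stabilized forget gate, at most 1
    c = f_stab * c_prev + i_stab * z
    n = f_stab * n_prev + i_stab
    h = o * c / n                         # normalized hidden state
    return c, n, m, h


# Run a short sequence of (z_pre, i_pre, f_pre, o_pre) pre-activations.
c, n, m = 0.0, 0.0, 0.0
for pre in [(0.3, 0.1, -0.2, 0.5), (-1.0, 1.5, 0.4, -0.3), (0.7, -2.0, 0.9, 1.2)]:
    c, n, m, h = slstm_step(c, n, m, *pre)
    print(f"h={h:+.6f}")
```

Because the cell and normalizer states are scaled by the same factor, the stabilizer cancels in the ratio c / n, so the hidden state matches the unstabilized recurrence whenever that recurrence does not overflow.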

deep learning, machine learning, proceedings, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Universal In-Context Approximation By Prompting Fully Recurrent Models

Neural Information Processing Systems, Mar-21-2026, 09:46:03 GMT

Zero-shot and in-context learning enable solving tasks without model fine-tuning, making them essential for developing generative model solutions. Therefore, it is crucial to understand whether a pretrained model can be prompted to approximate any function, i.e., whether it is a universal in-context approximator. While it was recently shown that transformer models do possess this property, these results rely on their attention mechanism. Hence, these findings do not apply to fully recurrent architectures like RNNs, LSTMs, and the increasingly popular SSMs. We demonstrate that RNNs, LSTMs, GRUs, Linear RNNs, and linear gated architectures such as Mamba and Hawk/Griffin can also serve as universal in-context approximators. To streamline our argument, we introduce a programming language called LSRL that compiles to these fully recurrent architectures. LSRL may be of independent interest for further studies of fully recurrent models, such as constructing interpretability benchmarks. We also study the role of multiplicative gating and observe that architectures incorporating such gating (e.g., LSTMs, GRUs, Hawk/Griffin) can implement certain operations more stably, making them more viable candidates for practical in-context universal approximation.

artificial intelligence, machine learning, proceedings, (10 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)
