AITopics | fnn

Vocabulary In-Context Learning in Transformers: Benefits of Positional Encoding

Neural Information Processing SystemsJun-22-2026, 16:55:44 GMT

Numerous studies have demonstrated that the Transformer architecture possesses the capability for in-context learning (ICL). In scenarios involving function approximation, context can serve as a control parameter for the model, endowing it with the universal approximation property (UAP). In practice, context is represented by tokens from a finite set, referred to as a vocabulary, which is the case considered in this paper, i.e., vocabulary in-context learning (VICL). We demonstrate that VICL in single-layer Transformers, without positional encoding, does not possess the UAP; however, it is possible to achieve the UAP when positional encoding is included. Several sufficient conditions for the positional encoding are provided. Our findings reveal the benefits of positional encoding from an approximation theory perspective in the context of ICL.

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

Add feedback

Self-Normalizing Neural Networks

Neural Information Processing SystemsMar-17-2026, 14:36:30 GMT

Deep Learning has revolutionized vision via convolutional neural networks (CNNs) and natural language processing via recurrent neural networks (RNNs). However, success stories of Deep Learning with standard feed-forward neural networks (FNNs) are rare. FNNs that perform well are typically shallow and, therefore cannot exploit many levels of abstract representations. We introduce self-normalizing neural networks (SNNs) to enable high-level abstract representations. While batch normalization requires explicit normalization, neuron activations of SNNs automatically converge towards zero mean and unit variance. The activation function of SNNs are scaled exponential linear units (SELUs), which induce self-normalizing properties.

artificial intelligence, machine learning, normalization, (10 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

38e491559eb9e4cf31b8cd3a4e222436-Paper-Conference.pdf

Neural Information Processing SystemsFeb-11-2026, 07:20:31 GMT

artificial intelligence, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country: Asia > China > Guangdong Province > Guangzhou (0.04)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Vision (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

On Scrambling Phenomena for Randomly Initialized Recurrent Networks

Neural Information Processing SystemsFeb-9-2026, 21:00:05 GMT

Then,fk is Li-Yorkechaoticwith constant (albeitsmall) probability. There suchthatk (whose minimumvalue ),fk is 3-periodic 1 .

artificial intelligence, machine learning, rnn, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Orange County > Irvine (0.05)
North America > United States > California > Santa Cruz County > Santa Cruz (0.04)
Africa > Ethiopia > Addis Ababa > Addis Ababa (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Add feedback

Networks

Neural Information Processing SystemsFeb-9-2026, 01:32:29 GMT

Despite being powerful and well-understood, the kernel ridge regression suffers from the costly computation when dealing with large datasets, since generally implementation of Eq. (1) requires O(n3) running time.

artificial intelligence, machine learning, neural network, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.99)

Add feedback

14d2d59c5e2302bfa6ce6cae59156e33-Paper-Conference.pdf

Neural Information Processing SystemsFeb-8-2026, 08:04:45 GMT

international conference, neural network, relu mpnn, (12 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(4 more...)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Data Science > Data Mining (0.67)

Add feedback

Reinforcement Learning for Micro-Level Claims Reserving

Avanzi, Benjamin, Richman, Ronald, Wong, Bernard, Wüthrich, Mario, Xie, Yagebu

arXiv.org Machine LearningJan-13-2026

Outstanding claim liabilities are revised repeatedly as claims develop, yet most modern reserving models are trained as one-shot predictors and typically learn only from settled claims. We formulate individual claims reserving as a claim-level Markov decision process in which an agent sequentially updates outstanding claim liability (OCL) estimates over development, using continuous actions and a reward design that balances accuracy with stable reserve revisions. A key advantage of this reinforcement learning (RL) approach is that it can learn from all observed claim trajectories, including claims that remain open at valuation, thereby avoiding the reduced sample size and selection effects inherent in supervised methods trained on ultimate outcomes only. We also introduce practical components needed for actuarial use -- initialisation of new claims, temporally consistent tuning via a rolling-settlement scheme, and an importance-weighting mechanism to mitigate portfolio-level underestimation driven by the rarity of large claims. On CAS and SPLICE synthetic general insurance datasets, the proposed Soft Actor-Critic implementation delivers competitive claim-level accuracy and strong aggregate OCL performance, particularly for the immature claim segments that drive most of the liability.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

arXiv.org Machine Learning

2601.07637

Country: Europe > Switzerland (0.27)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Games (1.00)
Banking & Finance > Insurance (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.66)

Add feedback

Manifolds and Modules: How Function Develops in a Neural Foundation Model

Bertram, Johannes, Dyballa, Luciano, Keller, T. Anderson, Kinger, Savik, Zucker, Steven W.

arXiv.org Artificial IntelligenceDec-10-2025

Foundation models have shown remarkable success in fitting biological visual systems; however, their black-box nature inherently limits their utility for understanding brain function. Here, we peek inside a SOTA foundation model of neural activity (Wang et al., 2025) as a physiologist might, characterizing each 'neuron' based on its temporal response properties to parametric stimuli. We analyze how different stimuli are represented in neural activity space by building decoding manifolds, and we analyze how different neurons are represented in stimulus-response space by building neural encoding manifolds. We find that the different processing stages of the model (i.e., the feedforward encoder, recurrent, and readout modules) each exhibit qualitatively different representational structures in these manifolds. The recurrent module shows a jump in capabilities over the encoder module by 'pushing apart' the representations of different temporal stimulus patterns; while the readout module achieves biological fidelity by using numerous specialized feature maps rather than biologically plausible mechanisms. Overall, we present this work as a study of the inner workings of a prominent neural foundation model, gaining insights into the biological relevance of its internals through the novel analysis of its neurons' joint temporal response patterns.

artificial intelligence, data mining, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2512.07869

Country:

Europe (0.46)
North America > United States (0.14)

Genre: Research Report (0.83)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Data Science > Data Mining (0.67)

Add feedback

Self-Normalizing Neural Networks

Neural Information Processing SystemsNov-21-2025, 15:21:24 GMT

Deep Learning has revolutionized vision via convolutional neural networks (CNNs) and natural language processing via recurrent neural networks (RNNs). However, success stories of Deep Learning with standard feed-forward neural networks (FNNs) are rare. FNNs that perform well are typically shallow and, therefore cannot exploit many levels of abstract representations. We introduce self-normalizing neural networks (SNNs) to enable high-level abstract representations. While batch normalization requires explicit normalization, neuron activations of SNNs automatically converge towards zero mean and unit variance. The activation function of SNNs are scaled exponential linear units (SELUs), which induce self-normalizing properties.

name change, normalization, self-normalizing neural network, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback