AITopics | feed-forward network

In this way, the input to the LLM does not require visual tokens, which shortens the input sequence and greatly improves efficiency. Following this paradigm, we propose VLoRA with the perceptual weights generator.

large language model, machine learning, perceptual weight, (16 more...)

Neural Information Processing Systems

Country:

Asia > China > Anhui Province > Hefei (0.04)
Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
Asia > Singapore (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Education > Curriculum > Subject-Specific Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

786ab8c4d7ee758f80d57e65582e609d-Supplemental.pdf

Neural Information Processing Systems, Feb-9-2026, 00:58:23 GMT

assumption 3, manifold, topology, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > New York (0.04)
North America > United States > California (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)
(7 more...)

Genre: Research Report > New Finding (0.93)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

3c09bb10e2189124fdd8f467cc8b55a7-Supplemental.pdf

Neural Information Processing Systems, Feb-8-2026, 03:54:43 GMT

artificial intelligence, machine learning, representation, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.50)

1d774c112926348c3e25ea47d87c835b-Supplemental-Conference.pdf

Neural Information Processing Systems, Feb-7-2026, 19:15:35 GMT

anomaly detection, localization, query, (15 more...)

Neural Information Processing Systems

Country: Asia > China > Shanghai > Shanghai (0.04)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

On the use of case estimate and transactional payment data in neural networks for individual loss reserving

Avanzi, Benjamin, Lambrianidis, Matthew, Taylor, Greg, Wong, Bernard

arXiv.org Machine Learning, Jan-12-2026

The use of neural networks trained on individual claims data has become increasingly popular in the actuarial reserving literature. We consider how best to input historical payment data into neural network models. Case estimates are also available as a time series, and we extend our analysis to assess their predictive power. In this paper, we compare a feed-forward neural network trained on summarised transactions to a recurrent neural network equipped to analyse a claim's entire payment history and/or case estimate development history. We draw conclusions from training and comparing the performance of the models on multiple comparable, highly complex datasets simulated from SPLICE (Avanzi, Taylor and Wang, 2023). We find evidence that case estimates improve predictions significantly, but that equipping the neural network with memory leads only to meagre improvements. Although the case estimation process and its quality will vary significantly between insurers, we provide a standardised methodology for assessing the value of case estimates.

artificial intelligence, case estimate, machine learning, (19 more...)

arXiv.org Machine Learning

2601.05274

Country: Oceania > Australia (0.46)

Genre: Research Report (0.83)

Industry:

Law (0.93)
Banking & Finance > Insurance (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Flash Multi-Head Feed-Forward Network

Zhang, Minshen, Hu, Xiang, Li, Jianguo, Wu, Wei, Tu, Kewei

arXiv.org Artificial Intelligence, Dec-9-2025

We explore Multi-Head FFN (MH-FFN) as a replacement for the FFN in the Transformer architecture, motivated by the structural similarity between single-head attention and the FFN. While multi-head mechanisms enhance expressivity in attention, naively applying them to FFNs faces two challenges: memory consumption that scales with the head count, and an imbalanced ratio between the growing intermediate size and the fixed head dimension as models scale, which degrades scalability and expressive power. To address these challenges, we propose Flash Multi-Head FFN (FlashMHF), with two key innovations: an I/O-aware fused kernel that computes outputs online in SRAM, akin to FlashAttention, and a design that uses dynamically weighted parallel sub-networks to maintain a balanced ratio between intermediate and head dimensions. Validated on models from 128M to 1.3B parameters, FlashMHF consistently improves perplexity and downstream task accuracy over SwiGLU FFNs, while reducing peak memory usage by 3-5x and accelerating inference by up to 1.08x. Our work establishes the multi-head design as a superior architectural principle for FFNs, presenting FlashMHF as a powerful, efficient, and scalable alternative to FFNs in Transformers.
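
The structural similarity the abstract mentions can be illustrated with a small sketch. It is not the FlashMHF kernel or the paper's formulation; it only shows the common reading in which the rows of an FFN's two projections act like keys and values, with an element-wise activation in place of softmax. The function names, the ReLU activation (used here instead of SwiGLU for brevity), and the omission of attention's usual score scaling are all assumptions made for illustration.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def weighted_sum(weights, values):
    """Combine value rows using one weight per row."""
    d_out = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d_out)]


def ffn(x, keys, values):
    """Two-layer FFN: each hidden unit scores x against its 'key' row,
    applies ReLU, and contributes its 'value' row to the output."""
    weights = [max(0.0, dot(x, k)) for k in keys]
    return weighted_sum(weights, values)


def single_head_attention(q, keys, values):
    """Unscaled single-head attention for one query: same score-then-mix
    pattern, but the weights are normalised with softmax."""
    scores = [dot(q, k) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return weighted_sum([e / total for e in exps], values)


# Toy inputs (hypothetical): three key/value rows, 2-d input and output.
x = [1.0, 0.0]
keys = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

ffn_out = ffn(x, keys, values)
attn_out = single_head_attention(x, keys, values)
```

Both functions share the same score-then-mix skeleton and differ only in how scores become mixing weights, which is the sense in which a multi-head variant of the FFN is a natural thing to try.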

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2512.06989

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

On Task Vectors and Gradients

Zhou, Luca, Solombrino, Daniele, Crisostomi, Donato, Bucarelli, Maria Sofia, D'Inverno, Giuseppe Alessio, Silvestri, Fabrizio, Rodolà, Emanuele

arXiv.org Artificial Intelligence, Oct-21-2025

Task arithmetic has emerged as a simple yet powerful technique for model merging, enabling the combination of multiple finetuned models into one. Despite its empirical success, a clear theoretical explanation of why and when it works is lacking. This paper provides a rigorous theoretical foundation for task arithmetic by establishing a connection between task vectors and gradients of the task losses. We show that under standard gradient descent, a task vector generated from one epoch of finetuning is exactly equivalent to the negative gradient of the loss, scaled by the learning rate. For the practical multi-epoch setting, we prove that this equivalence holds approximately, with a second-order error term that we explicitly bound for feed-forward networks. Our empirical analysis across seven vision benchmarks corroborates our theory, demonstrating that the first-epoch gradient dominates the finetuning trajectory in both norm and direction. A key implication is that merging models finetuned for only a single epoch often yields performance comparable to merging fully converged models. These findings reframe task arithmetic as a form of approximate multitask learning, providing a clear rationale for its effectiveness and highlighting the critical role of early training dynamics in model merging.
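
The single-epoch claim in this abstract can be checked on a toy problem. The sketch below assumes full-batch gradient descent, so that one epoch is exactly one update step, and uses a small least-squares loss; the data, the names `loss_grad`, `theta_pre`, and `lr`, and the loss itself are hypothetical and not taken from the paper.

```python
def loss_grad(theta, xs, ys):
    """Gradient of the loss 0.5 * mean((theta . x - y)^2) with respect to theta."""
    n = len(xs)
    grad = [0.0] * len(theta)
    for x, y in zip(xs, ys):
        residual = sum(t * xi for t, xi in zip(theta, x)) - y
        for j, xi in enumerate(x):
            grad[j] += residual * xi / n
    return grad


# Toy "task" data and "pretrained" weights (illustrative values only).
xs = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]]
ys = [1.0, 2.0, 0.0]
theta_pre = [0.1, -0.2]
lr = 0.05

# One epoch of full-batch gradient descent is a single update step.
grad = loss_grad(theta_pre, xs, ys)
theta_ft = [t - lr * g for t, g in zip(theta_pre, grad)]

# Task vector: finetuned weights minus pretrained weights.
task_vector = [f - p for f, p in zip(theta_ft, theta_pre)]

# Negative gradient at the pretrained weights, scaled by the learning rate.
scaled_neg_grad = [-lr * g for g in grad]

# The two agree up to floating-point rounding.
max_gap = max(abs(a - b) for a, b in zip(task_vector, scaled_neg_grad))
```

With more than one step, each later gradient is evaluated at a different point, so the identity becomes an approximation; this is the multi-epoch case for which the abstract reports a bounded second-order error term.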

artificial intelligence, machine learning, task vector, (15 more...)

arXiv.org Artificial Intelligence

2508.16082

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

30ae2af8612ac74357363e8ae877d80c-Supplemental-Conference.pdf

38ef4b66cb25e92abe4d594acb841471-Paper.pdf

Visual Perception by Large Language Model's Weights
