AITopics

Country: Europe (0.28)

Genre: Research Report > Experimental Study (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Stein, Viktor, Li, Wuchen, Steidl, Gabriele

SympFormer: Accelerated attention blocks via Inertial Dynamics on Density Manifolds

arXiv.org Machine LearningMar-18-2026

Transformers owe much of their empirical success in natural language processing to the self-attention blocks. Recent perspectives interpret attention blocks as interacting particle systems, whose mean-field limits correspond to gradient flows of interaction energy functionals on probability density spaces equipped with Wasserstein-$2$-type metrics. We extend this viewpoint by introducing accelerated attention blocks derived from inertial Nesterov-type dynamics on density spaces. In our proposed architecture, tokens carry both spatial (feature) and velocity variables. The time discretization and the approximation of accelerated density dynamics yield Hamiltonian momentum attention blocks, which constitute the proposed accelerated attention architectures. In particular, for linear self-attention, we show that the attention blocks approximate a Stein variational gradient flow, using a bilinear kernel, of a potential energy. In this setting, we prove that elliptically contoured probability distributions are preserved by the accelerated attention blocks. We present implementable particle-based algorithms and demonstrate that the proposed accelerated attention blocks converge faster than the classical attention blocks while preserving the number of oracle calls.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

2603.16535

Country:

North America > United States > South Carolina > Richland County > Columbia (0.14)
Asia > Middle East > Jordan (0.04)
Europe > Switzerland (0.04)
(3 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Neural Information Processing SystemsFeb-12-2026, 04:50:45 GMT

d78ece6613953f46501b958b7bb4582f-Supplemental-Conference.pdf

architecture, augmentation, execution time, (15 more...)

Country: Asia > China > Hong Kong (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing SystemsFeb-10-2026, 18:44:04 GMT

3da292ced54290c19fc55d9dba3da793-Paper-Conference.pdf

machine learning, natural language, segmentation, (20 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Hubei Province > Wuhan (0.04)

Industry: Government (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Natural Language (0.93)

Neural Information Processing SystemsFeb-10-2026, 17:45:07 GMT

3d226fb8fbd6ee6ec70d0427f1319707-Paper-Conference.pdf

computer vision, pixel, proceedings, (13 more...)

Country:

Asia > South Korea > Seoul > Seoul (0.05)
Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Pour, Mohamadreza Akbari, Karimi, Mohamad Sadeq, Mazloumi, Amir Hossein

Temporal convolutional and fusional transformer model with Bi-LSTM encoder-decoder for multi-time-window remaining useful life prediction

arXiv.org Artificial IntelligenceDec-9-2025

Health prediction is crucial for ensuring reliability, minimizing downtime, and optimizing maintenance in industrial systems. Remaining Useful Life (RUL) prediction is a key component of this process; however, many existing models struggle to capture fine-grained temporal dependencies while dynamically prioritizing critical features across time for robust prognostics. To address these challenges, we propose a novel framework that integrates Temporal Convolutional Networks (TCNs) for localized temporal feature extraction with a modified Temporal Fusion Transformer (TFT) enhanced by Bi-LSTM encoder-decoder. This architecture effectively bridges short- and long-term dependencies while emphasizing salient temporal patterns. Furthermore, the incorporation of a multi-time-window methodology improves adaptability across diverse operating conditions. Extensive evaluations on benchmark datasets demonstrate that the proposed model reduces the average RMSE by up to 5.5%, underscoring its improved predictive accuracy compared to state-of-the-art methods. By closing critical gaps in current approaches, this framework advances the effectiveness of industrial prognostic systems and highlights the potential of advanced time-series transformers for RUL prediction.

artificial intelligence, machine learning, prediction, (18 more...)

doi: 10.1109/ACCESS.2025.3634285

2511.04723

Genre: Research Report > Promising Solution (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Willis, Regan, Bakos, Jason

Exploring Fusion Strategies for Multimodal Vision-Language Systems

arXiv.org Artificial IntelligenceDec-1-2025

Modern machine learning models often combine multiple input streams of data to more accurately capture the information that informs their decisions. In multimodal machine learning, choosing the strategy for fusing data together requires careful consideration of the application's accuracy and latency requirements, as fusing the data at earlier or later stages in the model architecture can lead to performance changes in accuracy and latency. T o demonstrate this trade-off, we investigate different fusion strategies using a hybrid BERT and vision network framework that integrates image and text data. W e explore two different vision networks: MobileNetV2 and ViT. W e propose three models for each vision network, which fuse data at late, intermediate, and early stages in the architecture. W e evaluate the proposed models on the CMU-MOSI dataset and benchmark their latency on an NVIDIA Jetson Orin AGX. Our experimental results demonstrate that while late fusion yields the highest accuracy, early fusion offers the lowest inference latency. W e describe the three proposed model architectures and discuss the accuracy and latency trade-offs, concluding that data fusion earlier in the model architecture results in faster inference times at the cost of accuracy.

artificial intelligence, information fusion, machine learning, (16 more...)

2511.21889

Country: North America > United States > South Carolina > Richland County > Columbia (0.14)

Genre:

Research Report (1.00)
Overview (0.93)

Industry: Information Technology (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.91)

arXiv.org Artificial IntelligenceNov-24-2025

FAR: Function-preserving Attention Replacement for IMC-friendly Inference

Ren, Yuxin, Collins, Maxwell D, Hu, Miao, Yang, Huanrui

While transformers dominate modern vision and language models, their attention mechanism remains poorly suited for in-memory computing (IMC) devices due to intensive activation-to-activation multiplications and non-local memory access, leading to substantial latency and bandwidth overhead on ReRAM-based accelerators. To address this mismatch, we propose FAR, a Function-preserving Attention Replacement framework that substitutes all attention in pretrained DeiTs with sequential modules inherently compatible with IMC dataflows. Specifically, FAR replaces self-attention with a multi-head bidirectional LSTM architecture via block-wise distillation to retain functional equivalence while enabling linear-time computation and localized weight reuse. We further incorporate structured pruning on FAR models, enabling flexible adaptation to resource-constrained IMC arrays while maintaining functional fidelity. Evaluations on the DeiT family demonstrate that FAR maintains comparable accuracy to the original attention-based models on ImageNet and multiple downstream tasks with reduced parameters and latency. Further analysis shows that FAR preserves the semantic token relationships learned by attention while improving computational efficiency, highlighting its potential for energy-efficient transformer inference on IMC-based edge accelerators.

artificial intelligence, machine learning, natural language, (20 more...)

2505.21535

Country: North America > United States > Arizona > Pima County > Tucson (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Riera, Carlos Boned, Sanchez, David Romero, Terrades, Oriol Ramos

ODE-ViT: Plug & Play Attention Layer from the Generalization of the ViT as an Ordinary Differential Equation

arXiv.org Artificial IntelligenceNov-21-2025

In recent years, increasingly large models have achieved outstanding performance across CV tasks. However, these models demand substantial computational resources and storage, and their growing complexity limits our understanding of how they make decisions. Most of these architectures rely on the attention mechanism within Transformer-based designs. Building upon the connection between residual neural networks and ordinary differential equations (ODEs), we introduce ODE-ViT, a Vision Transformer reformulated as an ODE system that satisfies the conditions for well-posed and stable dynamics. Experiments on CIFAR-10 and CIFAR-100 demonstrate that ODE-ViT achieves stable, interpretable, and competitive performance with up to one order of magnitude fewer parameters, surpassing prior ODE-based Transformer approaches in classification tasks. We further propose a plug-and-play teacher-student framework in which a discrete ViT guides the continuous trajectory of ODE-ViT by treating the intermediate representations of the teacher as solutions of the ODE. This strategy improves performance by more than 10% compared to training a free ODE-ViT from scratch.

artificial intelligence, machine learning, natural language, (14 more...)

2511.16501

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

arXiv.org Artificial IntelligenceNov-11-2025

Machine-Learning Accelerated Calculations of Reduced Density Matrices

Azam, Awwab A., Zhao, Lexu, Yu, Jiabin

$n$-particle reduced density matrices ($n$-RDMs) play a central role in understanding correlated phases of matter. Yet the calculation of $n$-RDMs is often computationally inefficient for strongly-correlated states, particularly when the system sizes are large. In this work, we propose to use neural network (NN) architectures to accelerate the calculation of, and even predict, the $n$-RDMs for large-size systems. The underlying intuition is that $n$-RDMs are often smooth functions over the Brillouin zone (BZ) (certainly true for gapped states) and are thus interpolable, allowing NNs trained on small-size $n$-RDMs to predict large-size ones. Building on this intuition, we devise two NNs: (i) a self-attention NN that maps random RDMs to physical ones, and (ii) a Sinusoidal Representation Network (SIREN) that directly maps momentum-space coordinates to RDM values. We test the NNs in three 2D models: the pair-pair correlation functions of the Richardson model of superconductivity, the translationally-invariant 1-RDM in a four-band model with short-range repulsion, and the translation-breaking 1-RDM in the half-filled Hubbard model. We find that a SIREN trained on a $6\times 6$ momentum mesh can predict the $18\times 18$ pair-pair correlation function with a relative accuracy of $0.839$. The NNs trained on $6\times 6 \sim 8\times 8$ meshes can provide high-quality initial guesses for $50\times 50$ translation-invariant Hartree-Fock (HF) and $30\times 30$ fully translation-breaking-allowed HF, reducing the number of iterations required for convergence by up to $91.63\%$ and $92.78\%$, respectively, compared to random initializations. Our results illustrate the potential of using NN-based methods for interpolable $n$-RDMs, which might open a new avenue for future research on strongly correlated phases.

artificial intelligence, machine learning, siren, (17 more...)

2511.07367

Country: North America > United States > Florida > Alachua County > Gainesville (0.28)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)