AITopics | attn

f0878b7efa656b3bbd407c9248d13751-Supplemental-Conference.pdf

Neural Information Processing Systems, Apr-30-2026, 06:39:06 GMT

artificial intelligence, machine learning, natural language, (15 more...)

Technology:

Information Technology > Artificial Intelligence > Vision (0.96)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Sensing and Signal Processing > Image Processing (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)

Transformer Approximations from ReLUs

Hu, Jerry Yao-Chieh, Lu, Mingcheng, Lee, Yi-Chen, Liu, Han

arXiv.org Machine Learning, Apr-29-2026

We present a systematic recipe for translating ReLU approximation results to softmax Transformers. Given a constructive ReLU approximator for a target, we construct an explicit softmax Transformer with the same accuracy. The recipe applies to many common approximation targets and yields quantitative resource bounds beyond universal approximation statements. This matters because broad Universal Approximation Properties (UAP) still dominate Transformer approximation theory. For softmax Transformers, many universality results provide explicit constructions and quantitative resource bounds (e.g., parameters, depth, width, etc.) [Yun et al., 2020, Kajitsuka and Sato, 2023, Takakura and Suzuki, 2023, Jiang and Li, 2024, Hu et al., 2025,

approximation, artificial intelligence, machine learning, (16 more...)

arXiv.org Machine Learning

2604.24878

Country:

North America > United States (0.28)
Asia > Taiwan (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.77)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Supplementary of "Twins: Revisiting the Design of Spatial Attention in Vision Transformer"

Neural Information Processing Systems, Apr-25-2026, 19:59:49 GMT

Swin transformer: Hierarchical vision transformer using shifted windows.

artificial intelligence, lsa gsa 1, machine learning, (13 more...)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.31)

Subcritical Signal Propagation at Initialization in Normalization-Free Transformers

Alekseev, Sergey

arXiv.org Machine Learning, Apr-15-2026

We study signal propagation at initialization in transformers through the averaged partial Jacobian norm (APJN), a measure of gradient amplification across layers. We extend APJN analysis to transformers with bidirectional attention and permutation-symmetric input token configurations by deriving recurrence relations for activation statistics and APJNs across layers. Our theory predicts how attention modifies the asymptotic behavior of the APJN at large depth and matches APJNs measured in deep vision transformers. The criticality picture known from residual networks carries over to transformers: the pre-LayerNorm architecture exhibits power-law APJN growth, whereas transformers with LayerNorm replaced by elementwise $\tanh$-like nonlinearities have stretched-exponential APJN growth, indicating that the latter are subcritical. Applied to Dynamic Tanh (DyT) and Dynamic erf (Derf) transformers, the theory explains why these architectures can be more sensitive to initialization and optimization choices and require careful tuning for stable training.

artificial intelligence, arxiv, machine learning, (17 more...)

arXiv.org Machine Learning

2604.1189

Country:

North America > United States > Georgia > Fulton County > Atlanta (0.04)
Europe > Italy > Sardinia (0.04)

Genre: Research Report (0.43)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Appendix Table of Contents

Neural Information Processing Systems, Feb-19-2026, 11:24:51 GMT

artificial intelligence, machine learning, natural language, (18 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.38)
Information Technology > Artificial Intelligence > Natural Language (0.33)

Learning to Reason Iteratively and Parallelly for Complex Visual Reasoning Scenarios

Neural Information Processing Systems, Feb-18-2026, 18:50:50 GMT

Meanwhile, its "parallel" computation allows for the simultaneous exploration of different reasoning paths and benefits more robust and efficient execution of operations that are mutually independent (e.g., when counting individual colors for the query: "determine the maximum occurring color amongst all t-shirts"). We design IPRM as a lightweight and fully-differentiable neural module that can be conveniently applied to both transformer and non-transformer vision-language backbones.

artificial intelligence, machine learning, natural language, (21 more...)

Country: