Transformer-XL





Interpreting and improving natural-language processing (in machines) with natural language-processing (in the brain)

Mariya Toneva, Leila Wehbe

Neural Information Processing Systems

We use brain imaging recordings of subjects reading complex natural text to interpret word and sequence embeddings from 4 recent NLP models - ELMo, USE, BERT and Transformer-XL. We study how their representations differ across layer depth, context length, and attention type.




Recurrent Memory Transformer

Neural Information Processing Systems

Results of experiments show that RMT performs on par with the Transformer-XL on language modeling for smaller memory sizes and outperforms it for tasks that require longer sequence processing. We show that adding memory tokens to Tr-XL is able to improve its performance.
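The recurrent-memory idea the abstract describes can be sketched in a few lines. This is a minimal illustration with hypothetical names (not the authors' API): memory tokens are prepended to each input segment, and the transformer's outputs at those positions become the memory carried into the next segment.

```python
import numpy as np

def process_with_memory(segments, memory, transformer):
    """Illustrative sketch of a recurrent memory transformer step.

    segments    -- list of arrays of shape (seg_len, d_model)
    memory      -- array of shape (n_mem, d_model), the memory tokens
    transformer -- any length-preserving sequence-to-sequence function
    """
    n_mem = memory.shape[0]
    outputs = []
    for seg in segments:
        # Concatenate memory tokens in front of the current segment.
        x = np.concatenate([memory, seg], axis=0)
        y = transformer(x)
        # Outputs at the memory positions become next segment's memory.
        memory = y[:n_mem]
        outputs.append(y[n_mem:])
    return outputs, memory
```

The recurrence gives the model an information channel across segments that is independent of the attention window, which is what lets it handle sequences longer than a single segment.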


REOrdering Patches Improves Vision Models

Declan Kutscher, David M. Chan, Yutong Bai, Trevor Darrell, Ritwik Gupta

arXiv.org Artificial Intelligence

Sequence models such as transformers require inputs to be represented as one-dimensional sequences. In vision, this typically involves flattening images using a fixed row-major (raster-scan) order. While full self-attention is permutation-equivariant, modern long-sequence transformers increasingly rely on architectural approximations that break this invariance and introduce sensitivity to patch ordering. We show that patch order significantly affects model performance in such settings, with simple alternatives like column-major or Hilbert curves yielding notable accuracy shifts. Motivated by this, we propose REOrder, a two-stage framework for discovering task-optimal patch orderings. First, we derive an information-theoretic prior by evaluating the compressibility of various patch sequences. Then, we learn a policy over permutations by optimizing a Plackett-Luce policy using REINFORCE. This approach enables efficient learning in a combinatorial permutation space. REOrder improves top-1 accuracy over row-major ordering on ImageNet-1K by up to 3.01% and Functional Map of the World by 13.35%.
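The second stage the abstract mentions, learning a Plackett-Luce policy over permutations with REINFORCE, can be sketched as follows. This is an assumption-laden illustration, not the paper's implementation: a learned score per patch defines a Plackett-Luce distribution, a permutation is sampled position by position, and the returned log-probability is what REINFORCE scales by the task reward to form a policy gradient.

```python
import numpy as np

def sample_permutation(scores, rng):
    """Sample a patch ordering from a Plackett-Luce distribution.

    scores -- unnormalized log-weights, one per patch (here a
              hypothetical learned parameter vector)
    rng    -- a numpy Generator
    Returns the sampled permutation and its log-probability.
    """
    scores = np.asarray(scores, dtype=float)
    remaining = list(range(len(scores)))
    perm, log_prob = [], 0.0
    while remaining:
        logits = scores[remaining]
        logits = logits - logits.max()              # numerical stability
        probs = np.exp(logits) / np.exp(logits).sum()
        k = rng.choice(len(remaining), p=probs)
        log_prob += np.log(probs[k])
        perm.append(remaining.pop(k))
    return perm, log_prob
```

Sampling sequentially without replacement is what makes the search tractable: the policy has only one parameter per patch, yet it induces a distribution over the full combinatorial space of orderings.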



HLM-Cite: Hybrid Language Model Workflow for Text-based Scientific Citation Prediction

Neural Information Processing Systems

Citation networks are critical infrastructures of modern science, serving as intricate webs of past literature and enabling researchers to navigate the knowledge production system. To mine the information hidden in the link space of such networks, predicting which previous papers (candidates) a new paper (query) will cite is a critical problem that has long been studied. However, an important gap remains unaddressed: the roles of a paper's citations vary significantly, ranging from foundational knowledge basis to superficial contexts.