Simplifying and Understanding State Space Models with Diagonal Linear RNNs

Gupta, Ankit, Mehta, Harsh, Berant, Jonathan

Nov-14-2023–arXiv.org Artificial Intelligence

Sequence models based on linear state spaces (SSMs) have recently emerged as a promising choice of architecture for modeling long range dependencies across various modalities. However, they invariably rely on discretization of a continuous state space, which complicates their presentation and understanding. In this work, we dispose of the discretization step, and propose a model based on vanilla Diagonal Linear RNNs ($\mathrm{DLR}$). We empirically show that, despite being conceptually much simpler, $\mathrm{DLR}$ is as performant as previously-proposed SSMs on a variety of tasks and benchmarks including Long Range Arena and raw speech classification. Moreover, we characterize the expressivity of SSMs (including $\mathrm{DLR}$) and attention-based models via a suite of $13$ synthetic sequence-to-sequence tasks involving interactions over tens of thousands of tokens, ranging from simple operations, such as shifting an input sequence, to detecting co-dependent visual features over long spatial ranges in flattened images. We find that while SSMs report near-perfect performance on tasks that can be modeled via $\textit{few}$ convolutional kernels, they struggle on tasks requiring $\textit{many}$ such kernels and especially when the desired sequence manipulation is $\textit{context-dependent}$. Despite these limitations, $\mathrm{DLR}$ reaches high performance on two higher-order reasoning tasks $\mathrm{ListOpsSubTrees}$ and $\mathrm{PathfinderSegmentation}\text{-}\mathrm{256}$ with input lengths $8K$ and $65K$ respectively, and gives encouraging performance on $\mathrm{PathfinderSegmentation}\text{-}\mathrm{512}$ with input length $262K$ for which attention is not a viable choice.

dlr, kernel, sequence, (14 more...)

arXiv.org Artificial Intelligence

Nov-14-2023

arXiv.org PDF

Add feedback

Country:
- Europe > Austria (0.04)
- Oceania > Australia
  - New South Wales > Sydney (0.04)
- North America
  - United States
    - Maryland > Baltimore (0.04)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
    - Georgia > Fulton County
      - Atlanta (0.04)
    - California > Los Angeles County
      - Long Beach (0.04)
  - Canada
    - Quebec > Montreal (0.04)
    - British Columbia > Metro Vancouver Regional District
      - Vancouver (0.04)
- Asia > Middle East
  - Israel > Tel Aviv District > Tel Aviv (0.04)
- Africa > Ethiopia
  - Addis Ababa > Addis Ababa (0.04)

Genre:
- Research Report (0.82)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Representation & Reasoning (0.82)
  - Vision (0.66)
  - Machine Learning > Neural Networks
    - Deep Learning (0.46)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found