arxiv:2212.00768

Simplifying and Understanding State Space Models with Diagonal Linear RNNs

Published on Nov 14, 2023

Abstract

Linear state space models based on diagonal linear RNNs achieve competitive performance on long-range sequence modeling tasks while remaining conceptually simpler than traditional discretized approaches.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Sequence models based on linear state spaces (SSMs) have recently emerged as a promising choice of architecture for modeling long-range dependencies across various modalities. However, they invariably rely on discretization of a continuous state space, which complicates their presentation and understanding. In this work, we dispose of the discretization step and propose a model based on vanilla Diagonal Linear RNNs (DLR). We empirically show that, despite being conceptually much simpler, DLR is as performant as previously proposed SSMs on a variety of tasks and benchmarks, including Long Range Arena and raw speech classification. Moreover, we characterize the expressivity of SSMs (including DLR) and attention-based models via a suite of 13 synthetic sequence-to-sequence tasks involving interactions over tens of thousands of tokens, ranging from simple operations, such as shifting an input sequence, to detecting co-dependent visual features over long spatial ranges in flattened images. We find that while SSMs achieve near-perfect performance on tasks that can be modeled via a few convolutional kernels, they struggle on tasks requiring many such kernels, especially when the desired sequence manipulation is context-dependent. Despite these limitations, DLR reaches high performance on two higher-order reasoning tasks, ListOpsSubTrees and PathfinderSegmentation-256, with input lengths 8K and 65K respectively, and gives encouraging performance on PathfinderSegmentation-512 with input length 262K, for which attention is not a viable choice.
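To make the link between a diagonal linear RNN and a convolutional kernel concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: it assumes a scalar input sequence, an input projection of all ones, complex diagonal weights, and an output taken as the real part of a weighted sum of the state. The names `lam`, `w`, `dlr_recurrent`, `dlr_kernel`, and `causal_conv` and the toy parameter values are hypothetical.

```python
import cmath


def dlr_recurrent(u, lam, w):
    """Run a diagonal linear RNN one step at a time.

    Assumed form (illustrative, not taken from the paper's code):
        x_k = lam * x_{k-1} + u_k      (elementwise over the state)
        y_k = Re(sum_i w_i * x_k[i])
    """
    x = [0j] * len(lam)
    y = []
    for u_k in u:
        # Each state coordinate evolves independently because the
        # recurrence matrix is diagonal.
        x = [l * x_i + u_k for l, x_i in zip(lam, x)]
        y.append(sum(w_i * x_i for w_i, x_i in zip(w, x)).real)
    return y


def dlr_kernel(lam, w, length):
    """Kernel K_j = sum_i w_i * lam_i**j obtained by unrolling the recurrence."""
    return [sum(w_i * l ** j for w_i, l in zip(w, lam)) for j in range(length)]


def causal_conv(u, kernel):
    """Causal convolution: y_k = Re(sum_{j <= k} K_j * u_{k-j})."""
    return [
        sum(kernel[j] * u[k - j] for j in range(k + 1)).real
        for k in range(len(u))
    ]


# Hypothetical toy parameters: three complex eigenvalues inside the unit disk.
lam = [0.9 * cmath.exp(0.3j), 0.5 * cmath.exp(-1.1j), 0.99 + 0j]
w = [0.4 - 0.2j, 1.0 + 0.5j, -0.3 + 0j]
u = [1.0, -2.0, 0.5, 0.0, 3.0, 1.5]

y_rnn = dlr_recurrent(u, lam, w)
y_conv = causal_conv(u, dlr_kernel(lam, w, len(u)))

# The two computations agree up to floating-point error.
max_diff = max(abs(a - b) for a, b in zip(y_rnn, y_conv))
```

Under these assumptions, one layer of this kind applies a single fixed kernel to its input. Shifting a sequence by s positions, for example, corresponds to a kernel that is 1 at index s and 0 elsewhere, which is one way to read the abstract's distinction between tasks needing few kernels and tasks whose required manipulation depends on the context.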
