arxiv:2212.00768

Simplifying and Understanding State Space Models with Diagonal Linear RNNs

Published on Nov 14, 2023

Abstract

Linear state space models based on diagonal linear RNNs are competitive on long-range sequence modeling tasks while being conceptually simpler than approaches that discretize a continuous state space.

Sequence models based on linear state spaces (SSMs) have recently emerged as a promising choice of architecture for modeling long-range dependencies across various modalities. However, they invariably rely on discretization of a continuous state space, which complicates their presentation and understanding. In this work, we dispose of the discretization step and propose a model based on vanilla Diagonal Linear RNNs (DLR). We empirically show that, despite being conceptually much simpler, DLR is as performant as previously proposed SSMs on a variety of tasks and benchmarks, including Long Range Arena and raw speech classification. Moreover, we characterize the expressivity of SSMs (including DLR) and attention-based models via a suite of 13 synthetic sequence-to-sequence tasks involving interactions over tens of thousands of tokens, ranging from simple operations, such as shifting an input sequence, to detecting co-dependent visual features over long spatial ranges in flattened images. We find that while SSMs achieve near-perfect performance on tasks that can be modeled via a few convolutional kernels, they struggle on tasks requiring many such kernels, especially when the desired sequence manipulation is context-dependent. Despite these limitations, DLR reaches high performance on two higher-order reasoning tasks, ListOpsSubTrees and PathfinderSegmentation-256, with input lengths 8K and 65K respectively, and shows encouraging performance on PathfinderSegmentation-512, with input length 262K, for which attention is not a viable choice.
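
To make the abstract's "diagonal linear RNN" and "convolutional kernel" language concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes a single input channel, a complex diagonal state matrix `lam`, an all-ones input projection, and an output taken as the real part of a weighted sum of the state with weights `w`. These names and the exact parameterization are illustrative assumptions. Under them, the step-by-step recurrence gives the same output as a causal convolution of the input with the kernel `K[k] = sum_i w[i] * lam[i]**k`.

```python
import cmath


def dlr_recurrent(u, lam, w):
    """Run an (assumed) diagonal linear RNN one step at a time.

    state_k = lam * state_{k-1} + u_k   (elementwise, since lam is diagonal)
    y_k     = Re(sum_i w_i * state_k[i])
    """
    state = [0j] * len(lam)
    ys = []
    for u_k in u:
        state = [l * s + u_k for l, s in zip(lam, state)]
        ys.append(sum(w_i * s for w_i, s in zip(w, state)).real)
    return ys


def dlr_kernel(lam, w, length):
    """Convolution kernel implied by the recurrence: K[k] = sum_i w_i * lam_i**k."""
    return [sum(w_i * l ** k for w_i, l in zip(w, lam)) for k in range(length)]


def dlr_convolutional(u, lam, w):
    """Same output as dlr_recurrent, computed as a causal convolution with the kernel."""
    kernel = dlr_kernel(lam, w, len(u))
    return [
        sum(kernel[j] * u[k - j] for j in range(k + 1)).real
        for k in range(len(u))
    ]


# Hypothetical toy parameters: two state dimensions, one of them oscillating.
example_lam = [0.9 * cmath.exp(0.3j), 0.5 + 0j]
example_w = [0.2 - 0.1j, 1.0 + 0.5j]
example_u = [1.0, -2.0, 0.5, 3.0, 0.0, 1.5]

max_gap = max(
    abs(a - b)
    for a, b in zip(
        dlr_recurrent(example_u, example_lam, example_w),
        dlr_convolutional(example_u, example_lam, example_w),
    )
)
print("max difference between recurrent and convolutional outputs:", max_gap)
```

The direct convolution above takes quadratic time and is written for readability. Practical SSM implementations typically compute it with FFTs, which this sketch omits.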
