Publication record · 18.cifr/2023.gu.mamba-selective-ssm
Foundation models, now powering most of the exciting applications in deep learning, are almost universally based on the Transformer architecture and its core attention module. Many subquadratic-time architectures such as linear attention, gated convolution and recurrent models, and structured state space models (SSMs) have been developed to address Transformers' computational inefficiency on long sequences, but they have not performed as well as attention on important modalities such as language. We identify that a key weakness of such models is their inability to perform content-based reasoning, and make several improvements.
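To make the selection idea concrete, here is a minimal sequential sketch of a selective SSM: the step size and the input/output projections become functions of the current input, so the recurrence can gate what enters and leaves the state. All tensor names, shapes, and the simplified discretization below are illustrative assumptions, not the paper's released implementation.

```python
import torch
import torch.nn.functional as F

def selective_ssm_reference(x, A, W_B, W_C, w_delta):
    """Sequential reference for a selective (input-dependent) SSM.

    x:       (L, D)  input sequence
    A:       (D, N)  per-channel diagonal state matrix (negative for stability)
    W_B:     (D, N)  projection making B a function of the input
    W_C:     (D, N)  projection making C a function of the input
    w_delta: (D,)    per-channel weights making the step size input-dependent
    """
    L, D = x.shape
    N = A.shape[1]
    h = torch.zeros(D, N)
    ys = []
    for t in range(L):
        xt = x[t]                                  # (D,)
        delta = F.softplus(xt * w_delta)           # (D,)  input-dependent step size
        Bt = xt @ W_B                              # (N,)  input-dependent input projection
        Ct = xt @ W_C                              # (N,)  input-dependent output projection
        A_bar = torch.exp(delta[:, None] * A)      # (D, N) discretized state transition
        B_bar = delta[:, None] * Bt[None, :]       # (D, N) simplified discretization
        h = A_bar * h + B_bar * xt[:, None]        # (D, N) selective state update
        ys.append((h * Ct[None, :]).sum(-1))       # (D,)   read out the state
    return torch.stack(ys)                         # (L, D)

# Tiny usage example with random weights (shapes only, no trained parameters).
L, D, N = 8, 4, 16
x = torch.randn(L, D)
A = -torch.rand(D, N)                              # negative so exp(delta * A) stays bounded
y = selective_ssm_reference(x, A, torch.randn(D, N) / N**0.5,
                            torch.randn(D, N) / N**0.5, torch.ones(D))
print(y.shape)  # torch.Size([8, 4])
```

Because the update is input-dependent, the model can no longer be evaluated as a time-invariant convolution; it has to be computed as a recurrence, which is where the scan kernel discussed below comes in.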
The CUDA parallel scan kernel limits portability to NVIDIA hardware; a Triton or Metal backend would be needed for broader adoption. Scaling beyond 3B parameters and exploring hybrid Mamba-attention architectures for tasks that require both global and local reasoning are natural next steps.
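The portability question is narrower than it may appear: per channel and state dimension the update is a first-order linear recurrence h_t = a_t * h_{t-1} + b_t, and composing such updates is associative, so any backend only has to supply an associative combine plus a log-depth scan. The sketch below is hardware-agnostic Python/NumPy used purely to illustrate that structure; it is an assumption-laden toy, not the released CUDA kernel or a Triton port.

```python
import numpy as np

def combine(e1, e2):
    """Compose two affine updates h -> a*h + b, earlier element first."""
    a1, b1 = e1
    a2, b2 = e2
    return a1 * a2, a2 * b1 + b2   # f2(f1(h)) = a2*(a1*h + b1) + b2

def scan_sequential(a, b):
    """Plain left-to-right evaluation of h_t = a_t * h_{t-1} + b_t with h_{-1} = 0."""
    h, out = 0.0, []
    for at, bt in zip(a, b):
        h = at * h + bt
        out.append(h)
    return np.array(out)

def scan_logdepth(a, b):
    """Hillis-Steele style inclusive scan: log2(L) rounds, each round parallelizable."""
    elems = list(zip(a, b))
    n, step = len(elems), 1
    while step < n:
        nxt = list(elems)
        for i in range(step, n):                 # every i in this round is independent
            nxt[i] = combine(elems[i - step], elems[i])
        elems, step = nxt, step * 2
    return np.array([b_acc for _, b_acc in elems])  # with h_{-1} = 0, h_t is the b component

a = np.random.rand(16) * 0.9
b = np.random.rand(16)
assert np.allclose(scan_sequential(a, b), scan_logdepth(a, b))
```

Since the combine is the only hardware-specific piece, a Triton or Metal backend would mainly need to reimplement this associative step efficiently for its target architecture.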