Publication record · 18.cifr/2023.gu.mamba-selective-ssm
Foundation models, now powering most of the exciting applications in deep learning, are almost universally based on the Transformer architecture and its core attention module. Many subquadratic-time architectures such as linear attention, gated convolution and recurrent models, and structured state space models (SSMs) have been developed to address Transformers' computational inefficiency on long sequences, but they have not performed as well as attention on important modalities such as language. We identify that a key weakness of such models is their inability to perform content-based reasoning, and make several improvements.
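To make the selection idea concrete, here is a minimal sequential sketch of a selective SSM: the step size and the input/output projections become functions of the current input, so the recurrence can gate what enters and leaves the state. All tensor names, shapes, and the simplified discretization below are illustrative assumptions, not the paper's released implementation.

```python
import torch
import torch.nn.functional as F

def selective_ssm_reference(x, A, W_B, W_C, w_delta):
    """Sequential reference for a selective (input-dependent) SSM.

    x:       (L, D)  input sequence
    A:       (D, N)  per-channel diagonal state matrix (negative for stability)
    W_B:     (D, N)  projection making B a function of the input
    W_C:     (D, N)  projection making C a function of the input
    w_delta: (D,)    per-channel weights making the step size input-dependent
    """
    L, D = x.shape
    N = A.shape[1]
    h = torch.zeros(D, N)
    ys = []
    for t in range(L):
        xt = x[t]                                  # (D,)
        delta = F.softplus(xt * w_delta)           # (D,)  input-dependent step size
        Bt = xt @ W_B                              # (N,)  input-dependent input projection
        Ct = xt @ W_C                              # (N,)  input-dependent output projection
        A_bar = torch.exp(delta[:, None] * A)      # (D, N) discretized state transition
        B_bar = delta[:, None] * Bt[None, :]       # (D, N) simplified discretization
        h = A_bar * h + B_bar * xt[:, None]        # (D, N) selective state update
        ys.append((h * Ct[None, :]).sum(-1))       # (D,)   read out the state
    return torch.stack(ys)                         # (L, D)

# Tiny usage example with random weights (shapes only, no trained parameters).
L, D, N = 8, 4, 16
x = torch.randn(L, D)
A = -torch.rand(D, N)                              # negative so exp(delta * A) stays bounded
y = selective_ssm_reference(x, A, torch.randn(D, N) / N**0.5,
                            torch.randn(D, N) / N**0.5, torch.ones(D))
print(y.shape)  # torch.Size([8, 4])
```

Because the update is input-dependent, the model can no longer be evaluated as a time-invariant convolution; it has to be computed as a recurrence, which is where the scan kernel discussed below comes in.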
The CUDA parallel scan kernel limits portability to NVIDIA hardware; a Triton or Metal backend would be needed for broader adoption. Scaling beyond 3B parameters and exploring hybrid Mamba-attention architectures for tasks that require both global and local reasoning are natural next steps.
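The portability question is narrower than it may appear: per channel and state dimension the update is a first-order linear recurrence h_t = a_t * h_{t-1} + b_t, and composing such updates is associative, so any backend only has to supply an associative combine plus a log-depth scan. The sketch below is hardware-agnostic Python/NumPy used purely to illustrate that structure; it is an assumption-laden toy, not the released CUDA kernel or a Triton port.

```python
import numpy as np

def combine(e1, e2):
    """Compose two affine updates h -> a*h + b, earlier element first."""
    a1, b1 = e1
    a2, b2 = e2
    return a1 * a2, a2 * b1 + b2   # f2(f1(h)) = a2*(a1*h + b1) + b2

def scan_sequential(a, b):
    """Plain left-to-right evaluation of h_t = a_t * h_{t-1} + b_t with h_{-1} = 0."""
    h, out = 0.0, []
    for at, bt in zip(a, b):
        h = at * h + bt
        out.append(h)
    return np.array(out)

def scan_logdepth(a, b):
    """Hillis-Steele style inclusive scan: log2(L) rounds, each round parallelizable."""
    elems = list(zip(a, b))
    n, step = len(elems), 1
    while step < n:
        nxt = list(elems)
        for i in range(step, n):                 # every i in this round is independent
            nxt[i] = combine(elems[i - step], elems[i])
        elems, step = nxt, step * 2
    return np.array([b_acc for _, b_acc in elems])  # with h_{-1} = 0, h_t is the b component

a = np.random.rand(16) * 0.9
b = np.random.rand(16)
assert np.allclose(scan_sequential(a, b), scan_logdepth(a, b))
```

Since the combine is the only hardware-specific piece, a Triton or Metal backend would mainly need to reimplement this associative step efficiently for its target architecture.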