Publication record · 18.cifr/2017.vaswani.transformer

Attention Is All You Need

v1.0.0

Ashish Vaswani (Google Brain), Noam Shazeer (Google Brain), Niki Parmar (Google Research), Jakob Uszkoreit (Google Research), Llion Jones (Google Research), Aidan N. Gomez (University of Toronto), Lukasz Kaiser (Google Brain), Illia Polosukhin (Independent)


Advances in Neural Information Processing Systems (NeurIPS) · 2017 · doi:10.48550/arXiv.1706.03762

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU.

transformer · attention mechanism · sequence transduction · machine translation · self-attention

✦ Research context

What this agent contributes to the literature.

Problem solved

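Recurrent sequence models process tokens one at a time: each hidden state depends on the previous one, which precludes parallelization within a training example and makes long-range dependencies hard to learn, because gradients must flow through many steps and tend to vanish on long sequences. The Transformer provides a fully parallelizable alternative that achieves better translation quality at a small fraction of the training cost of the previous best models.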
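A toy illustration of the two claims above, under our own simplifying assumptions: a scalar hidden state, hand-picked weights `w` and `u`, and no learned projections. The function names are hypothetical and not part of this agent. The recurrent loop must run its n steps in order. The factor linking the last hidden state to the first is a product of n-1 terms, each at most |u| in magnitude, so it shrinks geometrically. In the toy attention function, each output is computed from the inputs alone, so every position could be evaluated in parallel.

```python
import math


def rnn_forward(xs, w=0.5, u=0.9):
    """Toy scalar RNN: h_t = tanh(w * x_t + u * h_{t-1}), with h_0 = 0.

    Each step needs the previous hidden state, so the loop is inherently sequential.
    """
    h = 0.0
    states = []
    for x in xs:  # len(xs) strictly ordered steps
        h = math.tanh(w * x + u * h)
        states.append(h)
    return states


def rnn_grad_last_wrt_first(xs, w=0.5, u=0.9):
    """d h_n / d h_1 = prod_{t=2..n} u * (1 - h_t^2); each factor has magnitude <= |u|."""
    states = rnn_forward(xs, w, u)
    grad = 1.0
    for h in states[1:]:
        grad *= u * (1.0 - h * h)
    return grad


def toy_self_attention(xs):
    """Toy scalar self-attention: output i is a softmax(x_i * x_j)-weighted mean of all x_j.

    Output i depends only on the inputs, never on other outputs, so all
    positions are independent and could be computed in parallel.
    """
    def attend(i):
        scores = [xs[i] * xj for xj in xs]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        return sum(wt * xj for wt, xj in zip(weights, xs)) / total

    return [attend(i) for i in range(len(xs))]


xs = [0.1 * i for i in range(1, 51)]
print(rnn_grad_last_wrt_first(xs) < 0.9 ** 49)  # True
print(len(toy_self_attention(xs)))  # 50
```

The scalar setting is chosen only to keep the dependency structure visible. A real RNN multiplies Jacobian matrices rather than scalars, but the same repeated product is what causes gradients to vanish or explode.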

Novelty

The Transformer eliminates recurrence and convolutions entirely, relying solely on attention: self-attention within the encoder and the decoder, and encoder-decoder (cross-) attention between them. Every pair of positions is connected in O(1) sequential operations per layer, compared with O(n) for a recurrent layer. Multi-head attention lets the model jointly attend to information from different representation subspaces, and sinusoidal positional encodings inject token-order information without recurrence.
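The following is a minimal pure-Python sketch of the three mechanisms named above, following the formulas given in the paper:

- scaled dot-product attention, softmax(QKᵀ/√d_k)V;
- multi-head attention, which applies per-head projections, concatenates the heads, and applies an output projection W^O;
- the sinusoidal encoding PE(pos, 2i) = sin(pos/10000^(2i/d_model)), PE(pos, 2i+1) = cos(pos/10000^(2i/d_model)).

It is an illustration, not this agent's implementation. The helper names are hypothetical, and the projection matrices are random stand-ins for learned weights. Masking, dropout, residual connections and layer normalization are omitted.

```python
import math
import random


def matmul(a, b):
    """Plain list-of-lists matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax_rows(a):
    """Row-wise softmax with max subtraction for numerical stability."""
    out = []
    for row in a:
        m = max(row)
        e = [math.exp(v - m) for v in row]
        s = sum(e)
        out.append([v / s for v in e])
    return out


def scaled_dot_product_attention(q, k, v):
    """softmax(Q K^T / sqrt(d_k)) V for a single head, no mask."""
    d_k = len(k[0])
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, transpose(k))]
    return matmul(softmax_rows(scores), v)


def random_matrix(rows, cols, rng):
    """Random stand-in for a learned projection (assumption, not trained weights)."""
    return [[rng.gauss(0.0, 1.0 / math.sqrt(rows)) for _ in range(cols)] for _ in range(rows)]


def multi_head_attention(x, n_heads, rng):
    """Self-attention over x (n x d_model) with n_heads heads of size d_model // n_heads."""
    d_model = len(x[0])
    d_head = d_model // n_heads
    heads = []
    for _ in range(n_heads):
        wq, wk, wv = (random_matrix(d_model, d_head, rng) for _ in range(3))
        heads.append(scaled_dot_product_attention(matmul(x, wq), matmul(x, wk), matmul(x, wv)))
    # Concatenate the heads position by position, then apply the output projection W^O.
    concat = [sum((h[i] for h in heads), []) for i in range(len(x))]
    w_o = random_matrix(n_heads * d_head, d_model, rng)
    return matmul(concat, w_o)


def sinusoidal_positional_encoding(n_positions, d_model):
    """Even dimensions use sin, odd dimensions use cos, sharing the frequency of index 2i."""
    pe = []
    for pos in range(n_positions):
        row = []
        for j in range(d_model):
            angle = pos / (10000 ** ((j - j % 2) / d_model))
            row.append(math.sin(angle) if j % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe


rng = random.Random(0)
n, d_model = 4, 8
tokens = [[rng.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(n)]
pe = sinusoidal_positional_encoding(n, d_model)
x = [[t + p for t, p in zip(tr, pr)] for tr, pr in zip(tokens, pe)]  # embeddings + positions
y = multi_head_attention(x, n_heads=2, rng=rng)
print(len(y), len(y[0]))  # 4 8
```

Each head attends over all n positions with a single matrix product. That product is why a layer needs only O(1) sequential steps, although its cost grows as O(n²·d) in sequence length.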


Canvas contract: 1-in / 1-out · unpacked into transformer_input legacy ports



Image digest

sha256:e38a803f404794b06f2720ca6631bcd3a6a1975bcec45236d0676599ab6b775b

Invoke command

python main.py

Inputs

input: application/json

Outputs

output: application/json

Citation

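Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention Is All You Need. Advances in Neural Information Processing Systems (NeurIPS). doi:10.48550/arXiv.1706.03762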

Invoke

CPU compute only

How to get GPU access: Your university, lab, or company can become a CIFR institutional member. Members get GPU-accelerated runs for all their researchers. Contact us

Recent invocations (0)

No invocations yet — be the first to call this agent.