Publication record · 18.cifr/2018.devlin.bert-fine-tuning
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.
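To make the fine-tuning recipe in the abstract concrete — a single task-specific output layer on top of the pre-trained encoder — here is a minimal sketch in PyTorch. The use of the Hugging Face `transformers` package, the checkpoint name, and the two-label setup are illustrative assumptions, not part of this record:

```python
# Minimal sketch: fine-tuning BERT for sequence classification by adding
# one linear output layer on top of the pre-trained encoder.
# Assumes the Hugging Face `transformers` package is installed; the
# checkpoint and label count are illustrative choices.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class BertClassifier(nn.Module):
    def __init__(self, num_labels: int):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        # The single additional output layer: maps the pooled [CLS]
        # representation to label logits.
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        return self.classifier(outputs.pooler_output)

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertClassifier(num_labels=2)
batch = tokenizer(["a sample sentence"], return_tensors="pt",
                  padding=True, truncation=True)
logits = model(batch["input_ids"], batch["attention_mask"])
loss = nn.functional.cross_entropy(logits, torch.tensor([1]))
loss.backward()  # gradients flow through the whole encoder: all
                 # pre-trained parameters are fine-tuned, not frozen
```

Note that the whole encoder is updated during fine-tuning, not just the new layer; the paper's "without substantial task-specific architecture modifications" refers to the model's structure, not to freezing weights.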
BERT's masked language modeling (MLM) objective masks tokens independently, so the loss ignores dependencies between masked positions (see the sketch below); autoregressive objectives that model the joint distribution over predicted tokens could address this. Extending an encoder-only model to generative tasks is also non-trivial. Natural next steps include distillation, multilingual pre-training, and domain-adaptive pre-training on specialized corpora.
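The sketch below makes that independence concrete: each token's masking decision is a separate random draw, and each masked position is predicted (and scored) on its own. The 15% masking rate and the 80/10/10 replacement split follow the BERT paper; the plain-list tokenization and `mlm_mask` helper are simplified assumptions for illustration:

```python
# Sketch of BERT's MLM corruption step. Each position is an independent
# Bernoulli draw, so the objective never models joint structure between
# masked tokens. Rates follow the paper: 15% of tokens are selected; of
# those, 80% -> [MASK], 10% -> a random token, 10% left unchanged.
import random

def mlm_mask(tokens, vocab, mask_token="[MASK]", mask_prob=0.15, seed=None):
    rng = random.Random(seed)
    corrupted, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:          # independent per-token draw
            targets.append(tok)               # model must recover this token
            r = rng.random()
            if r < 0.8:
                corrupted.append(mask_token)  # 80%: replace with [MASK]
            elif r < 0.9:
                corrupted.append(rng.choice(vocab))  # 10%: random token
            else:
                corrupted.append(tok)         # 10%: keep unchanged
        else:
            corrupted.append(tok)
            targets.append(None)              # no loss at this position
    return corrupted, targets

vocab = ["the", "cat", "sat", "on", "mat", "dog", "ran"]
print(mlm_mask(["the", "cat", "sat", "on", "the", "mat"], vocab, seed=0))
```

Because every draw and every prediction is handled per position, the MLM loss factorizes over masked positions; that factorization is exactly the independence assumption an autoregressive or joint objective would relax.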