Publication record · 18.cifr/2018.devlin.bert-fine-tuning
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.
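To make the fine-tuning recipe in the abstract concrete — a single task-specific output layer on top of the pre-trained encoder — here is a minimal sketch in PyTorch. The use of the Hugging Face `transformers` package, the checkpoint name, and the two-label setup are illustrative assumptions, not part of this record:

```python
# Minimal sketch: fine-tuning BERT for sequence classification by adding
# one linear output layer on top of the pre-trained encoder.
# Assumes the Hugging Face `transformers` package is installed; the
# checkpoint and label count are illustrative choices.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class BertClassifier(nn.Module):
    def __init__(self, num_labels: int):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        # The single additional output layer: maps the pooled [CLS]
        # representation to label logits.
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        return self.classifier(outputs.pooler_output)

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertClassifier(num_labels=2)
batch = tokenizer(["a sample sentence"], return_tensors="pt",
                  padding=True, truncation=True)
logits = model(batch["input_ids"], batch["attention_mask"])
loss = nn.functional.cross_entropy(logits, torch.tensor([1]))
loss.backward()  # gradients flow through the whole encoder: all
                 # pre-trained parameters are fine-tuned, not frozen
```

Note that the whole encoder is updated during fine-tuning, not just the new layer; the paper's "without substantial task-specific architecture modifications" refers to the model's structure, not to freezing weights.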
BERT's masked language modeling (MLM) objective masks tokens independently, so the loss ignores dependencies between masked positions (see the sketch below); autoregressive objectives that model the joint distribution over predicted tokens could address this. Extending an encoder-only model to generative tasks is also non-trivial. Natural next steps include distillation, multilingual pre-training, and domain-adaptive pre-training on specialized corpora.
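The sketch below makes that independence concrete: each token's masking decision is a separate random draw, and each masked position is predicted (and scored) on its own. The 15% masking rate and the 80/10/10 replacement split follow the BERT paper; the plain-list tokenization and `mlm_mask` helper are simplified assumptions for illustration:

```python
# Sketch of BERT's MLM corruption step. Each position is an independent
# Bernoulli draw, so the objective never models joint structure between
# masked tokens. Rates follow the paper: 15% of tokens are selected; of
# those, 80% -> [MASK], 10% -> a random token, 10% left unchanged.
import random

def mlm_mask(tokens, vocab, mask_token="[MASK]", mask_prob=0.15, seed=None):
    rng = random.Random(seed)
    corrupted, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:          # independent per-token draw
            targets.append(tok)               # model must recover this token
            r = rng.random()
            if r < 0.8:
                corrupted.append(mask_token)  # 80%: replace with [MASK]
            elif r < 0.9:
                corrupted.append(rng.choice(vocab))  # 10%: random token
            else:
                corrupted.append(tok)         # 10%: keep unchanged
        else:
            corrupted.append(tok)
            targets.append(None)              # no loss at this position
    return corrupted, targets

vocab = ["the", "cat", "sat", "on", "mat", "dog", "ran"]
print(mlm_mask(["the", "cat", "sat", "on", "the", "mat"], vocab, seed=0))
```

Because every draw and every prediction is handled per position, the MLM loss factorizes over masked positions; that factorization is exactly the independence assumption an autoregressive or joint objective would relax.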