Publication record · 18.cifr/2023.touvron.llama-architecture
We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.
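As a concrete illustration of the release, the sketch below loads a LLaMA checkpoint and generates a short completion with the Hugging Face transformers library. It is a minimal sketch, assuming the weights have already been converted to the transformers format; the local path ./llama-13b is a hypothetical placeholder, not an official checkpoint name.

```python
# Minimal sketch: load a LLaMA-style checkpoint and generate text.
# Assumes weights converted to the Hugging Face transformers format;
# "./llama-13b" is a hypothetical local path, not an official name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("./llama-13b")
model = AutoModelForCausalLM.from_pretrained(
    "./llama-13b",
    torch_dtype=torch.float16,  # half precision roughly halves memory use
)

prompt = "Foundation language models trained on public data can"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Loading in float16 rather than float32 roughly halves memory, which matters for the larger 13B and 65B variants.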
Fine-tuning on instruction-following datasets is the most immediate extension flagged by the authors (see the sketch below). Scaling beyond 65B parameters with public data and extending token budgets beyond 1.4T tokens are natural next steps. Reducing the carbon footprint of training at this scale remains an open challenge.
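To make the instruction-tuning direction concrete, the sketch below runs one supervised step on a single (instruction, response) pair, reusing the model and tokenizer from the loading sketch above. The prompt template and the loss masking are illustrative assumptions, not the authors' recipe; -100 is the label value that the transformers causal-LM loss ignores.

```python
# Minimal instruction fine-tuning sketch (single step, single example).
# Assumes `model` and `tokenizer` from the loading sketch above.
# The "### Instruction / ### Response" template is an assumption.
def build_batch(instruction, response, tokenizer):
    prompt = f"### Instruction:\n{instruction}\n\n### Response:\n"
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + response, return_tensors="pt").input_ids
    labels = full_ids.clone()
    # Mask prompt positions with -100 so loss covers the response only.
    # (Tokenization at the prompt/response boundary may shift by a token.)
    labels[:, : prompt_ids.shape[1]] = -100
    return full_ids, labels

input_ids, labels = build_batch(
    "Summarize the LLaMA result in one sentence.",
    "LLaMA-13B outperforms GPT-3 on most benchmarks using only public data.",
    tokenizer,
)
model.train()
loss = model(input_ids=input_ids, labels=labels).loss
loss.backward()  # an optimizer step would follow in a real training loop
```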