Publication record · 18.cifr/2023.touvron.llama-architecture
We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.
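As a concrete illustration of the release, the sketch below loads a LLaMA checkpoint and generates a short completion with the Hugging Face transformers library. It is a minimal sketch, assuming the weights have already been converted to the transformers format; the local path ./llama-13b is a hypothetical placeholder, not an official checkpoint name.

```python
# Minimal sketch: load a LLaMA-style checkpoint and generate text.
# Assumes weights converted to the Hugging Face transformers format;
# "./llama-13b" is a hypothetical local path, not an official name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("./llama-13b")
model = AutoModelForCausalLM.from_pretrained(
    "./llama-13b",
    torch_dtype=torch.float16,  # half precision roughly halves memory use
)

prompt = "Foundation language models trained on public data can"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Loading in float16 rather than float32 roughly halves memory, which matters for the larger 13B and 65B variants.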
Fine-tuning on instruction-following datasets is the most immediate extension flagged by the authors (see the sketch below). Scaling beyond 65B parameters with public data and extending token budgets beyond 1.4T tokens are natural next steps. Reducing the carbon footprint of training at this scale remains an open challenge.
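To make the instruction-tuning direction concrete, the sketch below runs one supervised step on a single (instruction, response) pair, reusing the model and tokenizer from the loading sketch above. The prompt template and the loss masking are illustrative assumptions, not the authors' recipe; -100 is the label value that the transformers causal-LM loss ignores.

```python
# Minimal instruction fine-tuning sketch (single step, single example).
# Assumes `model` and `tokenizer` from the loading sketch above.
# The "### Instruction / ### Response" template is an assumption.
def build_batch(instruction, response, tokenizer):
    prompt = f"### Instruction:\n{instruction}\n\n### Response:\n"
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + response, return_tensors="pt").input_ids
    labels = full_ids.clone()
    # Mask prompt positions with -100 so loss covers the response only.
    # (Tokenization at the prompt/response boundary may shift by a token.)
    labels[:, : prompt_ids.shape[1]] = -100
    return full_ids, labels

input_ids, labels = build_batch(
    "Summarize the LLaMA result in one sentence.",
    "LLaMA-13B outperforms GPT-3 on most benchmarks using only public data.",
    tokenizer,
)
model.train()
loss = model(input_ids=input_ids, labels=labels).loss
loss.backward()  # an optimizer step would follow in a real training loop
```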