Publication record · 18.cifr/2023.touvron.llama2-rlhf
In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be a suitable substitute for closed-source models.
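As an illustration of the dialogue use case, here is a minimal sketch of querying a Llama 2-Chat checkpoint through the Hugging Face transformers library. The checkpoint identifier "meta-llama/Llama-2-7b-chat-hf" and the generation settings are assumptions about the public release, not details from this record.

```python
# Minimal dialogue sketch for Llama 2-Chat via Hugging Face transformers.
# The checkpoint name "meta-llama/Llama-2-7b-chat-hf" and the generation
# settings are assumptions about the public release, not from this record.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Llama 2-Chat's prompt template wraps each user turn in [INST] ... [/INST].
prompt = "[INST] Explain RLHF in two sentences. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=True, top_p=0.9)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```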
The authors identify the 4,096-token context window, multilingual capability, and code generation as areas needing further work. Bias and limited diversity among reward-model annotators are flagged as open problems. Safety mitigations still require evaluation against sophisticated adversarial attacks beyond the red-teaming described in the paper.
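The 4,096-token context window has a practical consequence for long dialogues: inputs must be shortened before generation or the oldest turns fall outside the model's context. A minimal sketch of left-truncation with the Hugging Face tokenizer follows; the checkpoint name and the 256-token generation budget are illustrative assumptions, not values from the paper.

```python
# Minimal sketch: keep a prompt within Llama 2's 4096-token context window.
# Assumes the Hugging Face checkpoint "meta-llama/Llama-2-7b-chat-hf";
# the 256-token generation budget is an illustrative choice, not from the paper.
from transformers import AutoTokenizer

CONTEXT_WINDOW = 4096  # Llama 2 context length reported in the paper
GEN_BUDGET = 256       # tokens reserved for the model's reply (assumption)

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

def fit_prompt(prompt: str) -> str:
    """Left-truncate the prompt so prompt + reply fits in the context window."""
    max_prompt_tokens = CONTEXT_WINDOW - GEN_BUDGET
    ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
    if len(ids) <= max_prompt_tokens:
        return prompt
    # Drop the oldest tokens; keep the most recent dialogue turns.
    return tokenizer.decode(ids[-max_prompt_tokens:])
```

Left-truncation keeps the most recent turns, which usually matter most for a chat model; summarizing older turns instead is a common alternative when early context must be preserved.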