Publication record · 18.cifr/2022.brohan.rt1-robotics-transformer
By transferring knowledge from large, diverse, task-agnostic datasets, modern machine learning models can solve specific downstream tasks either zero-shot or with small task-specific datasets to a high level of performance. While this capability has been demonstrated in other fields such as computer vision, natural language processing or speech recognition, it remains to be shown in robotics, where the generalization capabilities of the models are particularly critical due to the difficulty of collecting real-world robotic data. We argue that one of the keys to the success of such general robotic models lies with open-ended task-agnostic training, combined with high-capacity architectures that can absorb all of the diverse robotic data. In this paper, we present a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties.
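To make the model-class idea concrete, the sketch below is a minimal, hypothetical PyTorch illustration (not the authors' implementation) of a transformer policy in the RT-1 spirit: pre-extracted image features and a language-instruction embedding are projected into a shared token space, processed by a Transformer encoder, and decoded into per-dimension discretized action logits. The feature dimensions, token counts, bin count, and mean-pooling readout are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of an RT-1-style transformer policy.
# Assumptions (not from the source): 1280-d image features, a 512-d
# instruction embedding, 11 action dimensions with 256 bins each,
# and mean pooling over tokens before the action head.
import torch
import torch.nn as nn

class RobotTransformerPolicy(nn.Module):
    def __init__(self, token_dim=512, action_dims=11, action_bins=256,
                 num_layers=8, num_heads=8):
        super().__init__()
        # Project backbone image features and the instruction embedding
        # into a shared token space.
        self.image_proj = nn.Linear(1280, token_dim)
        self.text_proj = nn.Linear(512, token_dim)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=token_dim, nhead=num_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        # One classification head per action dimension over discrete bins.
        self.action_head = nn.Linear(token_dim, action_dims * action_bins)
        self.action_dims, self.action_bins = action_dims, action_bins

    def forward(self, image_features, instruction_embedding):
        # image_features: (batch, num_image_tokens, 1280)
        # instruction_embedding: (batch, 512)
        img_tokens = self.image_proj(image_features)
        txt_token = self.text_proj(instruction_embedding).unsqueeze(1)
        tokens = torch.cat([txt_token, img_tokens], dim=1)
        encoded = self.transformer(tokens)
        # Pool over tokens, then emit per-dimension bin logits.
        logits = self.action_head(encoded.mean(dim=1))
        return logits.view(-1, self.action_dims, self.action_bins)

policy = RobotTransformerPolicy()
logits = policy(torch.randn(2, 8, 1280), torch.randn(2, 512))
print(logits.shape)  # torch.Size([2, 11, 256])
```

Discretizing each action dimension into bins turns continuous control into a per-dimension classification problem, which is one common way to let a transformer output robot actions; the specific tokenization and readout used by RT-1 itself are described in the paper rather than here.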
The authors acknowledge that RT-1 still requires large-scale real-robot data and does not reach the zero-shot generalization of pure vision-language models. Promising directions include integrating web-scale vision-language pretraining more deeply, extending the approach to dexterous manipulation, and combining real-robot data with simulation or human video demonstrations to reduce data collection costs.