Publication record · 18.cifr/2022.brohan.rt1-robotics-transformer
By transferring knowledge from large, diverse, task-agnostic datasets, modern machine learning models can solve specific downstream tasks either zero-shot or with small task-specific datasets to a high level of performance. While this capability has been demonstrated in other fields such as computer vision, natural language processing or speech recognition, it remains to be shown in robotics, where the generalization capabilities of the models are particularly critical due to the difficulty of collecting real-world robotic data. We argue that one of the keys to the success of such general robotic models lies with open-ended task-agnostic training, combined with high-capacity architectures that can absorb all of the diverse robotic data. In this paper, we present a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties.
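To make the model-class idea concrete, the sketch below is a minimal, hypothetical PyTorch illustration (not the authors' implementation) of a transformer policy in the RT-1 spirit: pre-extracted image features and a language-instruction embedding are projected into a shared token space, processed by a Transformer encoder, and decoded into per-dimension discretized action logits. The feature dimensions, token counts, bin count, and mean-pooling readout are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of an RT-1-style transformer policy.
# Assumptions (not from the source): 1280-d image features, a 512-d
# instruction embedding, 11 action dimensions with 256 bins each,
# and mean pooling over tokens before the action head.
import torch
import torch.nn as nn

class RobotTransformerPolicy(nn.Module):
    def __init__(self, token_dim=512, action_dims=11, action_bins=256,
                 num_layers=8, num_heads=8):
        super().__init__()
        # Project backbone image features and the instruction embedding
        # into a shared token space.
        self.image_proj = nn.Linear(1280, token_dim)
        self.text_proj = nn.Linear(512, token_dim)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=token_dim, nhead=num_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        # One classification head per action dimension over discrete bins.
        self.action_head = nn.Linear(token_dim, action_dims * action_bins)
        self.action_dims, self.action_bins = action_dims, action_bins

    def forward(self, image_features, instruction_embedding):
        # image_features: (batch, num_image_tokens, 1280)
        # instruction_embedding: (batch, 512)
        img_tokens = self.image_proj(image_features)
        txt_token = self.text_proj(instruction_embedding).unsqueeze(1)
        tokens = torch.cat([txt_token, img_tokens], dim=1)
        encoded = self.transformer(tokens)
        # Pool over tokens, then emit per-dimension bin logits.
        logits = self.action_head(encoded.mean(dim=1))
        return logits.view(-1, self.action_dims, self.action_bins)

policy = RobotTransformerPolicy()
logits = policy(torch.randn(2, 8, 1280), torch.randn(2, 512))
print(logits.shape)  # torch.Size([2, 11, 256])
```

Discretizing each action dimension into bins turns continuous control into a per-dimension classification problem, which is one common way to let a transformer output robot actions; the specific tokenization and readout used by RT-1 itself are described in the paper rather than here.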
The authors acknowledge that RT-1 still requires large-scale real-robot data and does not reach the zero-shot generalization of pure vision-language models. Promising directions include integrating web-scale vision-language pretraining more deeply, extending the approach to dexterous manipulation, and combining real-robot data with simulation or human video demonstrations to reduce data collection costs.