Publication record · 18.cifr/2023.zhao.act-bimanual

Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware

v1.0.0

Tony Z. Zhao (Stanford University), Vikash Kumar (UC Berkeley), Sergey Levine (UC Berkeley), Chelsea Finn (Stanford University)

RAI · 18.cifr/2023.zhao.act-bimanual

arXiv / RSS 2023 · 2023 · doi:10.48550/arXiv.2304.13705

Fine manipulation tasks, such as threading cable ties or slotting a battery, are notoriously difficult for robots because they require precision, careful coordination of contact forces, and closed-loop visual feedback. We present a low-cost system that performs end-to-end imitation learning directly from real demonstrations. We develop Action Chunking with Transformers (ACT), which learns a generative model over action sequences, allowing the robot to learn 6 difficult tasks with 80-90% success using only 10 minutes of demos.

imitation learning · bimanual manipulation · action chunking · transformers · CVAE

✦ Research context

What this agent contributes to the literature.

Problem solved

Fine manipulation tasks require precision that is normally achieved only with expensive hardware. Prior imitation learning approaches suffer from compounding errors and cannot handle non-stationary human demonstrations. ACT addresses both problems: trained on only 10 minutes of demonstrations, it reaches 80-90% success on 6 difficult tasks using low-cost hardware.

Novelty

ACT introduces action chunking — predicting k future actions per step rather than one — combined with a CVAE transformer policy that captures multi-modal human demonstration distributions via a latent variable. Temporal ensembling at inference further stabilises execution by averaging overlapping predictions.
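
The interaction between action chunking and temporal ensembling can be illustrated with a minimal sketch. It is not the authors' implementation: the names `temporal_ensemble`, `run_episode`, and `toy_policy` are hypothetical, the CVAE transformer policy is replaced by a toy function, and the exponential weighting `exp(-m * i)` (oldest prediction first) is an assumption, since this record only states that overlapping predictions are averaged.

```python
import math
from typing import Callable, Dict, List, Sequence

Action = List[float]


def temporal_ensemble(predictions: Sequence[Action], m: float = 0.1) -> Action:
    """Weighted average of every prediction made for one timestep.

    `predictions` is ordered oldest first. The weights exp(-m * i) are an
    assumed scheme; m = 0 reduces to a plain average.
    """
    weights = [math.exp(-m * i) for i in range(len(predictions))]
    total = sum(weights)
    dim = len(predictions[0])
    return [
        sum(w * p[d] for w, p in zip(weights, predictions)) / total
        for d in range(dim)
    ]


def run_episode(
    policy: Callable[[float, int], List[Action]],
    observations: Sequence[float],
    k: int,
    m: float = 0.1,
) -> List[Action]:
    """Query the policy at every step for a chunk of k actions, then
    execute the ensembled action for the current step."""
    pending: Dict[int, List[Action]] = {}  # timestep -> predictions, oldest first
    executed: List[Action] = []
    for t, obs in enumerate(observations):
        chunk = policy(obs, k)  # actions for timesteps t .. t + k - 1
        for offset, action in enumerate(chunk):
            pending.setdefault(t + offset, []).append(action)
        executed.append(temporal_ensemble(pending.pop(t), m))
    return executed


def toy_policy(obs: float, k: int) -> List[Action]:
    """Stand-in for the learned policy: repeats the observation k times."""
    return [[float(obs)] for _ in range(k)]


print(run_episode(toy_policy, [0, 1, 2], k=2, m=0.0))
# → [[0.0], [0.5], [1.5]]
```

With `k=2`, every timestep after the first receives two overlapping predictions, one from the previous chunk and one from the current chunk, so the executed action is a blend of the two. Raising `m` above zero shifts weight toward the older prediction under the assumed scheme.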

Canvas contract: 1-in / 1-out · unpacked into demonstrations, act_params, inference_obs legacy ports

Image digest

sha256:6900a30ec493f480ab0abf9a1b6ee05dba2417cc2fc24671e7990052d4b46af2

Invoke command

python main.py

Inputs

input: application/json

Outputs

output: application/json

Invoke

CPU compute only

Pre-filled with the paper's canonical scenario. Click Invoke agent to reproduce the original result, or edit the JSON below to run a counterfactual.

input · application/json · optional

Unified canvas input containing demonstrations and inference observation

Leave empty to run the paper's canonical scenario.
