Publication record · 18.cifr/2024.ze.3d-diffusion-policy

3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations

v1.0.0

Yanjie Ze (Stanford University), Gu Zhang (Tsinghua University), Kangning Zhang (Tsinghua University), Chenyuan Hu (Tsinghua University), Muhan Wang (Tsinghua University), Huazhe Xu (Tsinghua University)

arXiv / RSS 2024 · 2024 · doi:10.48550/arXiv.2403.03954

Imitation learning provides an efficient way to teach robots dexterous skills; however, learning complex skills robustly and generalizably usually consumes large amounts of human demonstrations. To tackle this challenging problem, we present 3D Diffusion Policy (DP3), a novel visual imitation learning approach that incorporates the power of 3D visual representations into diffusion policies, a class of conditional action generative models.

visuomotor policy · diffusion policy · point cloud · imitation learning · robot manipulation

✦ Research context

What this agent contributes to the literature.

Problem solved

Image-based visuomotor policies are brittle to visual distractors (lighting, viewpoint, color) and require large demonstration datasets. DP3 grounds policy learning in 3D geometry, achieving robust performance with only 10-40 demonstrations per task across diverse real-world variations.

Novelty

DP3 introduces compact 3D point cloud representations as the visual backbone for diffusion policies, replacing 2D RGB images. The key insight is that sparse point clouds (512 points) encode task-relevant geometry invariant to viewpoint and appearance, enabling generalization with far fewer demonstrations than image-based methods.
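
The reduction of a dense point cloud to a sparse 512-point set can be sketched as follows. This record does not state how the 512 points are selected; the sketch assumes greedy farthest point sampling, a common downsampling choice for point clouds. The function name `farthest_point_sample`, the `seed` parameter, and the synthetic input cloud are hypothetical and serve only as illustration.

```python
import numpy as np


def farthest_point_sample(points: np.ndarray, num_samples: int = 512, seed: int = 0) -> np.ndarray:
    """Select `num_samples` rows from an (N, 3) cloud by greedy farthest point sampling.

    Hypothetical helper: the sampling method is an assumption, not taken from the record.
    """
    n = points.shape[0]
    if n < num_samples:
        raise ValueError(f"need at least {num_samples} points, got {n}")

    rng = np.random.default_rng(seed)
    selected = np.empty(num_samples, dtype=np.int64)
    selected[0] = rng.integers(n)  # arbitrary starting point

    # Squared distance from each point to its nearest already-selected point.
    min_dist = np.full(n, np.inf)
    for i in range(1, num_samples):
        diff = points - points[selected[i - 1]]
        min_dist = np.minimum(min_dist, np.einsum("ij,ij->i", diff, diff))
        # The next sample is the point farthest from everything chosen so far.
        selected[i] = int(np.argmax(min_dist))

    return points[selected]


# Synthetic stand-in for a dense depth-camera cloud (assumed shape: N x 3).
cloud = np.random.default_rng(1).uniform(-1.0, 1.0, size=(4096, 3))
sparse = farthest_point_sample(cloud)
print(sparse.shape)  # (512, 3)
```

Because each step picks the point farthest from those already chosen, the 512 samples spread over the whole surface rather than clustering where the sensor happens to return the most points.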

Canvas contract

1-in / 1-out · unpacked into demonstrations, params legacy ports

Image digest

sha256:39d05009f8915b480ddb555be3063a4b251f4f9909880903132f375f92ba53af

Invoke command

python main.py

Inputs

input:application/json

Outputs

output:application/json

Invoke

CPU compute only

Pre-filled with the paper's canonical scenario. Click Invoke agent to reproduce the original result, or edit the JSON below to run a counterfactual.

input:application/json (optional)

Unified canvas input containing point clouds, actions, and params

Leave empty to run the paper's canonical scenario.
