Publication record · 18.cifr/2010.ross.dagger-imitation
Sequential prediction problems such as imitation learning, where future observations depend on previous predictions (actions), violate the common i.i.d. assumptions made in statistical learning. This leads to poor performance in theory and often in practice. Some recent approaches provide stronger guarantees in this setting, but remain somewhat unsatisfactory as they train either non-stationary or stochastic policies and require a large number of iterations. In this paper, we propose a new iterative algorithm, which trains a stationary deterministic policy, that can be seen as a no-regret algorithm in an online learning setting.
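The iterative algorithm described here is DAgger (Dataset Aggregation): roll out the current policy, query the expert for labels on the states the learner actually visits, aggregate those labels into a growing dataset, and retrain. Below is a minimal sketch of that loop, assuming a hypothetical `env` with `reset`/`step` methods, an `expert_policy` oracle, and a scikit-learn-style classifier factory; all names are illustrative, not from the paper, and this uses the practical schedule that follows the expert only on the first iteration.

```python
import numpy as np

def dagger(env, expert_policy, make_learner, n_iters=10, horizon=100):
    """Illustrative DAgger-style loop (a sketch, not the paper's code).

    env           -- assumed to expose reset() -> state and step(action) -> state
    expert_policy -- oracle mapping a state to the expert's action
    make_learner  -- factory returning a fresh classifier with fit/predict
    """
    dataset_states, dataset_actions = [], []
    policy = None  # before the first retraining, roll out the expert itself

    for _ in range(n_iters):
        state = env.reset()
        for _ in range(horizon):
            # Act with the current learned policy (the expert on iteration 0)...
            if policy is None:
                action = expert_policy(state)
            else:
                action = policy.predict(np.asarray(state).reshape(1, -1))[0]
            # ...but always record the expert's label for the visited state,
            # so the dataset covers the learner's own state distribution.
            dataset_states.append(state)
            dataset_actions.append(expert_policy(state))
            state = env.step(action)

        # Retrain a stationary deterministic policy on the aggregated dataset.
        policy = make_learner()
        policy.fit(np.asarray(dataset_states), np.asarray(dataset_actions))

    return policy
```

The key design point, per the abstract, is that each retraining step is a best-response in an online learning game; a no-regret analysis then bounds the performance of the learned stationary deterministic policy under its own induced state distribution.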
DAgger requires interactive access to the expert to relabel the states the learner visits, which is costly or infeasible in offline settings; passive variants that reuse a fixed demonstration dataset would be needed there. Extensions to partial observability, continuous action spaces with function approximation, and tighter regret bounds via Follow-the-Leader variants are natural next steps.