Publication record · 18.cifr/2010.ross.dagger-imitation
Sequential prediction problems such as imitation learning, where future observations depend on previous predictions (actions), violate the common i.i.d. assumptions made in statistical learning. This leads to poor performance in theory and often in practice. Some recent approaches provide stronger guarantees in this setting, but remain somewhat unsatisfactory as they train either non-stationary or stochastic policies and require a large number of iterations. In this paper, we propose a new iterative algorithm, which trains a stationary deterministic policy, that can be seen as a no-regret algorithm in an online learning setting.
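The iterative algorithm described here is DAgger (Dataset Aggregation): roll out the current policy, query the expert for labels on the states the learner actually visits, aggregate those labels into a growing dataset, and retrain. Below is a minimal sketch of that loop, assuming a hypothetical `env` with `reset`/`step` methods, an `expert_policy` oracle, and a scikit-learn-style classifier factory; all names are illustrative, not from the paper, and this uses the practical schedule that follows the expert only on the first iteration.

```python
import numpy as np

def dagger(env, expert_policy, make_learner, n_iters=10, horizon=100):
    """Illustrative DAgger-style loop (a sketch, not the paper's code).

    env           -- assumed to expose reset() -> state and step(action) -> state
    expert_policy -- oracle mapping a state to the expert's action
    make_learner  -- factory returning a fresh classifier with fit/predict
    """
    dataset_states, dataset_actions = [], []
    policy = None  # before the first retraining, roll out the expert itself

    for _ in range(n_iters):
        state = env.reset()
        for _ in range(horizon):
            # Act with the current learned policy (the expert on iteration 0)...
            if policy is None:
                action = expert_policy(state)
            else:
                action = policy.predict(np.asarray(state).reshape(1, -1))[0]
            # ...but always record the expert's label for the visited state,
            # so the dataset covers the learner's own state distribution.
            dataset_states.append(state)
            dataset_actions.append(expert_policy(state))
            state = env.step(action)

        # Retrain a stationary deterministic policy on the aggregated dataset.
        policy = make_learner()
        policy.fit(np.asarray(dataset_states), np.asarray(dataset_actions))

    return policy
```

The key design point, per the abstract, is that each retraining step is a best-response in an online learning game; a no-regret analysis then bounds the performance of the learned stationary deterministic policy under its own induced state distribution.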
DAgger requires interactive access to the expert to relabel the states the learner visits, which is costly or infeasible in offline settings; passive variants that reuse a fixed demonstration dataset would be needed there. Extensions to partial observability, continuous action spaces with function approximation, and tighter regret bounds via Follow-the-Leader variants are natural next steps.