Publication record · 18.cifr/1992.watkins.q-learning
Q-learning is a simple way for agents to learn how to act optimally in controlled Markovian domains. It amounts to an incremental method for dynamic programming which imposes limited computational demands. It works by successively improving its evaluations of the quality of particular actions at particular states. This paper presents a thorough description of Q-learning and a proof of convergence for the algorithm.
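To make "successively improving its evaluations" concrete, here is a minimal sketch of the standard tabular update, Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)]. Everything specific in it is an assumption made for illustration and is not taken from the paper: the five-state chain environment, the epsilon-greedy exploration rule, the function name `q_learning`, and the values chosen for `alpha`, `gamma`, and `epsilon`.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical deterministic chain.

    States are 0..n_states-1. Action 0 moves left (staying put at state 0),
    action 1 moves right. Entering the last state pays reward 1 and ends
    the episode. All of this is an illustrative assumption.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    # One row per state, one column per action; every estimate starts at 0.
    q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy choice, breaking ties between equal values at random.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                best = max(q[s])
                a = rng.choice([act for act in (0, 1) if q[s][act] == best])

            # Deterministic transition and reward of the assumed chain.
            s_next = max(s - 1, 0) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0

            # Move Q(s, a) a fraction alpha toward the one-step target.
            # The goal row is never updated, so its value stays 0.
            target = r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q


if __name__ == "__main__":
    for state, (left, right) in enumerate(q_learning()):
        print(f"state {state}: Q(left)={left:.3f}  Q(right)={right:.3f}")
```

In this toy setting the learned values should rank "right" above "left" in every non-terminal state, since only the rightmost state pays a reward. The sketch uses a fixed learning rate for brevity; a convergence proof of the kind the abstract mentions typically places conditions on how the learning rates behave over time.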
Convergence in large or continuous state spaces under function approximation is flagged as a critical open issue. The authors also note open questions about the design of exploration strategies and the tradeoff between sample complexity and convergence rate.