Publication record · 18.cifr/1992.watkins.q-learning

Q-learning

v1.0.0

Christopher J. C. H. Watkins (King's College, Cambridge), Peter Dayan (University of Edinburgh)

RAI 18.cifr/1992.watkins.q-learning

Machine Learning · 1992 · doi:10.1007/BF00992698

Q-learning is a simple way for agents to learn how to act optimally in controlled Markovian domains. It amounts to an incremental method for dynamic programming which imposes limited computational demands. It works by successively improving its evaluations of the quality of particular actions at particular states. This paper presents a thorough description of Q-learning and a proof of convergence for the algorithm.
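The "successive improvement" of action evaluations mentioned above can be written as a one-step update. The form below is a sketch in commonly used notation, none of which appears in this record: x_n is the current state, a_n the chosen action, y_n the observed next state, r_n the immediate reward, \gamma the discount factor, and \alpha_n the learning rate at step n.

```latex
Q_n(x_n, a_n) = (1 - \alpha_n)\, Q_{n-1}(x_n, a_n)
              + \alpha_n \bigl[ r_n + \gamma\, V_{n-1}(y_n) \bigr],
\qquad
V_{n-1}(y) = \max_{b} Q_{n-1}(y, b)
```

All other entries of Q are left unchanged at step n, which is why the per-step computational cost stays small.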

reinforcement learning · Q-learning · temporal difference · Markov decision process · dynamic programming

✦ Research context

What this agent contributes to the literature.

Problem solved

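Model-free reinforcement learning lacked a provably convergent algorithm for finding optimal policies in Markov decision processes (MDPs) whose dynamics are unknown. Q-learning fills that gap by learning the optimal action-value function Q* directly from interaction, which enables optimal control without knowledge of the transition probabilities.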
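As an illustration of learning action values purely from interaction, here is a minimal tabular sketch in Python. It is not the code behind this record's `python main.py`; the function `q_learning`, the toy environment `chain_step`, and all parameter names and defaults are assumptions made for this example. The learner only calls `step(s, a)` and never reads transition probabilities.

```python
import random


def q_learning(step, n_states, n_actions, episodes=500, gamma=0.9,
               epsilon=0.2, max_steps=100, seed=0):
    """Tabular Q-learning sketch; `step(s, a)` returns (next_state, reward, done)."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    visits = [[0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = rng.randrange(n_states)  # random start state helps exploration
        for _ in range(max_steps):
            # Epsilon-greedy behaviour policy; the update target below uses
            # the max over next actions, which makes the method off-policy.
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda i: q[s][i])
            s2, r, done = step(s, a)
            visits[s][a] += 1
            alpha = 1.0 / visits[s][a]  # per-pair decaying learning rate
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            if done:
                break
            s = s2
    return q


def chain_step(s, a):
    """Hypothetical 3-state chain: action 1 moves right, action 0 moves left.
    Moving right from state 2 ends the episode with reward 1."""
    if a == 1:
        return (s, 1.0, True) if s == 2 else (s + 1, 0.0, False)
    return (max(s - 1, 0), 0.0, False)


q = q_learning(chain_step, n_states=3, n_actions=2)
greedy = [max(range(2), key=lambda a: q[s][a]) for s in range(3)]
print(greedy)
```

The 1/visits learning rate is one simple choice that decays per state-action pair; constant rates are common in practice but fall outside decaying-rate convergence conditions.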

Novelty

Watkins introduced Q-learning as the first provably convergent off-policy temporal-difference control algorithm. Unlike prior methods, it requires no environment model. The paper provides a formal proof of convergence to Q* under Robbins-Monro learning-rate conditions and sufficient exploration, meaning that every action continues to be tried in every state.
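For reference, the learning-rate conditions usually meant by "Robbins-Monro" can be stated as follows. The notation is an assumption of this sketch rather than text from the record: \alpha_k(x, a) is the learning rate used on the k-th update of the state-action pair (x, a), rewards are assumed bounded, and the discount factor satisfies \gamma < 1.

```latex
0 \le \alpha_k(x, a) < 1, \qquad
\sum_{k=1}^{\infty} \alpha_k(x, a) = \infty, \qquad
\sum_{k=1}^{\infty} \alpha_k(x, a)^{2} < \infty
\qquad \text{for every pair } (x, a)
```

The divergent sum can only hold if each pair is updated infinitely often, which is where the exploration requirement comes from. A schedule such as \alpha_k = 1/(k+1) satisfies all three conditions.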

Canvas contract

1-in / 1-out · unpacked into mdp, hyperparameters legacy ports

Image digest

sha256:7b914c0d062a57bafc23b098f236b411ee3e72c990161ae05d19fe982081c33c

Invoke command

python main.py

Inputs

input: application/json

Outputs

output: application/json
