Publication record · 18.cifr/2017.silver.alphago-zero

Mastering the game of Go without human knowledge

v1.0.0

David Silver (DeepMind), Julian Schrittwieser (DeepMind), Karen Simonyan (DeepMind), Ioannis Antonoglou (DeepMind), Aja Huang (DeepMind), Demis Hassabis (DeepMind)


Nature · 2017 · doi:10.1038/nature24270

A long-standing goal of artificial intelligence is an algorithm that learns, tabula rasa, superhuman proficiency in challenging domains. Previously, AlphaGo became the first program to defeat a world champion in the game of Go. The tree search in AlphaGo evaluated positions and selected moves using deep neural networks. These neural networks were trained by supervised learning from human expert moves, and by reinforcement learning from self-play. Here we introduce an algorithm based solely on reinforcement learning, without human data, guidance or domain knowledge beyond game rules. AlphaGo becomes its own teacher: a neural network is trained to predict AlphaGo's own move selections and also the winner of AlphaGo's games. This neural network improves the strength of the tree search, resulting in higher quality move selection and stronger self-play in the next iteration. Starting tabula rasa, our new program AlphaGo Zero achieved superhuman performance, winning 100-0 against the previously published, champion-defeating AlphaGo.
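The paper trains this network by minimizing a single loss with three parts: a squared error between the predicted value v and the game winner z, a cross-entropy between the search probabilities π and the policy p, and an L2 penalty on the weights θ, giving l = (z − v)² − πᵀ log p + c‖θ‖². Below is a minimal stdlib sketch of that per-position loss. It assumes the network outputs are already available as plain lists. The function name, the flat weight list and the default value of c are illustrative and are not taken from the agent's code.

```python
import math
from typing import Sequence


def alphago_zero_loss(z: float, v: float, pi: Sequence[float], p: Sequence[float],
                      weights: Sequence[float], c: float = 1e-4,
                      eps: float = 1e-12) -> float:
    """Per-position loss l = (z - v)^2 - pi^T log p + c * ||theta||^2.

    z       -- game winner from the perspective of the player to move (+1 or -1)
    v       -- value-head prediction in [-1, 1]
    pi      -- MCTS search probabilities over all moves (sums to 1)
    p       -- policy-head move probabilities, aligned with pi
    weights -- network parameters theta, flattened (illustrative)
    c       -- L2 regularization coefficient (illustrative default)
    """
    if len(pi) != len(p):
        raise ValueError("pi and p must cover the same moves")
    value_loss = (z - v) ** 2
    # eps keeps log() finite for moves the policy assigns zero probability.
    policy_loss = -sum(t * math.log(q + eps) for t, q in zip(pi, p))
    l2_penalty = c * sum(w * w for w in weights)
    return value_loss + policy_loss + l2_penalty
```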

reinforcement learning · Monte Carlo Tree Search · self-play · Go · deep residual networks

✦ Research context

What this agent contributes to the literature.

Problem solved

Previous superhuman Go systems relied on supervised pretraining on large collections of human expert games. This limited their generality and tied their performance to the quality of the available human data. AlphaGo Zero removes this dependency and learns from scratch, using only the rules of the game.

Novelty

AlphaGo Zero learns entirely from self-play RL without human game data or hand-crafted features, combining policy and value into a single residual network. It surpassed all prior AlphaGo versions after 40 days of training, demonstrating that tabula rasa RL can exceed human expert knowledge in complex strategic games.
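Because there are no hand-crafted features, the paper's network sees only raw stone positions. Its input is a stack of 17 binary planes: the current player's stones and the opponent's stones for each of the last 8 positions, plus one constant plane giving the color to play. The sketch below shows that encoding under two assumptions. First, 1 = black and -1 = white; the sample input below uses 1 and -1 but does not document which is which. Second, any history that is not supplied is zero-filled. The function and constant names are hypothetical.

```python
from typing import List

Board = List[List[int]]
Plane = List[List[int]]

HISTORY = 8  # positions of history per player, as in the paper's 17-plane input


def encode_planes(history: List[Board], current_player: int) -> List[Plane]:
    """Encode raw positions as [X_t, Y_t, X_{t-1}, Y_{t-1}, ..., C].

    history        -- boards ordered newest first; 1 = black, -1 = white,
                      0 = empty (assumed encoding). Missing older positions
                      are zero-filled; positions beyond HISTORY are ignored.
    current_player -- 1 if black is to play, -1 if white (assumed encoding).
    """
    n = len(history[0])
    planes: List[Plane] = []
    for t in range(HISTORY):
        if t >= len(history):
            # No record of this earlier position: both planes stay empty.
            planes.append([[0] * n for _ in range(n)])
            planes.append([[0] * n for _ in range(n)])
            continue
        board = history[t]
        # X: stones of the player to move; Y: stones of the opponent.
        planes.append([[1 if c == current_player else 0 for c in row] for row in board])
        planes.append([[1 if c == -current_player else 0 for c in row] for row in board])
    # C: constant plane, all ones when black is to play, all zeros otherwise.
    color = 1 if current_player == 1 else 0
    planes.append([[color] * n for _ in range(n)])
    return planes
```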

Related research


Canvas contract: 1-in / 1-out · unpacked into board_state, mcts_params legacy ports
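The canvas contract takes one JSON document and splits it into the two legacy ports. The following is a hypothetical sketch of that unpacking. It assumes that an empty input maps to the pre-filled example on this page and that missing mcts_params keys fall back to that example's values. The agent's actual main.py may behave differently, and every name below is illustrative.

```python
import copy
import json
from typing import Any, Dict, Tuple

# Hypothetical defaults copied from the pre-filled example on this page.
CANONICAL_BOARD_STATE: Dict[str, Any] = {
    "board_size": 9,
    "board": [
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 1, 0, 0, 0, -1, 0, 0],
        [0, 0, 0, 1, 0, -1, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, -1, 0, 1, 0, 0, 0],
        [0, 0, -1, 0, 0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
    ],
    "current_player": 1,
}
CANONICAL_MCTS_PARAMS: Dict[str, Any] = {
    "num_simulations": 50,
    "c_puct": 1,
    "temperature": 1,
    "num_residual_blocks": 2,
    "num_filters": 32,
}


def unpack_canvas_input(raw: str) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split one JSON document into (board_state, mcts_params)."""
    payload = json.loads(raw) if raw.strip() else {}
    # An absent board_state falls back to the pre-filled scenario (deep copy,
    # so callers cannot mutate the shared default).
    board_state = payload.get("board_state") or copy.deepcopy(CANONICAL_BOARD_STATE)
    # Supplied MCTS keys override the defaults; missing ones are filled in.
    mcts_params = {**CANONICAL_MCTS_PARAMS, **payload.get("mcts_params", {})}
    size = board_state["board_size"]
    board = board_state["board"]
    if len(board) != size or any(len(row) != size for row in board):
        raise ValueError("board must be a board_size x board_size grid")
    return board_state, mcts_params
```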


Image digest

sha256:f1c3dcef7f1a5004f5ab96907df2dbf38d01b7bf3fd9eea90c14813e12402f62

Invoke command

python main.py

Inputs

input:application/json

Outputs

output:application/json


Invoke

CPU compute only


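Pre-filled with the paper's canonical scenario, scaled down: a 9×9 board, 50 simulations per move and a 2-block, 32-filter network. The paper used 19×19 boards, 1,600 simulations per move, and 20- or 40-block networks with 256 filters. Use Invoke agent to run this scenario, or edit the JSON below to run a counterfactual.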

input · application/json · optional

Unified canvas input: nested keys are described in legacy_inputs.

Leave empty to run the paper's canonical scenario.

{
  "board_state": {
    "board_size": 9,
    "board": [
      [0, 0, 0, 0, 0, 0, 0, 0, 0],
      [0, 0, 0, 0, 0, 0, 0, 0, 0],
      [0, 0, 1, 0, 0, 0, -1, 0, 0],
      [0, 0, 0, 1, 0, -1, 0, 0, 0],
      [0, 0, 0, 0, 0, 0, 0, 0, 0],
      [0, 0, 0, -1, 0, 1, 0, 0, 0],
      [0, 0, -1, 0, 0, 0, 1, 0, 0],
      [0, 0, 0, 0, 0, 0, 0, 0, 0],
      [0, 0, 0, 0, 0, 0, 0, 0, 0]
    ],
    "current_player": 1
  },
  "mcts_params": {
    "num_simulations": 50,
    "c_puct": 1,
    "temperature": 1,
    "num_residual_blocks": 2,
    "num_filters": 32
  }
}
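In the paper, c_puct and temperature control two separate steps. During search, each simulation selects the move that maximizes Q(s,a) + U(s,a), where U(s,a) = c_puct · P(s,a) · √(Σ_b N(s,b)) / (1 + N(s,a)). After search, the move to play is drawn from π(a) ∝ N(s,a)^(1/τ). Here is a stdlib sketch of both rules. It assumes moves are integer indices and that Q, N and P are held in plain dictionaries. The function names are hypothetical, and tree expansion and value backup are omitted.

```python
import math
import random
from typing import Dict, Optional


def puct_select(q: Dict[int, float], n: Dict[int, int],
                p: Dict[int, float], c_puct: float) -> int:
    """Return the move maximizing Q(s,a) + U(s,a).

    U(s,a) = c_puct * P(s,a) * sqrt(sum_b N(s,b)) / (1 + N(s,a)).
    q, n and p must share the same move keys.
    """
    total_visits = sum(n.values())

    def score(a: int) -> float:
        # Exploration bonus: large for high-prior, rarely visited moves.
        u = c_puct * p[a] * math.sqrt(total_visits) / (1 + n[a])
        return q[a] + u

    return max(p, key=score)


def visit_distribution(n: Dict[int, int], temperature: float) -> Dict[int, float]:
    """Search probabilities pi(a) proportional to N(s,a)^(1/temperature).

    temperature == 0 is treated as the greedy limit: all mass on the most visited move.
    """
    if temperature == 0:
        best = max(n, key=lambda a: n[a])
        return {a: (1.0 if a == best else 0.0) for a in n}
    weights = {a: count ** (1.0 / temperature) for a, count in n.items()}
    norm = sum(weights.values())
    if norm == 0:
        raise ValueError("at least one move must have a positive visit count")
    return {a: w / norm for a, w in weights.items()}


def sample_move(pi: Dict[int, float], rng: Optional[random.Random] = None) -> int:
    """Draw one move according to the search probabilities pi."""
    rng = rng or random.Random()
    moves = list(pi)
    return rng.choices(moves, weights=[pi[a] for a in moves], k=1)[0]
```

In the paper, self-play uses τ = 1 for the first 30 moves of each game and an infinitesimal temperature (effectively greedy play) after that. The temperature = 0 branch above stands in for that limit.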
