Monte Carlo Tree Search (MCTS)¶
Monte Carlo Tree Search (MCTS) is a lookahead planning algorithm that builds a search tree incrementally by simulating future trajectories and backing up value estimates. Upper Confidence bounds applied to Trees (UCT) is the dominant variant: it selects moves that maximize an upper confidence bound combining each branch's empirical value with an exploration bonus, balancing exploitation of promising branches against exploration of uncertain ones.
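The loop above can be sketched concretely. Below is a minimal, self-contained UCT implementation for a toy Nim-like subtraction game (players alternately remove 1–3 stones; whoever takes the last stone wins). The game, the function names, and the constant `c=1.4` are illustrative choices for this sketch, not drawn from the papers listed below.

```python
import math
import random

def legal_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]

class Node:
    def __init__(self, state, parent=None, move=None):
        self.state = state              # stones remaining; it's the "to-move" player's turn
        self.parent = parent
        self.move = move                # move that led from parent to this node
        self.children = []
        self.visits = 0
        self.wins = 0.0                 # wins for the player who moved INTO this node
        self.untried = legal_moves(state)

def uct_select(node, c=1.4):
    # UCB1: exploitation (empirical win rate) + exploration bonus
    return max(node.children,
               key=lambda ch: ch.wins / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))

def rollout(state):
    # Random playout; returns True iff the player to move from `state` wins.
    to_move = 0
    while state > 0:
        state -= random.choice(legal_moves(state))
        to_move ^= 1
    return (to_move ^ 1) == 0           # whoever took the last stone wins

def mcts(root_state, iters=2000):
    root = Node(root_state)
    for _ in range(iters):
        node = root
        # 1. Selection: descend via UCT while fully expanded and non-terminal
        while not node.untried and node.children:
            node = uct_select(node)
        # 2. Expansion: add one untried child
        if node.untried:
            m = node.untried.pop(random.randrange(len(node.untried)))
            child = Node(node.state - m, parent=node, move=m)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout from the new node
        result = rollout(node.state)    # True iff node's to-move player wins
        # 4. Backpropagation: flip the result's perspective at each level
        while node is not None:
            node.visits += 1
            node.wins += 0.0 if result else 1.0
            result = not result
            node = node.parent
    # Recommend the most-visited move (a common, robust choice)
    return max(root.children, key=lambda ch: ch.visits).move
```

In this game the losing positions are multiples of 4, so from 5 stones the search should converge on taking 1 stone (leaving the opponent on 4). Recommending the most-visited root child, rather than the highest win rate, is a standard choice because visit counts are less noisy at low sample sizes.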
MCTS saw great success when applied to the board game Go through the Alpha series of models (AlphaGo, AlphaGo Zero, and AlphaZero), which combined MCTS with neural networks serving as policy and value functions and ultimately achieved superhuman performance through pure self-play.
Learn more with these resources:
- Monte Carlo tree search - Wikipedia
- Bandit based Monte-Carlo Planning (Kocsis & Szepesvári, 2006) — the paper introducing UCT, the dominant MCTS variant
- A Survey of Monte Carlo Tree Search Methods (Browne et al., 2012) — comprehensive overview of variants and applications
- Mastering the game of Go with deep neural networks and tree search (Silver et al., 2016) — AlphaGo
- Mastering the game of Go without human knowledge (Silver et al., 2017) — AlphaGo Zero
- A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play (Silver et al., 2018) — AlphaZero, generalizing to multiple games