ePrints.FRI - University of Ljubljana, Faculty of Computer and Information Science

TD learning in Monte Carlo tree search

Aleksandra Deleva (2015) TD learning in Monte Carlo tree search. MSc thesis.

    Abstract

    Monte Carlo tree search (MCTS) became well known for its success in the game of Go. A computer had never before won a game against a human master player. Multiple variations of the algorithm have appeared since. One of the best known is Upper Confidence Bounds for Trees (UCT) by Kocsis and Szepesvári. Many enhancements to the basic MCTS algorithm rely on domain-specific heuristics, which make the algorithm less general. The goal of this thesis is to investigate how to improve the MCTS algorithm without compromising its generality. Temporal Difference (TD) learning is a Reinforcement Learning (RL) paradigm that combines two concepts, Dynamic Programming (DP) and the Monte Carlo (MC) method. Our goal was to incorporate the advantages of the TD learning paradigm into the MCTS algorithm. The main idea was to change how the rewards for each node are calculated and when they are updated. From the results of the experiments, one can conclude that combining the MCTS algorithm with the TD learning paradigm is a good idea. The newly developed Sarsa-TS(λ) shows a general improvement in performance. Since the games used in our experiments are all very different, the effect of the algorithm on performance varies from game to game.
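
To illustrate the idea of changing how node values are calculated and updated, the following is a minimal sketch of a generic TD(λ)-style backup along the path visited in one search iteration. It is not the thesis's Sarsa-TS(λ) algorithm, whose details are not given on this page. The function name `td_lambda_backup`, the dictionary-based value table, and the parameters `alpha`, `gamma`, and `lam` are assumptions made for illustration, and the sketch assumes each state appears at most once in the path.

```python
from typing import Dict, List


def td_lambda_backup(
    values: Dict[str, float],
    path: List[str],
    rewards: List[float],
    alpha: float = 0.1,
    gamma: float = 1.0,
    lam: float = 0.8,
) -> None:
    """Update value estimates for the states on `path` in place.

    Hypothetical helper for illustration only.
    path[t]    -- state visited at step t of one iteration
    rewards[t] -- reward received when leaving path[t]
    The state following the last entry of `path` is treated as
    terminal, with value 0. Unseen states start at value 0.
    """
    assert len(path) == len(rewards)

    # One-step TD errors, all computed from the values before this update.
    deltas = []
    for t, state in enumerate(path):
        if t + 1 < len(path):
            next_value = values.get(path[t + 1], 0.0)
        else:
            next_value = 0.0
        deltas.append(rewards[t] + gamma * next_value - values.get(state, 0.0))

    # Backward pass: each state receives its own TD error plus the
    # later errors, discounted by gamma * lam per step.
    accumulated = 0.0
    for t in reversed(range(len(path))):
        accumulated = deltas[t] + gamma * lam * accumulated
        values[path[t]] = values.get(path[t], 0.0) + alpha * accumulated
```

With `lam=1.0` and `gamma=1.0` the TD errors telescope, so every state on the path moves toward the full return of the playout, which matches an ordinary Monte Carlo backup. With `lam=0.0` each state is updated only from its immediate reward and the current estimate of its successor.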

    Item Type: Thesis (MSc thesis)
    Keywords: Monte Carlo tree search, Monte Carlo, Tree search, Upper Confidence Bounds for Trees, Temporal Difference learning, Reinforcement learning, Artificial Intelligence
    Number of Pages: 56
    Language of Content: English
    Mentor / Comentors:
        Name and Surname        ID     Function
        prof. dr. Branko Šter   283    Mentor
    Link to COBISS: http://www.cobiss.si/scripts/cobiss?command=search&base=51012&select=(ID=1536598211)
    Institution: University of Ljubljana
    Department: Faculty of Computer and Information Science
    Item ID: 3161
    Date Deposited: 22 Sep 2015 16:13
    Last Modified: 05 Nov 2015 00:08
    URI: http://eprints.fri.uni-lj.si/id/eprint/3161
