Computer Go (…continued)

The last post on Computer Go (about a year back) was a brief description of the game, and why it is very difficult to approach Go in the same way we tackled Chess.

In a four-page article in IEEE Spectrum, Feng-Hsiung Hsu, the architect and principal designer of IBM’s Deep Blue, discusses in some detail how Go could be solved using brute force and variations of minimax search itself. The article does not go into great detail about the principal problem of computer Go: the evaluation of board positions. Also, one statement caught my eye:

On top of that, in 10 years Moore’s law is expected to present us with still another 100-fold speedup.

I was under the impression that Moore’s law had already reached its limit. However, by adding more and better processors, supercomputers will undoubtedly be able to do more operations in parallel. In this article we take a look at the most successful technique that takes advantage of the massive parallel processing power of today’s supercomputers, Monte Carlo techniques using UCT, and will deal with the advancements made to it in a later post.

As a side note: for a while now, I had felt that these approaches are not the best way to tackle Go, and that systems that mimic human pattern recognition and learning mechanisms are the way to go (no pun intended). There was recently a small discussion on the computer-go mailing list about the use of neural networks in tackling Go. This one remark by Alvaro Begue has strongly affected my sentiments on the issue:

People tried to make flying machines by imitating birds for a long time. Planes and helicopters fly, but not like birds do it. Similarly, I believe that whenever we figure out how to make machines that play go well, they will not do it like a brain does it.

The point being made here is that the computing systems we have today are very different from the wiring in our own brains. We do not have very good brain-like pattern recognition systems. What we do have are really fast parallel systems that perform simple operations at amazing speed, and our aim should be to learn to use those effectively.

Monte Carlo Go – MoGo and CrazyStone

The introduction of Monte Carlo methods was a great boost to computer Go. Currently, the best computer Go program is MoGo, and it uses Monte Carlo evaluation with the UCT algorithm. MC-Go engines explore the game tree up to a certain depth and then use Monte Carlo simulation to evaluate the leaf positions. The biggest issue in computer Go is that there is no good way to evaluate the resulting board position after the minimax tree has been explored to a certain depth. Board evaluation is relatively easy in Chess: positions and pieces directly give away the value of the board. There is no such simplicity in Go. Territories may change hands, and patterns formed early on may have complex implications across the board.

In Monte Carlo evaluation, multiple simulations are run to evaluate the board position. A stochastic player plays a number of random games against himself, starting from this position and continuing until the end, and the average of the final scores is taken as an estimate of the board’s value. In some cases, the frequency of wins from the position is used as the position’s value instead.
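As a rough sketch of the idea (the `legal_moves`, `play` and `score` callbacks here are hypothetical stand-ins for a real Go engine’s interfaces, not any actual program’s API):

```python
import random

def monte_carlo_eval(position, legal_moves, play, score, n_sims=1000, rng=None):
    """Estimate a position's value as the average final score of n_sims
    random self-play games started from it."""
    rng = rng or random.Random(0)
    total = 0.0
    for _ in range(n_sims):
        pos = position
        moves = legal_moves(pos)
        while moves:                      # play uniformly random moves to the end
            pos = play(pos, rng.choice(moves))
            moves = legal_moves(pos)
        total += score(pos)               # terminal score of this playout
    return total / n_sims
```

To estimate the frequency of wins instead, `score` would simply return 1 for a win and 0 for a loss, making the average a win rate.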

Something that we could improve is the stochastic player. Ideally, to evaluate the board position effectively, the stochastic player should model the opponent as closely as possible. This is understandably very difficult to achieve. Instead, it is far more practical to bias the stochastic player towards picking better moves (as both players are, in reality, going to draw from a set of good moves) rather than picking moves uniformly at random.

A naive implementation of a Monte Carlo simulation based evaluation function, where every leaf is evaluated by running a constant number of simulations, will naturally waste simulations on poor board positions. Instead, it is better to distribute the available simulations properly, evaluating more promising moves more. This leads us to the exploration vs. exploitation problem.

Exploration vs Exploitation problem

An intuitive example of this is the n-armed bandit problem. A gambler wants to maximize his winnings from a slot machine with n arms, each of which has a different mean payoff. Through repeated plays the gambler is to maximize his winnings by concentrating his plays on the best arms, except he doesn’t know which arm is the best. In the absence of this knowledge, the gambler must choose between a greedy action (exploitation), selecting the arm that has given him the maximum reward so far, and exploration, trying a different arm in the hope that it will give a greater reward. In MC-Go, during evaluation, every move is treated as a different bandit problem.

Using UCT ( UCB1 applied to Trees ) in MC-Go

The UCB1 algorithm simply picks the move aj that maximizes the value X̄j + √(2 ln n / Tj), where X̄j is the average of all rewards obtained by selecting action aj, Tj is the number of times action aj was picked, and n is the total number of times any action was picked. It would be expensive, space-wise, to maintain information about each bandit problem and its values. Therefore, both MoGo and CrazyStone do the Monte Carlo simulation in two phases: in the first phase, called the tree policy, UCB1 is followed; in the other, a default policy (using the stochastic player discussed earlier) is used.
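One UCB1 selection step might look like the following minimal sketch (the `stats` bookkeeping structure is an assumption of mine, not MoGo’s or CrazyStone’s actual data layout):

```python
import math

def ucb1_select(stats, n):
    """Pick the action maximizing X̄j + sqrt(2·ln(n) / Tj).
    stats maps action -> (total_reward, times_picked); n is the total
    number of plays so far. Untried actions are picked first."""
    best, best_value = None, float("-inf")
    for action, (total, tries) in stats.items():
        if tries == 0:
            return action                 # try every arm at least once
        value = total / tries + math.sqrt(2 * math.log(n) / tries)
        if value > best_value:
            best, best_value = action, value
    return best
```

Note how an arm with a high average reward can still lose out to a rarely tried arm: the second term grows as Tj stays small, which is exactly the exploration bonus.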

Starting from the board position to be evaluated as the root, MoGo selects a move according to UCB1. It then descends the tree in the same manner until the selected node is not yet part of the tree, at which point MoGo adds this node to the tree. The tree policy ends here; from here on, the default policy using the stochastic player is used.
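Putting the two phases together, a toy UCT loop might look like this sketch (the `legal_moves`, `play` and `result` callbacks and the dictionary-based tree are my own assumptions for illustration, not MoGo’s implementation; positions must be hashable):

```python
import math, random

def uct_search(root, legal_moves, play, result, n_iter=200, rng=None):
    """Tree policy: descend with UCB1 while nodes are in the tree, then
    grow the tree by one node. Default policy: random playout to the
    end. Finally, back the reward up along the visited path."""
    rng = rng or random.Random(0)
    tree = {root: [0.0, 0]}               # position -> [total_reward, visits]

    def ucb1(parent, child):
        total, visits = tree[child]
        if visits == 0:
            return float("inf")
        return total / visits + math.sqrt(2 * math.log(tree[parent][1]) / visits)

    for _ in range(n_iter):
        path, pos = [root], root
        while legal_moves(pos):           # tree policy
            children = [play(pos, m) for m in legal_moves(pos)]
            unseen = [c for c in children if c not in tree]
            if unseen:
                pos = rng.choice(unseen)
                tree[pos] = [0.0, 0]      # grow the tree by one node
                path.append(pos)
                break
            pos = max(children, key=lambda c: ucb1(path[-1], c))
            path.append(pos)
        while legal_moves(pos):           # default policy: random playout
            pos = play(pos, rng.choice(legal_moves(pos)))
        reward = result(pos)
        for node in path:                 # backup
            tree[node][0] += reward
            tree[node][1] += 1
    # recommend the most-visited move at the root
    return max(legal_moves(root), key=lambda m: tree.get(play(root, m), [0, 0])[1])
```

The simulations concentrate where UCB1 sends them, so promising lines end up with deep subtrees while poor ones receive only a handful of playouts.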

Resources for further reading

Most of the content here is from the slides of either Sylvain Gelly or Rémi Coulom. Following up on the side note, there are in fact two fairly strong Go programs built on neural networks, NeuroGo and Steenvreter. Let’s see if I can read up on them and blog about it sometime. But soon, there’ll be a post on RAVE and hints on RL-Go.


The Game of Go


It is amazing how simple rules can result in breathtaking complexity. This post is about one such game:

Go

Chances are, you have already seen people play the game. I first saw it in ‘Fist of Legend’, Jet Li’s adaptation of ‘Fist of Fury’.

It is so old that it was mentioned in the ‘Analects of Confucius’ (479 B.C.). In fact, the American Go Association cites this as one of the top ten reasons to play Go.

When you play go, you are doing something that billions of people have done for thousands of years. Many centuries ago, people were doing the exact same thing, in the exact same way.

What an absurd reason to give a promising player for choosing the game!

Crisply, the rules, as specified here:

  • Go board (goban): 19 × 19 intersections.
  • Black and White play alternately.
  • Black starts the game.
  • Adjacent stones (of the same colour) are called a string.
  • Liberties are the empty intersections next to a string.
  • Stones do not move; they are only added to and removed from the board.
  • A string is removed when its number of liberties drops to 0.
  • Score: territory (the number of occupied or surrounded intersections).
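The string-and-liberty rule can be sketched as a flood fill. This is a minimal illustration assuming a board represented as a dictionary mapping occupied `(row, col)` intersections to `'B'` or `'W'` (a representation I chose for brevity, not one any particular engine uses):

```python
def string_and_liberties(board, row, col):
    """Flood-fill the string containing (row, col) on a 19x19 board and
    collect its liberties; the string is captured when liberties is empty."""
    colour = board[(row, col)]
    string, liberties, frontier = set(), set(), [(row, col)]
    while frontier:
        r, c = frontier.pop()
        if (r, c) in string:
            continue
        string.add((r, c))
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if not (0 <= nr < 19 and 0 <= nc < 19):
                continue                  # off the board
            neighbour = board.get((nr, nc))
            if neighbour is None:
                liberties.add((nr, nc))   # empty intersection: a liberty
            elif neighbour == colour:
                frontier.append((nr, nc)) # same colour: same string
    return string, liberties
```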

Plain and simple. There is also a large set of terms associated with the game, which you will need to know if you are playing or coding a player; all of them Japanese. I strongly agree with the subject of this post, so try to play a few games before you get coding. I would suggest GNU Go, with Quarry, to make a good beginning.

However, computer science considers a human-level computer Go player an open problem, mentioned alongside the likes of ` P = NP ` and ` the Church–Turing hypothesis `.

A comparison with Chess ought to shed more light on the subject.

  • Computer Chess programs have defeated Chess Grandmasters, and have become so good that they are now competing among themselves for Chess supremacy.
  • An average Go program, on the other hand, is routinely trounced by Asian school children.
  • At any stage of the game, Chess, on average, has 37 possible moves, decreasing as the game progresses. (Fewer pieces, fewer moves.)
  • Go, on the other hand, averages 250 possible replies per move.
  • There are many terms in Go (look here), each of which needs a representation in the program.
  • Chess has very few such terms, en passant, fortress, and the like, and poses less of a challenge to a programmer who has little knowledge of the game.

If you cannot truly assess the complexity of the problem, here’s one more piece of trivia: Go is associated with the number 10^700. This number is almost definitely an overestimate, considering that the estimated number of particles in the Universe, when I last checked, was 10^91! However, more reasonable estimates aren’t too far from this number either.

A brief intro to how computers play (board) games:

Computers play games by searching a state-space tree. Many use a theorem called the minimax theorem. Chess, Go and most other board games are said to be complete-information deterministic games, in that both players know everything there is to be known from the beginning to the end of the game. There is no role for chance here. In 1928, von Neumann proved his minimax theorem for zero-sum games. In non-formal terms, a zero-sum game is a game in which one player’s winnings equal the other player’s losses. It is pretty much like cake cutting. So a good move is one that leaves the opponent with only worse moves, all of them in all variants leading to positions favourable to you and therefore unfavourable for the opponent.

An example of the minimax algorithm should be very useful here:

Minimax Example

Consider a two-player game where each player takes alternate turns (as in the vast majority of board games today). Now, as shown in the figure, you have three options, to each of which your opponent has three options in reply. The tree’s root indicates that you will choose the move that lands you in the best position possible when you are next to play. The numbers in the circles indicate your position after that move; level 2 is your opponent’s move. Initially, the tree is constructed with empty circles, and evaluation is done from the leaves to the root.

Some evaluation function, say f(x), evaluates the board positions and returns the values at the leaf level. You take the minimum of the points your opponent’s move leaves you and put it in the corresponding higher level. Then you take the maximum of all the points your move will leave you and put it in the level above. What has happened? You are left with a guarantee that you’ll get at least +1 no matter what your opponent’s response is. In simpler terms, you have maximised your minimum profit, and hence the term minimax.

** A minimax tree is generally more than 2 ply deep (1 ply is a single player’s turn; ‘turn’ itself being an ambiguous term across different games).
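The procedure above can be sketched in a few lines. This is a minimal illustration with the game tree given as nested lists (a leaf is the value returned by the evaluation function f(x); the representation is my own, for illustration only):

```python
def minimax(node, maximizing):
    """Evaluate a game tree given as nested lists: a leaf is a number
    (the static evaluation of that position); an inner node is a list
    of child subtrees. Returns the value the player to move can
    guarantee."""
    if not isinstance(node, list):
        return node                       # leaf: static evaluation f(x)
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)
```

For example, `minimax([[1, 3, 5], [-2, 4, 7], [-6, 0, 2]], True)` takes the opponent’s minimum in each subtree (1, -2, -6) and then your maximum over those, guaranteeing +1.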

Here is more on zero-sum games. A wiki minimax example.

Here is an article that shows why the minimax theorem is not always practically optimal.

Many optimizations have been made to this approach, the most prominent of them being alpha-beta pruning.
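The idea can be sketched over a toy game tree given as nested lists (leaves are static evaluations; again my own representation, for illustration): branches that provably cannot change the root value are cut off.

```python
def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf")):
    """Minimax with alpha-beta pruning: alpha and beta bound what the
    maximizer and minimizer can already guarantee; once alpha >= beta,
    the remaining siblings cannot affect the result and are skipped."""
    if not isinstance(node, list):
        return node                       # leaf: static evaluation
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:
                break                     # minimizer will never allow this line
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:
            break                         # maximizer will never allow this line
    return value
```

With good move ordering the pruning is dramatic, which is what lets Chess engines search so much deeper than a naive minimax would.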

Naturally, it is important how deep your tree is. The evaluation of the game state is just as crucial, if not more so, in the grand scheme of things. A good evaluation function can reduce how deep you need to look.

All of the most famous Chess engines of our time use this technique. For a game like Chess, with an average branching factor of 35–50, specialised hardware with pre-calculated opening and endgame tables is enough to beat a Chess Grandmaster. In fact, it even defeated the highest-rated Grandmaster in history.

Such an approach becomes immediately inadequate in the game of Go. No hardware can search so large a state space, and an accurate evaluation function is very difficult to arrive at. Well then, how do today’s programs approach Go?

A look at the current Computer Go Scenario :

The most powerful computer Go programs in the world today are :

  • GNUGo
  • MoGo
  • CrazyStone
  • Go++
  • NeuroGo

A rough analysis of these five should be of great interest. Updates to this post and further posts will be about this.
