The last post on Computer Go (about a year back) was a brief description of the game, and why it is very difficult to approach Go in the same way we tackled Chess.
In a four-page article in IEEE Spectrum, Feng-Hsiung Hsu, the architect and principal designer of IBM's Deep Blue, discusses in some detail how Go could be solved using brute force and variations of min-max search itself. The article does not go into great detail about the principal problem with computer Go: the evaluation of board positions. Also, one statement caught my eye:
On top of that, in 10 years Moore’s law is expected to present us with still another 100-fold speedup.
I was under the impression that Moore's law had already reached its limit. However, by adding more and better processors, supercomputers will undoubtedly be able to do more operations in parallel. In this article we take a look at the most successful technique that takes advantage of the massive parallel processing power of today's supercomputers, Monte Carlo techniques using UCT, and we will cover the advances made to it in a later post.
As a side-note: for a while now, I had felt that these approaches are not the best way to tackle Go, and that systems that mimic the behavior of human pattern recognition and learning mechanisms are the way to go ( no pun intended ). There was recently a small discussion on the Go mailing list about the use of Neural Networks in tackling Go. One remark by Alvaro Begue strongly affected my sentiments on the issue:
People tried to make flying machines by imitating birds for a long time. Planes and helicopters fly, but not like birds do it. Similarly, I believe that whenever we figure out how to make machines that play go well, they will not do it like a brain does it.
The point being made here is that the computing systems we have today are very different from the wiring in our own brains. We do not have very good brain-like pattern recognition systems. What we do have are really fast parallel systems that perform simple operations at amazing speed. Our aim should be to learn to use that effectively.
Monte Carlo Go – MoGo and CrazyStone
The introduction of Monte Carlo methods was a great boost to computer Go. Currently the best computer Go program is MoGo, and it uses Monte Carlo evaluation with the UCT algorithm. MC-Go engines explore the game tree up to a certain depth and then use Monte Carlo simulation to evaluate the leaf positions. The biggest issue in computer Go is that there is no good way to evaluate the resulting board position after the min-max tree has been explored to a certain depth. Board evaluation is relatively easy in Chess: positions and pieces directly give away the value of the board. There is no such simplicity in Go. Territories may change hands, and patterns formed early may have complex implications across the board.
In Monte Carlo evaluation, multiple simulations are run to evaluate the board position. A stochastic player plays a number of random games against itself, starting from this position and playing until the end, and the average of the final scores is taken as an estimate of the board's value. In some cases the frequency of wins from the position is used as the position's value instead.
Something we could improve is the stochastic player. Ideally, to evaluate the board position effectively, the stochastic player should model the opponent as closely as possible. This is understandably very difficult to achieve. Instead it is far more practical to bias the stochastic player towards picking better moves ( as both players are in reality going to draw from a set of good moves ) rather than picking moves uniformly at random.
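As a sketch of this evaluation scheme ( in C, with a made-up `random_playout` standing in for the actual random self-play game, since a real Go playout needs a full board implementation; `true_win_prob` is a knob invented purely for this sketch ):

```c
#include <stdlib.h>

/* Stand-in for a random playout: returns 1 (a win) with the
 * position's "true" win probability.  In a real engine this would
 * be a full random self-play game from the position to the end. */
static int random_playout(double true_win_prob) {
    return ((double)rand() / RAND_MAX) < true_win_prob;
}

/* Monte Carlo evaluation: the win frequency over n playouts is the
 * estimate of the position's value. */
static double evaluate(double true_win_prob, int n_playouts) {
    int wins = 0;
    for (int i = 0; i < n_playouts; i++)
        wins += random_playout(true_win_prob);
    return (double)wins / n_playouts;
}
```

With enough playouts the win frequency converges to the position's underlying value. Note that a biased stochastic player only changes `random_playout`; the averaging stays the same.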
A naive implementation of a Monte Carlo simulation based evaluation function, where every leaf is evaluated by running a constant number of simulations, will naturally waste simulations on poor board positions. Instead, it is better to distribute the available simulations so that more promising moves are evaluated more. This leads us to the exploration vs exploitation problem.
Exploration vs Exploitation problem
An intuitive example of this is the n-armed bandit problem. A gambler wants to maximize his winnings from a slot machine with n arms. Each arm has a different mean winning associated with it. Through repeated plays the gambler is to maximize his winnings by concentrating his plays on the best levers, except he doesn't know which lever is the best. In the absence of this knowledge, the gambler's choice is between a greedy action (exploitation), where he selects the lever which has given him the maximum reward so far, or exploring and choosing a different lever in the hope that it'll give a greater reward. In MC-Go, during evaluation, each node is treated as a separate bandit problem, with the moves as its arms.
Using UCT ( UCB1 applied to Trees ) in MC-Go
The UCB1 algorithm simply picks the move aj which maximizes Xj + sqrt( 2 ln n / Tj ), where Xj is the average of all rewards obtained by selecting action aj, Tj is the number of times action aj was picked, and n is the total number of times any action was picked. It would be expensive, space-wise, to maintain information about every bandit problem and its values. Therefore, both MoGo and CrazyStone do the Monte Carlo simulation in two phases. In the first phase, called the Tree Policy, UCB1 is followed; in the other, a default policy ( using the stochastic player discussed earlier ) is used.
Starting from the board position to be evaluated as the root, MoGo selects a move according to UCB1. It then descends down the tree in a similar manner, until the selected node is not a part of the tree. MoGo then adds this node to the tree. The Tree Policy ends here. From here on the default policy using the stochastic player is used. The video shown illustrates the technique.
Resources for further reading
- Homepage of Sylvain Gelly (no longer maintaining MoGo) and MoGo
- CrazyStone and Remi Coulom
- University of Alberta – a lot of Go related people here: David Silver (RLGO, RAVE), Csaba Szepesvári (UCB) and M. Enzenberger (NeuroGo)
Most of the content here is from the slides of either Sylvain Gelly or Remi Coulom. Following up on the side-note, there are in fact two fairly strong Go programs built on neural networks: NeuroGo and Steenvreter. Let's see if I can read up on them and blog about it sometime. But soon, there'll be a post on RAVE and hints on RL-Go.
It is Charles Darwin’s 200th birthday. So I guess there must be loads of posts on GA, strewn across the Internet. Genetic Algorithms are special to me because they are the prime reason why I am doing whatever it is that I am doing.
This post is a light intro to Genetic Algorithms, sans the jargon. Genetic Algorithms are delightfully simple. GA is an approximation algorithm, that is, it is used in situations where either the optimal solution is unnecessary or the complexity of the problem does not permit finding it. Sometimes the Genetic Algorithm may even produce the optimal result. Since examples are the best way to learn an algorithm, here's the example problem we are to solve.
Given to you is a configuration of balances as shown in the figure. The problem is to balance the configuration by adding weights on either side. You are provided with 10 weights measuring 1, 2, 3, …, 10 kg each. Each weight is to be used exactly once. Thanks to GreatBG for this puzzle; his puzzles are always most entertaining. This can roughly be equated to the traveling salesperson problem, where the salesperson is to visit all cities (and come back) in a sequence that minimizes the total distance covered. The most naive solution is to try all possible sequences and see which one gives the minimum cost. For a map of n cities, that comes to n! combinations. A similar approach would work in our case too, but when the map or the number of weights is larger, the technique takes far too much time. A more professional approach would be to brush up on your elementary physics, reduce the problem to a set of linear equations and solve them. Naaa. Here's the outline of a genetic algorithm instead:
- Choose initial population
- Evaluate the fitness of each individual in the population
- Repeat until termination: (time limit or sufficient fitness achieved)
- Select best-ranking individuals to reproduce
- Breed new generation through crossover and/or mutation (genetic operations) and give birth to offspring
- Evaluate the individual fitnesses of the offspring
- Replace worst ranked part of population with offspring
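To make the steps concrete, here is a minimal GA in C on the toy "OneMax" problem ( maximize the number of 1s in a bitstring ). The toy fitness is a stand-in of my own, since the balance and TSP encodings need more scaffolding; the loop itself follows the steps above:

```c
#include <stdlib.h>
#include <string.h>

#define POP   20   /* population size        */
#define LEN   32   /* genome length          */
#define GENS  200  /* generation limit       */
#define ELITE 2    /* survivors per new gen  */

/* Toy fitness: count of '1' characters in the genome. */
static int fitness(const char *g) {
    int f = 0;
    for (int i = 0; i < LEN; i++) f += (g[i] == '1');
    return f;
}

/* qsort comparator: higher fitness first. */
static int cmp(const void *a, const void *b) {
    return fitness((const char *)b) - fitness((const char *)a);
}

/* Run the GA; writes the best genome found into `best`
 * and returns its fitness. */
static int run_ga(char best[LEN + 1]) {
    char pop[POP][LEN + 1], next[POP][LEN + 1];
    for (int i = 0; i < POP; i++) {            /* initial population */
        for (int j = 0; j < LEN; j++) pop[i][j] = '0' + rand() % 2;
        pop[i][LEN] = '\0';
    }
    for (int g = 0; g < GENS; g++) {
        qsort(pop, POP, LEN + 1, cmp);         /* rank by fitness */
        if (fitness(pop[0]) == LEN) break;     /* sufficient fitness */
        memcpy(next, pop, ELITE * (LEN + 1));  /* elitism */
        for (int i = ELITE; i < POP; i++) {
            const char *pa = pop[rand() % (POP / 2)]; /* parents from */
            const char *pb = pop[rand() % (POP / 2)]; /* the top half */
            int cut = rand() % LEN;            /* one-point crossover */
            memcpy(next[i], pa, cut);
            memcpy(next[i] + cut, pb + cut, LEN + 1 - cut);
            if (rand() % 10 == 0)              /* mutation: flip a bit */
                next[i][rand() % LEN] ^= 1;    /* '0' <-> '1' */
        }
        memcpy(pop, next, sizeof pop);         /* replace population */
    }
    qsort(pop, POP, LEN + 1, cmp);
    memcpy(best, pop[0], LEN + 1);
    return fitness(best);
}
```

Note this sketch uses plain one-point crossover, which is fine for bitstrings but not for permutations; the balance problem needs the permutation-safe crossover discussed below.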
Initializing the Population
The algorithm maintains a set of solutions. Not optimal solutions, just solutions. For example, in the case of the TSP, it may be a set of random orderings of the cities; in our problem, a set of random configurations of weights. In the program, I maintain a configuration as a string, where the weight allotted to each slot is represented by a character, with 'A' being 1, 'B' being 2 and so on. The weights are added bottom to top and left to right. One point to be noted here is that all solutions in the set must stick to the constraints of the problem. No city may be re-visited, no weight may be added twice.
Evaluating the Population
This is the key part of the algorithm. Once a way is found to evaluate each member of the 'population', the problem is already half solved. In our case, the "fitness" will be based on the resultant moment in the system. Therefore the best solution will have a fitness of 0: the resultant moment in the system would be zero.
Until the required precision is obtained, or a maximum limit on the number of iterations is reached, we continue with the following operations.
Crossover
This is where we actually begin to understand why it is called a Genetic Algorithm in the first place. According to Darwin's theory of Natural Selection, the best members of the population mate to produce better offspring. Similarly, in our algorithm, the best solutions found so far mate with each other to (hopefully) produce a better solution. In code, this translates to merging substrings from different members, chosen from among the best solutions found so far. Note that, done improperly, this could result in an offspring that is not even a valid member of the population; it might violate the problem's constraints. In TSP, and in our case, the same city/weight might get listed twice, which cannot be allowed. So we use a special crossover method called Partially Matched Crossover (PMX in code), a full treatment of which is beyond the scope of a primer.
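Still, PMX is short enough to sketch ( on the character encoding described earlier; `lo` and `hi` mark the slice copied from the first parent, and the names here are my own ):

```c
#include <string.h>

/* PMX sketch: build a child permutation from parents a and b.
 * The slice [lo, hi] is copied from a; every other position takes
 * b's value, and if that value already sits in the copied slice it
 * is remapped via a[j] -> b[j] until a free value is found, so the
 * child stays a valid permutation (no weight/city used twice). */
static void pmx(const char *a, const char *b, char *child,
                int lo, int hi, int len) {
    memcpy(child, b, len);          /* start from parent b */
    child[len] = '\0';
    for (int i = lo; i <= hi; i++)  /* copy the slice from parent a */
        child[i] = a[i];
    for (int i = 0; i < len; i++) { /* resolve duplicates outside it */
        if (i >= lo && i <= hi) continue;
        int again = 1;
        while (again) {
            again = 0;
            for (int j = lo; j <= hi; j++)
                if (child[i] == a[j]) { child[i] = b[j]; again = 1; break; }
        }
    }
}
```

For example, crossing "ABCDEFGHIJ" and "CFAJHDIGBE" with the slice at positions 3 to 5 keeps D, E, F from the first parent and remaps the first parent's duplicates, yielding a valid permutation.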
Mutation
Some members of the population are randomly chosen to be mutated. Mutation is a process by which the offspring gain a trait that is inherited from neither the father nor the mother. In code, this simply means swapping two characters within the string. You can choose more complex mutation techniques if desired. Again, the problem constraints must be satisfied. Mutation is required mainly to increase the variety of solutions in the population. Variety is very important.
Natural Selection and Elitism
Once these operations are done on the parents, a new generation of solutions is born. To make sure that the best solution in this generation is at least as good as the best one in the previous generation, we introduce "elitism". Normally, all the individuals in the previous generation are replaced by members of the newer generation. But with a nonzero elitism value, a small portion of the best solutions from the previous generation survive into the next generation.
Going by the paradigm of survival of the fittest, we now select the best solutions from both generations, and these form the actual next generation. (My code varies slightly: it does not use natural selection; the members of the younger generation simply replace all but the elite members of the elder population.)
Genetic Algorithms suffer from the problem that they might converge to a non-optimal solution. Suppose we are looking for the solution at the top of a mountain. It is possible that the population ends up on an elevated hill far away from the mountain, a local optimum. This happens if the various parameters are not tuned properly, and the algorithm will not get out of this situation by itself. The best way to proceed in such a scenario is to simply start again.
People, Blogs, Conferences and Applications
- Illinois Genetic Algorithm Laboratory (Home of David Goldberg, the father of Genetic Algorithms )
- Genetic Argonaut ( Watch this blog for the latest updates in the field )
- GA and ART, evolving Mona-Lisa
- NEAT users group (highly active Yahoo! group for NEAT, which uses GA to train ANNs)
- Evolving Gaits
- GECCO ( the most impactful conference in all of computer Science )
- A formal and free treatment of Genetic Algorithms (ebook)
- Sanjeev (me): my poster on Dynamic Difficulty Adaptation in Games using GA has been accepted at ICFDG 09. Watch this blog for updates on that.
I am afraid I have to push the ‘Go Engines Analysis’ post to a later date.
Commercial and Academic Research in Games
Generally, most research done by academics finds some commercial use, and one would naturally expect the same of game AI research. Only, it's not so. Commercial games concentrate more on the graphics and the user experience than on AI. Bold announcements from the industry that the next edition of a game will have 'next-generation human-like AI' have mostly turned out to be huge let-downs, miserable failures. The strongest AI in commercial games these days is the kind that 'cheats'. Have you never wondered in-game, at least in the harder levels, 'When did he get so many points?', 'Does his nitrous never empty?' And sometimes just plain 'WTF!!'
Instead, the NPCs ( Non Player Characters ) in the majority of games are controlled by finite state machines, lists of conditional statements, and the occasional A* path finding algorithm. Typically, no learning is present in either development or execution of the AI. Everything is hard-coded by the human game developers.
Academic research into game AI hasn't been too rewarding either. So we built the strongest Chess player in history, and we are building better ones. Has that helped us understand how people play the game? How we learn to play it? How we interpret patterns? Also, academic research is mainly focused on making AI increasingly better at the game, aiming to make it the best. Naturally, that's not the commercial requirement. Would anyone but Kasparov want to play Deep Blue? What is required commercially is that the player's opponents be neither too domineering nor too easy.
How Games interest us
According to Raph Koster, a game is fun to play because we learn the game as we play; we understand and learn the patterns underlying the game, and finally “grok” how to play it. This requires that the level of challenge always is approximately right, and that new patterns are always available to learn – games that are too simple or impossible to understand are boring.
Another theorist who has tried to nail down the essence of games and why they are fun is Thomas Malone. He claims that the factors that make games fun can be organized into three categories: challenge, fantasy, and curiosity.
The existence of some sort of goal adds to the entertainment value. Further, this goal should not be too hard or too easy to attain, and the player should not be too certain about what level of success he will achieve.
Games that include fantasy, according to Malone, show or evoke images of physical objects or social situations not actually present. The sensation of being somewhere else, being someone else, doing something else.
As for the third factor, curiosity, Malone claims that fun games have an optimal level of informational complexity in that their environments are novel and surprising but not completely incomprehensible. These are games that invite exploration, and keeps the user playing just to see what happens next.
The most interesting applications of real Computational Intelligence and Machine Learning methods
NERO is a fantastic new concept. The player 'teaches' (initially stupid) bots to become 'trained' soldiers. At the core of the game is a learning system called NEAT, which uses Genetic Algorithms to train Artificial Neural Networks (NeuroEvolution). NEAT continually takes ANNs from an initially random population and combines the best among them in the hope that the result is a better ANN.
Another place where learning methods may be put to use is automatic content generation. Here is an intriguing new concept: racing tracks customized to a player's skill. The next Prince of Persia edition claims a custom player experience. Although it is not clear how 'custom' it will be, this sounds like the beginning of something big. So automatic content generation is definitely IN.
Another instance of content generation, currently in development, is a game called Galactic Arms Race, being built by Ken Stanley and his team at UCF, the same dude who pioneered NERO. Remember the classic Space Invaders arcade game? This seems to be something similar, except there is a large and INCREASING variety of guns you can choose from. And each gun fires differently; that is, the graphics are different. And the graphics were not coded by hand, they were EVOLVED by a custom version of NEAT.
GECCO 2009, is to conduct two competitions, Salamander and Car Racing. The details of Salamander are not too clear to me, but looking at this and this, it seems, the Salamander is to learn to walk and then catch flies around the house.
The car racing controllers are tested on TORCS. (TORCS does not have dynamite graphics, but I find it a lot more difficult than NFS; that's just my perception. It was built mainly for testing AI controllers.) Last year's entries were quite a variety. There was a controller based on NEAT, and there was also a hand-coded controller (although I doubt it performed as well as the trained/evolved controllers; score one for CI/ML!). The significance of a trained controller is that, provided it is trained sufficiently well on a good track, it acquires the basic driving skills. This would be a big plus for games like Trackmania Nations, which lets the gamer design his own tracks but does not provide competing cars, leaving him to look for human opponents to race on his own track. Another aspect of trained units is versatility: a small change in the training process and the controllers become more aggressive, trying harder to push the other players off the track!!
The crux of the matter: if you were to host a contest, which game, considering the expanding scope for computational intelligence in games illustrated by the examples above, do you think would be a good test bed for innovative applications of CI/ML?
Remember, it's a time-bound contest. The participant should have enough time to experiment.
Please do leave a token on the page if you are interested. Perhaps a comment? Please do post comments, whatever they may be.
Find the number of set bits in the binary equivalent of a decimal number. The solution is required to be O(1) in both time and space. Here is a slick solution, although it is claimed there that it is only nearly O(1); I'm not sure the judges would have accepted it.
For any unsigned number b, the number of set bits n can be calculated as follows:

for (n = 0; b; b >>= 1)
    if (b & 1) n++;
One more. From an infinite bit stream, determine whether the decimal equivalent of the binary obtained so far is divisible by three.
Sum the bits at even and odd positions separately. If their difference is divisible by three, so is the number. You can derive this by writing the number as a sum over (3 - 1)^n instead of the usual sum over 2^n, and expanding with the binomial theorem: modulo 3, each bit contributes (-1)^n, which is exactly the alternating sum.
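In code it is easier to track the remainder directly as the bits arrive. Appending a bit maps the number x to 2x + bit, which gives a three-state automaton equivalent to the alternating-sum rule ( a sketch; bits assumed to arrive most significant first ):

```c
/* Streaming divisibility-by-3 check.  `*rem` holds the running
 * remainder mod 3; appending a bit maps the number x to 2x + bit,
 * so the remainder updates in O(1) per bit.  Returns whether the
 * number seen so far is divisible by three. */
static int feed_bit(int *rem, int bit) {
    *rem = (2 * (*rem) + bit) % 3;
    return *rem == 0;
}
```

Feeding the bits of 6 (110) leaves the remainder at 0, while the bits of 5 (101) leave it at 2, matching the alternating-sum rule above.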
Too bad, this didn’t strike me during the event. More problems will be posted. You are free to request or even send interesting Qs to me.
Kurukshetra 08 has come and gone. Our team was one of the three selected for the finals of Project K; I guess our project on number plate recognition was impressive. The team Root(-1), originally Lenin's team name and now ours, took third place in iCode. But it is not the spoils of war that have forced me to write a blog article (duh, I haven't written one in ages). It was the K!OSPC, the OnSite Programming Contest, that prompted these lines.
We messed up!!! That's the short of it. Every team that got selected in iCode made it through to round 2 of the OSPC. All, that is, except ours. The questions were exactly the kind you'd expect in a tech interview. And we were like "Dude!!!!????" I had to do a hell of a lot of searching just to get the answer to one of them. The truth is that forums are great, but if all you want is one answer straight away, forums are a waste of time. People just keep dragging the pages out, sometimes getting nowhere. So I've decided I'll keep a collection of the most interesting questions I come across and put up their solutions once I find them. Only the solutions.
As in most cases, the answers are open to contention. But the idea is to keep this as far away from a forum as possible. Just a ready reference. That's all.