Computer Go (…continued)

The last post on Computer Go (about a year back) was a brief description of the game, and why it is very difficult to approach Go in the same way we tackled Chess.

In a four-page article in IEEE Spectrum, Feng-Hsiung Hsu – the architect and principal designer of IBM's Deep Blue – discusses in some detail how Go could be solved using brute force and variations of Min-Max search itself. The article does not go into great detail about the principal problem with computer Go – the evaluation of board positions. Also, one statement caught my eye:

On top of that, in 10 years Moore’s law is expected to present us with still another 100-fold speedup.

I was under the impression that Moore's law had already reached its limit. However, by adding more and better processors, supercomputers will undoubtedly be able to do more operations in parallel. In this article we take a look at the most successful technique that takes advantage of the massive parallel processing power of today's supercomputers – Monte Carlo techniques using UCT – and we will deal with the advancements made to it in a later post.

As a side note – for a while now, I have felt that these approaches are not the best way to tackle Go, and that systems that mimic the behavior of human pattern recognition and learning mechanisms are the way to go (no pun intended). There was recently a small discussion on the Go mailing list about the use of neural networks in tackling Go. This one remark by Alvaro Begue has strongly affected my sentiments on the issue:

People tried to make flying machines by imitating birds for a long time. Planes and helicopters fly, but not like birds do it. Similarly, I believe that whenever we figure out how to make machines that play go well, they will not do it like a brain does it.

The point being made here is that the computing systems we have today are very different from the wiring of our own brains. We do not have very good brain-like pattern recognition systems. What we do have are really fast parallel systems that perform simple operations at amazing speed. Our aim should be to learn to use that effectively.

Monte Carlo Go – MoGo and CrazyStone

The introduction of Monte Carlo methods was a great boost to computer Go. Currently, the best computer Go program is MoGo, and it uses Monte Carlo evaluation with the UCT algorithm. MC-Go engines explore the game tree up to a certain depth and then use Monte Carlo simulation to evaluate the leaf positions. The biggest issue in computer Go is that there is no good way to evaluate the resulting board position after the Min-max tree has been explored to a certain depth. Board evaluation is relatively easy in Chess: the pieces and their positions directly give away the value of the board. There is no such simplicity in Go. Territories may change hands, and patterns formed early on may have complex implications across the board.

In Monte Carlo evaluation, multiple simulations are run to evaluate a board position. A stochastic player plays a number of random games against itself, starting from this position and continuing until the end, and the average of all the final scores is taken as an estimate of the board's value. In some cases, the frequency of wins from the position is used as the position's value instead.
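
As a rough sketch of the idea, assuming a hypothetical Board interface (copy, is_over, legal_moves, play, score) rather than any particular engine's API:

```python
import random

def monte_carlo_evaluate(board, num_playouts=100):
    """Estimate a position's value by averaging random self-play games.

    The Board interface used here (copy, is_over, legal_moves, play, score)
    is a hypothetical stand-in, not any particular engine's API.
    """
    total = 0.0
    for _ in range(num_playouts):
        sim = board.copy()
        while not sim.is_over():
            sim.play(random.choice(sim.legal_moves()))  # uniformly random move
        total += sim.score()         # final score from the root player's view
    return total / num_playouts     # alternatively: the fraction of games won
```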

Something that we could improve is the stochastic player. Ideally, to evaluate the board position effectively, the stochastic player should model the opponent as closely as possible. This is understandably very difficult to achieve. Instead, it is far more practical to bias the stochastic player towards picking better moves (as both players are, in reality, going to draw from a set of good moves) rather than picking all moves uniformly at random.
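
One simple way to introduce such a bias is to pick playout moves with probability proportional to a heuristic weight; the weight function below is a placeholder for whatever knowledge the engine has (local patterns, capture heuristics), not a specific published policy:

```python
import random

def biased_playout_move(board, weight_fn):
    """Pick a playout move with probability proportional to a heuristic weight.

    weight_fn(board, move) is a placeholder for whatever knowledge is
    available; larger weights make a move more likely, so 'good' moves
    dominate the playouts without being forced.
    """
    moves = board.legal_moves()
    weights = [weight_fn(board, m) for m in moves]
    return random.choices(moves, weights=weights, k=1)[0]
```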

A naive implementation of a Monte Carlo simulation based evaluation function, where every leaf is evaluated by running a constant number of simulations, will naturally waste simulations on poor board positions. Instead, it is better to distribute the available simulations so that more promising moves are evaluated more thoroughly. This leads us to the exploration vs. exploitation problem.

Exploration vs Exploitation problem

An intuitive example of this is the n-armed bandit problem. A gambler wants to maximize his winnings from a slot machine with n arms, each of which has a different mean payoff. Through repeated plays the gambler tries to maximize his winnings by concentrating his plays on the best levers, except he doesn't know which lever is the best. In the absence of this knowledge, the gambler must choose between a greedy action (exploitation), where he selects the lever that has given him the maximum reward so far, and exploring a different lever in the hope that it will give a greater reward. In MC-Go, during evaluation, every move is treated as a different bandit problem.
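
As a toy illustration of the dilemma (not the policy MoGo uses – that is UCB1, discussed next), an ε-greedy gambler might look like this, with pull standing in for the unknown reward process of a lever:

```python
import random

def epsilon_greedy(pull, n_arms, plays=1000, epsilon=0.1):
    """Toy n-armed bandit strategy: exploit the best arm seen so far,
    but explore a random arm with probability epsilon.

    pull(arm) is a stand-in for the unknown reward of pulling that lever.
    """
    counts = [0] * n_arms
    means = [0.0] * n_arms
    winnings = 0.0
    for _ in range(plays):
        if 0 in counts or random.random() < epsilon:
            arm = random.randrange(n_arms)                    # explore
        else:
            arm = max(range(n_arms), key=lambda a: means[a])  # exploit
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]     # running mean
        winnings += reward
    return winnings, means
```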

Using UCT (UCB1 applied to Trees) in MC-Go

The UCB1 algorithm simply picks the move a_j which maximizes the value X̄_j + √(2·ln n / T_j), where X̄_j is the average of all rewards obtained by selecting action a_j, T_j is the number of times action a_j was picked, and n is the total number of times any action was picked. It would be expensive, space-wise, if we were to maintain information about each bandit problem and its values. Therefore, both MoGo and CrazyStone do the Monte Carlo simulation in two phases. In the first phase, called the tree policy, UCB1 is followed; in the other, a default policy (using the stochastic player discussed earlier) is used.
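
In code, the selection rule might be sketched roughly as follows; stats is just a hypothetical per-move table of running averages and visit counts:

```python
import math

def ucb1_pick(stats):
    """Pick the index j maximizing X_j + sqrt(2 * ln(n) / T_j).

    stats is a list of (mean_reward, visits) pairs, one per move;
    unvisited moves are tried before the formula is applied.
    """
    n = sum(visits for _, visits in stats)
    best_j, best_value = None, float("-inf")
    for j, (mean, visits) in enumerate(stats):
        if visits == 0:
            return j                                   # try every move once first
        value = mean + math.sqrt(2.0 * math.log(n) / visits)
        if value > best_value:
            best_j, best_value = j, value
    return best_j
```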

Starting from the board position to be evaluated as the root, MoGo selects a move according to UCB1. It then descends the tree in the same manner until the selected node is not yet part of the tree. MoGo then adds this node to the tree; the tree policy ends here. From here on, the default policy using the stochastic player is used.
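
Putting the two phases together, one iteration might look like the minimal sketch below. The Board methods, default_policy (the stochastic player) and the scoring convention are all hypothetical or glossed over; this is the general shape of UCT rather than MoGo's actual implementation:

```python
import math
import random

class Node:
    """One board position stored in the UCT tree."""
    def __init__(self, board, parent=None):
        self.board = board
        self.parent = parent
        self.children = {}   # move -> Node
        self.visits = 0
        self.mean = 0.0      # running average of playout results

def ucb1_choice(node):
    """UCB1 over a node's children: argmax of mean + sqrt(2 ln n / visits)."""
    def value(child):
        if child.visits == 0:
            return float("inf")
        return child.mean + math.sqrt(2.0 * math.log(node.visits) / child.visits)
    return max(node.children, key=lambda m: value(node.children[m]))

def uct_iteration(root, default_policy):
    """One MC-Go iteration: tree policy, expansion, default policy, backup."""
    node = root
    while not node.board.is_over():
        untried = [m for m in node.board.legal_moves() if m not in node.children]
        if untried:
            # The selected move is not yet in the tree: add it and stop descending.
            move = random.choice(untried)
            child = Node(node.board.copy(), parent=node)
            child.board.play(move)
            node.children[move] = child
            node = child
            break
        # Tree policy: keep selecting moves by UCB1 while we stay inside the tree.
        node = node.children[ucb1_choice(node)]
    # Default policy: the stochastic player plays the game out from here.
    result = default_policy(node.board)
    # Back the result up the path (whose perspective the result is scored
    # from is glossed over in this sketch).
    while node is not None:
        node.visits += 1
        node.mean += (result - node.mean) / node.visits
        node = node.parent
```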

Resources for further reading

Most of the content here is from the slides of either Sylvain Gelly or Rémi Coulom. Following up on the side note, there are in fact two fairly strong Go programs built on neural networks – NeuroGo and Steenvreter. Let's see if I can read up on them and blog about it sometime. But soon, there'll be a post on RAVE and hints on RL-Go.

Game AI Development Event at Kurukshetra 09

College of Engineering, Guindy is organizing a game AI development event this time at Kurukshetra, the annual International Techno-Management festival.

We wanted the event to reflect the current trends in the game AI development industry; it is left to the participants to judge to what extent that goal is accomplished. The trends seem to be:

  • Player-specific content
  • A move away from hard-coded, rule-based opponents.

We chose Lunar Lander as the test game for both problem statements. Why? Lunar Lander was one of the earliest games to use vector graphics. But that's not the reason. Lunar Lander is simple, both to code and to play. And it seemed to offer scope for hacking both problem statements. Now, on to the problem statements.

The first problem statement is a challenge to create a player that plays Lunar Lander. You could try a rule-based approach and code it, and you may succeed too. But I'll bet the easier way out is to make your player learn how to handle the lander. There are many situations where a rule-based approach would fail. Consider the case of a car racing game: a rule-based approach may satisfy the initial requirements of the game, and your racer may successfully out-race the human player, but it wouldn't do well on tracks that you, as its programmer, have not seen, because your AI doesn't know how to 'drive'. Well, that's the motivation for this problem statement.

The second problem statement is about content creation. Simple games such as Lunar Lander deserve, I think, to engage the player for longer than three levels. The content of the game is very simple and involves very little graphics. Is it possible to use modern (or not-so-modern) techniques to generate an infinite set of levels? Theoretically, yes. Further, would these levels 'engage' the player as much as hand-crafted levels would? It is definitely worth a try.

Well, if you are a game AI developer pro and all this seems, well, 'childish' to you, or in case you disagree with what we have identified as the current trends in the industry, try the third problem statement. If you have any innovative ideas on game AI, an implementation of your concept is most welcome. We do not intend to accept plain textbook search algorithms, though slightly modified but better versions are accepted.

This is the link to the event's page. Please go through the problem statements again. This is the first time we are attempting such an event, so your comments and suggestions are welcome; they would be invaluable in making this a more successful event in the next editions of Kurukshetra. Have fun, Merry Christmas and Happy New Year!

Game AI

I am afraid I have to push the ‘Go Engines Analysis’ post to a later date.

Commercial and Academic Research in Games

Generally, most research done by academics is of some commercial use, and one would naturally expect the same to be the case in game AI research. Only, it's not. Commercial games concentrate more on the graphics and the user-experience side of the game than on the AI. Whenever there is a bold announcement from the industry stating that the next edition of a game is to have 'next-generation human-like AI', it has mostly turned out to be a huge let-down and a miserable failure. The strongest AI in commercial games these days is the kind that 'cheats'. Have you never wondered in-game, at least in the harder levels, 'When did he get so many points?', 'Does his nitrous never empty?', and sometimes just plain 'WTF!!'?

Instead, the NPCs (non-player characters) in the majority of games are controlled by finite state machines, lists of conditional statements, and the occasional A* path-finding algorithm. Typically, no learning is present in either the development or the execution of the AI; everything is hard-coded by the human game developers.
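
To make that concrete, the typical hand-coded NPC 'brain' amounts to something like the sketch below; the guard states and the npc/player methods are purely illustrative:

```python
# A minimal hand-coded NPC "brain" of the kind described above: a finite state
# machine plus a handful of conditionals. The states and the npc/player
# methods used here are hypothetical, purely for illustration.
def guard_think(state, npc, player):
    if state == "PATROL":
        if npc.can_see(player):
            return "CHASE"
        npc.follow_waypoints()            # the occasional A* path lives here
    elif state == "CHASE":
        if npc.health < 20:
            return "FLEE"
        if not npc.can_see(player):
            return "PATROL"
        npc.move_towards(player)
    elif state == "FLEE":
        npc.move_away_from(player)
    return state                          # no learning: behaviour never changes
```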

Academic research into game AI hasn't been too rewarding either. So we built the strongest Chess player in history, and we are building better ones. Has that helped us understand how people play the game? How we learn to play it? How we interpret patterns? Also, academic research is mainly focused on making the AI increasingly better at the game, aiming to make it the best. Naturally, that's not the commercial requirement. Would anyone but Kasparov want to play Deep Blue? What is required commercially is that the player's opponents be neither too domineering nor too easy.

How Games interest us

According to Raph Koster, a game is fun to play because we learn the game as we play; we understand and learn the patterns underlying the game, and finally "grok" how to play it. This requires that the level of challenge is always approximately right, and that new patterns are always available to learn – games that are too simple or impossible to understand are boring.

Another theorist who has tried to nail down the essence of games and why they are fun is Thomas Malone. He claims that the factors that make games fun can be organized into three categories: challenge, fantasy, and curiosity.

The existence of some sort of goal adds to the entertainment value. Further, this goal should not be too hard or too easy to attain, and the player should not be too certain about what level of success he will achieve.

Games that include fantasy, according to Malone, show or evoke images of physical objects or social situations not actually present: the sensation of being somewhere else, being someone else, doing something else.

As for the third factor, curiosity, Malone claims that fun games have an optimal level of informational complexity, in that their environments are novel and surprising but not completely incomprehensible. These are games that invite exploration and keep the player playing just to see what happens next.

The most interesting applications of real Computational Intelligence and Machine Learning methods

NERO is a fantastic new concept: the player 'teaches' (initially stupid) bots to become 'trained' soldiers. At the core of the game is a learning system called NEAT, which uses genetic algorithms to train artificial neural networks (neuroevolution). NEAT involves continually taking ANNs from an initially random population and combining the best among them, in the hope that this will result in a better ANN.
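
For a rough sense of the shape of such a loop, here is a bare-bones neuroevolution sketch over fixed-topology weight vectors. To be clear, this is not NEAT itself, which also evolves the network topology, uses crossover and protects innovation through speciation:

```python
import random

def neuroevolve(fitness, n_weights, pop_size=50, generations=100, sigma=0.1):
    """Bare-bones neuroevolution: evolve a population of weight vectors for a
    fixed-topology network by keeping the fittest and mutating them.

    fitness(weights) is a stand-in for "let the network play and score it".
    NEAT proper also evolves topologies, uses crossover and speciation; all
    of that is omitted here.
    """
    population = [[random.gauss(0, 1) for _ in range(n_weights)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:max(1, pop_size // 5)]      # keep the best fifth
        offspring = [[w + random.gauss(0, sigma) for w in random.choice(parents)]
                     for _ in range(pop_size - len(parents))]
        population = parents + offspring
    return max(population, key=fitness)
```
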
Another place where learning methods may be put to use is automatic content generation. Here is an intriguing new concept: racing tracks customized to a player's skill. The next Prince of Persia edition promises a custom player experience. Although it is not clear how 'custom' it is to be, this sounds like the beginning of something big. So automatic content generation is definitely IN.
Another instance of content generation, currently in development, is a game called Galactic Arms Race, being built by Ken Stanley and his team at UCF – the same dude who pioneered NERO. Remember the classic Space Invaders arcade game? This seems to be something similar, except that there is a large and INCREASING variety of guns you can choose from, and each gun fires differently. That is, the graphics are different – and the graphics were not coded by hand, they were EVOLVED by a custom version of NEAT.


Game competitions at GECCO

GECCO 2009 is to conduct two game competitions, Salamander and Car Racing. The details of Salamander are not too clear to me, but looking at this and this, it seems the salamander is to learn to walk and then catch flies around the house.
The car racing controllers are tested on TORCS. (TORCS does not have dynamite graphics, but I find it a lot more difficult than NFS – though that's just my perception; it was built mainly for testing AI controllers.) Last year's entries were quite a variety. There was a controller based on NEAT, and there was also a hand-coded controller (although I doubt it performed as well as the other trained/evolved controllers – score one for CI/ML!). The significance of a trained controller is that, provided it is trained sufficiently well on a good track, it acquires the basic driving skills. This would be a big plus for games like TrackMania Nations, which lets the gamer design his own tracks but does not provide competing cars, leaving the gamer to look for human opponents to race on his own track. Another aspect of trained controllers is their versatility: a small change in the training process and the controllers become more aggressive, trying harder to push the other players off the track!
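
For contrast with the trained entries, a hand-coded controller is usually just a fixed rule mapping sensor readings to steering and throttle, something like the toy sketch below (the sensor values are illustrative, not the real TORCS client API):

```python
def hand_coded_drive(angle_to_track_axis, distance_from_center):
    """Toy example of the kind of hand-coded controller mentioned above.

    The two inputs (track-axis angle in radians, normalized offset from the
    track center) are illustrative, not the actual TORCS sensor names.
    """
    steer = 0.5 * angle_to_track_axis - 0.3 * distance_from_center
    throttle = 1.0 if abs(steer) < 0.1 else 0.4   # ease off in corners
    return steer, throttle
```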

And now, the crux of the matter: if you were to host a contest, which game do you think – considering the expanding scope for computational intelligence in games illustrated by the examples above – would be a good test bed for innovative applications of CI/ML? Remember, it's a time-bound contest; the participant should have enough time to experiment.