Sunday, 22 October 2017

Self-taught AlphaGo Zero is the Best Go Player Ever!!

This is a truly amazing development...

On 15 March 2016, I had posted the following blog (click on it to go there)

Another Step towards Achieving Artificial General Intelligence: AlphaGo triumphs over the World Champion of the Game GO

Go is an ancient Chinese board game, more than 2,500 years old, that is far more complex than chess. In March 2016, the program AlphaGo defeated the reigning world champion Lee Se-Dol in 4 out of 5 games. Lee remarked: "I don't know what to say ... I kind of felt powerless". AlphaGo has had further notable successes since then.
To beat world champions at Go, the original AlphaGo relied largely on supervised learning from millions of human moves - moves drawn from games played by expert human players.

David Silver and colleagues have now produced AlphaGo Zero, an evolution of AlphaGo that is the strongest Go player yet. AlphaGo Zero defeated the champion-beating AlphaGo by 100 games to 0!
The new program AlphaGo Zero is based purely on reinforcement learning and learns solely from self-play. Starting from completely random play, it learns the game from scratch, simply by playing games against itself.
It reaches superhuman level in just a few days of training, involving several million games of self-play, and it can now beat all previous versions of AlphaGo.
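AlphaGo Zero's actual training combines a deep neural network with Monte Carlo tree search, but the core idea - two copies of the same learner improving by playing against each other, starting from nothing - can be illustrated with a toy example. The sketch below (my own illustration, not DeepMind's code) uses tabular Q-learning via self-play on the simple game of Nim, where the known optimal strategy is to always leave your opponent a multiple of 4 stones:

```python
import random

random.seed(0)

# Toy self-play reinforcement learning: tabular Q-learning on Nim.
# Rules: a pile starts with 10 stones; players alternate taking 1-3
# stones; whoever takes the last stone wins.
PILE, MAX_TAKE = 10, 3
ALPHA, EPSILON, EPISODES = 0.5, 0.3, 50_000

# Q[pile][take] = value of taking `take` stones, from the mover's view.
Q = {s: {a: 0.0 for a in range(1, min(MAX_TAKE, s) + 1)}
     for s in range(1, PILE + 1)}

def best(s):
    return max(Q[s], key=Q[s].get)

for _ in range(EPISODES):
    s = PILE
    while s > 0:                       # both "players" share one Q-table
        a = (random.choice(list(Q[s])) if random.random() < EPSILON
             else best(s))
        if s - a == 0:
            target = 1.0               # taking the last stone wins
        else:
            # The opponent moves next, so our value is minus their best value.
            target = -max(Q[s - a].values())
        Q[s][a] += ALPHA * (target - Q[s][a])
        s -= a

# After self-play the learned policy recovers the "leave a multiple of 4"
# rule, with no human game records involved at any point.
for s in (10, 7, 5):
    print(s, "-> take", best(s))
```

Nothing in the code encodes the winning strategy; like AlphaGo Zero (at a vastly smaller scale), the program discovers it purely from the outcomes of its own games.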

It also learns quickly - it surpassed the version that beat world Go champion Lee Se-Dol after just three days of self-learning.
This is superhuman performance indeed!

David Silver explains his work in this YouTube video.

The Editor of Nature writes: "Because the program has discovered the same fundamental principles of the game that took humans centuries to conceptualize, the work suggests that such principles have some universal character, beyond human bias."

AlphaGo Zero represents the evolution of computer programs (artificial intelligence - AI) from ANI (artificial narrow intelligence) towards AGI (artificial general intelligence) (click here to see my blog on how we define intelligence).

ANI is a task-specific program: it works in a narrow domain with just a few simple rules. In the game of Go, the domain is a 19 x 19 board (361 intersections) and the rules are very simple. The program can explore that space by trying out, and remembering the outcomes of, millions of moves in a day - something no human could ever do. In this regard it has far surpassed human capabilities. But while it excels at Go, AlphaGo Zero is helpless if asked to do anything else - and that is where it fails the AGI test. ANI - yes; AGI - no.
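That "small" domain deserves a quick back-of-envelope check. Each of the 361 intersections can be empty, black, or white, so 3^361 is a crude upper bound on board configurations; the number of legal positions, computed by John Tromp in 2016, is about 2.08 x 10^170 - still vastly more than the roughly 10^80 atoms in the observable universe. So even this "narrow" domain cannot be searched exhaustively:

```python
# Crude upper bound on Go board configurations: each of the
# 19 x 19 = 361 intersections is empty, black, or white.
configurations = 3 ** (19 * 19)
print(f"3^361 has {len(str(configurations))} digits")   # 173 digits
```

The point is that AlphaGo Zero's strength comes from learning which tiny fraction of this space matters, not from brute-force enumeration.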

The self-learning aspect of the program is highly significant - it represents a break from the way computer programs have been developed in the past. As the rules and domains of such tasks expand in scope, some sort of intelligence may well emerge from the complexity, and that would be a serious step towards achieving AGI - an intelligence on a par with humans.

This will take time. But remember the exponential pace of technological progress - the future may be a lot closer than we think!
