
New version of AlphaGo, trained only by playing itself, beats existing AlphaGo 100-0

Damaniel

Banned
The journal Nature published a paper by DeepMind today, describing their new iteration of AlphaGo, called AlphaGo Zero. Unlike the previous iterations, AlphaGo Zero was trained solely through self-play, being given only the basic rules of the game. After only 3 days of training, Zero could easily defeat the version of AlphaGo that beat Lee Sedol: it won 100-0 in a series of test matches, and after 40 days it beat their previous best version (called AlphaGo Master) 89-11. By their estimate, AlphaGo Zero has an Elo rating of over 5000, which is an absolutely insane level of gameplay (for reference, Lee Sedol has an estimated Elo of around 3500, and Ke Jie has an Elo of around 3660).

Here's a couple articles on the research:
- Gizmodo - ignore the clickbaity headline.
- DeepMind's own article on the research. They have a link to the actual paper, which is pretty interesting.

Replace me with a vastly superior AI if old.
 

Kinitari

Black Canada Mafia
Right now the Baduk community is frantically looking at all the shared games, and it seems they're already gleaning new insights. It's kind of insane
 

Laiza

Member
So the Go AI that was unbeatable by humans got absolutely trashed by a superior version of that Go AI that is now unbeatable by its former self.

Man, I love seeing shit like this. Really makes me feel like we're living in the future!
 

Kinitari

Black Canada Mafia
The OpenAI Dota 1v1 program has slowly shown cracks, and pro players are finding ways to beat it (after half a year of data gathering)

http://neogaf.com/forum/showthread.php?p=245965344

The 2018 edition should be insanity

I realized after reading more about this that it was a bit misrepresented, I don't know if by OpenAI itself or the media. I think it's impressive, but it felt less impressive as soon as you knew more about what it really did

http://www.wildml.com/2017/08/hype-or-not-some-perspective-on-openais-dota-2-bot/
 

Skittles

Member
I would rather a computer be in charge of the nukes than Donald Trump.

Man, I give it 2 years before most video games are conquered. Which means computers will be better than us at any game lol
 

FiggyCal

Banned
I realized after reading more about this that it was a bit misrepresented, I don't know if by OpenAI itself or the media. I think it's impressive, but it felt less impressive as soon as you knew more about what it really did

http://www.wildml.com/2017/08/hype-or-not-some-perspective-on-openais-dota-2-bot/

Trained entirely through self-play

Hard-coded restrictions: The bot was not trained from scratch knowing nothing about the game. Item choices were hardcoded, and so were certain techniques, such as creep block, that were deemed necessary to win.

I would say so. The article even contradicted itself.
 

Steiner84

All 26 hours. Multiple times.
anyone know how good the Dota bot has gotten in the meantime, since it stripped every pro player naked?

edit: nvm, didn't see the post.
 

Eridani

Member
Why? Chess robots have been beating human players for decades.

Because Go is a much more complicated game - that is, it has a much larger state space, which makes the algorithms used for chess completely unusable, no matter how much the hardware improves. The algorithms for Go come from a completely different field of AI than the ones used for chess - one that has advanced astronomically in the last few years.
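To put rough numbers on that state-space point, here's a back-of-the-envelope comparison (the branching factors and game lengths below are commonly cited ballpark figures, not from the paper):

```python
import math

# Game-tree size grows roughly as branching_factor ** game_length.
# Ballpark figures: chess ~35 legal moves over ~80 plies,
# Go ~250 legal moves over ~150 plies.
chess_log10 = 80 * math.log10(35)    # log10 of 35**80
go_log10 = 150 * math.log10(250)     # log10 of 250**150

print(f"chess game tree: ~10^{chess_log10:.0f}")  # ~10^124
print(f"go game tree:    ~10^{go_log10:.0f}")     # ~10^360
```

That gap is why exhaustive chess-style search never had a chance at Go, and why the field moved to Monte Carlo tree search guided by neural networks instead.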
 

Kinitari

Black Canada Mafia
I would say so. The article even contradicted itself.

No, it's just a confusing way to describe training vs the underlying algorithms. As in, training on a dataset would involve it watching games first and then using that information to inform its own self-play (or at least, this is how older versions of AlphaGo operated). This one didn't 'watch' any games prior to its self-play; however, the final build had some hard-coded commands, which makes it less impressive.

The author doesn't really make that distinction clear here. Basically they're trying to say 'it's nice that it didn't need a giant dataset, but it sucks that it needed hand-coded rules'
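For anyone curious what "training purely through self-play" means mechanically, here's a toy sketch: a tabular agent learns the game of Nim (take 1-3 stones, taking the last stone wins) entirely from games against itself, with no example games provided. Everything here is illustrative and nothing like DeepMind's actual code:

```python
import random

def self_play_train(pile_size=10, episodes=20000, eps=0.1, lr=0.5):
    """Learn Nim purely from self-play: no human games, only the rules."""
    q = {}  # q[(pile, move)] ~ estimated chance the mover wins by taking `move`
    for _ in range(episodes):
        pile, history = pile_size, []
        while pile > 0:
            moves = list(range(1, min(3, pile) + 1))
            if random.random() < eps:          # explore occasionally
                m = random.choice(moves)
            else:                              # otherwise play the current best guess
                m = max(moves, key=lambda a: q.get((pile, a), 0.5))
            history.append((pile, m))
            pile -= m
        # Whoever made the last move won; propagate the result backwards,
        # flipping the reward at each step since players alternate turns.
        reward = 1.0
        for state in reversed(history):
            q[state] = q.get(state, 0.5) + lr * (reward - q.get(state, 0.5))
            reward = 1.0 - reward
    return q

q = self_play_train()
```

Both "players" are the same table, just as both sides of an AlphaGo Zero game are the same network; the games it generates against itself are its only training data.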
 

RSP

Member
So what other tasks can this thing be taught?

I suppose it's things where fixed rules apply and need to be considered when doing the same task a lot of times in succession?
 

Eridani

Member
So my thread, which got locked for being a duplicate, and this earlier one, are both a page long?

GAF you need to pay more attention!

It's just not something that's very interesting for most people. "AlphaGo went from being unbeatable to being slightly more unbeatable" isn't really all that impressive to most people, I'd imagine. This is actually a lot more impressive than that, since they used a very different approach from the old AlphaGo, and even though I only skimmed the paper their work seems incredibly impressive, but if you're not actually interested in AI research it just doesn't look like much of a big deal.
 

nynt9

Member
So what other tasks can this thing be taught?

I suppose it's things where fixed rules apply and need to be considered when doing the same task a lot of times in succession?

DeepMind, the company behind this (owned by Google), has been looking into healthcare applications. The technology behind this is very generalizable.
 

efyu_lemonardo

May I have a cookie?
So what other tasks can this thing be taught?

I suppose it's things where fixed rules apply and need to be considered when doing the same task a lot of times in succession?

The class of tasks this result applies to is called perfect information games, meaning nothing is hidden from either player.
Unlike, say, a game of StarCraft, which DeepMind is also working on cracking, but which is more complicated.
 

efyu_lemonardo

May I have a cookie?
It's just not something that's very interesting for most people. "AlphaGo went from being unbeatable to being slightly more unbeatable" isn't really all that impressive to most people I'd imagine. This is actually a lot more impressive then that, since they used a very different approach than the old AlphaGo, and even though I only skimmed the paper their work seems incredibly impressive, but if you're not actually interested in AI research it just doesn't look like much of a big deal.

If there's one thing any layman should take from here, it's the potentially alarming rate of improvement in these systems.

Quite possibly, in the near future, the gap between an A.I. first solving a problem at a level that doesn't suck compared to an average human, and that A.I. becoming far better than any human could hope to be at the same problem, will be measured in mere days!

That's not something you can take a "wait and see" approach to anymore. People need to understand.
 

Daedardus

Member
We're all gonna die.

This is just an AI that has been optimised and trained by human scientists and engineers to be very good at Go. The idea that a specific AI will be good at everything is a bit ridiculous. For example, this AI can't tell you which stocks to invest in or drive your car, even though there are AIs that can do that sort of stuff. Still, all these AIs have to go through a training process, which is limited by the amount of time and processing power you have available. An AI won't try to learn things it doesn't need, because that would be wasted effort.

The reason why Elon Musk et al. are afraid of AI is not this kind of AI that can play chess or Go, but AI that has been specifically engineered to detect and eliminate humans, in a war context by drones. They fear that such an AI may be too good at the thing it was made for, and could therefore become difficult to control. But it's still a computer that had to be taught how to kill, not a computer teaching itself to kill without orders.
 

Eridani

Member
If there's one thing any layman should take from here, it's the potentially alarming rate of improvement in these systems.

Quite possibly there will come a day in the near future where the time between a particular problem being solvable by A.I. at a level that doesn't suck when compared to an average human, and an A.I. becoming far better than any human could hope to be at solving that same problem, will be measured in mere days!

That's not something you can take a "wait and see" approach with any more. People need to understand.

The thing is that AlphaGo was already far, far better at Go than any human player could possibly be. That got a lot of interest, and rightfully so. AlphaGo Zero improved the Elo rating from 4858 to 5185, which doesn't really look like much of an improvement. Previous versions of AlphaGo also achieved similar improvements and didn't make much news.

The real news here is the novel approach. Using reinforcement learning with no human knowledge to achieve such a result is frankly mind-blowing. This closing quote from the paper really puts it in perspective:

Humankind has accumulated Go knowledge from millions of games played over thousands of years, collectively distilled into patterns, proverbs and books. In the space of a few days, starting tabula rasa, AlphaGo Zero was able to rediscover much of this Go knowledge, as well as novel strategies that provide new insights into the oldest of games.

This is the biggest thing here, and is something older versions of AlphaGo weren't really capable of doing, since they relied on a ton of human knowledge, accumulated over hundreds of years, to achieve what they did. AlphaGo Zero started from nothing and beat all that knowledge in 40 days.
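For a sense of what that roughly 330-point Elo gap means in practice, the standard Elo expectation formula (the usual logistic form, applied here to the 5185 and 4858 ratings quoted above) predicts close to the win rate DeepMind reported:

```python
def elo_expected_score(rating_a, rating_b):
    """Standard Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# AlphaGo Zero (5185) vs the previous best version (4858):
print(round(elo_expected_score(5185, 4858), 2))  # ≈ 0.87
```

That ~87% expected score lines up neatly with the 89-11 series result mentioned in the OP.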
 

efyu_lemonardo

May I have a cookie?
The thing is that AlphaGo was already far, far better at Go than any human player could possibly be. That got a lot of interest, and rightfully so. AlphaGo Zero improved the Elo rating from 4858 to 5185, which doesn't really look like much of an improvement. Previous versions of AlphaGo also achieved similar improvements and didn't make much news.

The real news here is the novel approach. Using reinforcement learning with no human knowledge to achieve such a result is frankly mind-blowing. This closing quote from the paper really puts it in perspective:



This is the biggest thing here, and is something older versions of AlphaGo weren't really capable of doing, since they relied on a ton of human knowledge, accumulated over hundreds of years, to achieve what they did. AlphaGo Zero started from nothing and beat all that knowledge in 40 days.

Bingo. And that's what people need to understand.
 

Eridani

Member
Bingo. And that's what people need to understand.
It would help if the reporting didn't focus so much on the 100-0 number, which is pretty much irrelevant. It's also misleading, since the actual number against the latest previous version was 89-11. It should focus on the mind-blowing way this was achieved.
 

thetrin

Hail, peons, for I have come as ambassador from the great and bountiful Blueberry Butt Explosion
I guess I'll skip the T2 reference and make an MGS4 war economy reference instead.
 

Laughing Banana

Weeping Pickle
I would rather a computer be in charge of the nukes than Donald Trump.

Ehh hhhhh.... No? You sure you want to entrust the world's most powerful arsenal of weapons to a cold, unfeeling, inhumane, 100% logical machine? That's scary as hell.

I mean, what if it comes to the 'logical' conclusion that the earth is already filled with too many humans and decides to wipe out a large number of us just because?
 

efyu_lemonardo

May I have a cookie?
Computer programs don't have to become "self aware" or "evil" to be a major threat to us. It's enough that they become extremely competent at managing certain human affairs, to the point where we place a large amount of our trust in them, and then one day due to a bug they make a mistake. That could potentially be enough to wipe out a whole lot of us.

On the plus side, we humans make mistakes that result in countless unnecessary deaths all the time.
 

tokkun

Member
Computer programs don't have to become "self aware" or "evil" to be a major threat to us. It's enough that they become extremely competent at managing certain human affairs, to the point where we place a large amount of our trust in them, and then one day due to a bug they make a mistake. That could potentially be enough to wipe out a whole lot of us.

On the plus side, we humans make mistakes that result in countless unnecessary deaths all the time.

I feel like we made this transition long ago. Look at a nuclear reactor: a computer manages the reaction process. Our communications systems are all computerized, computers do most of the flying of planes, and they handle most of the volume of stock market trades.

The solution we have taken is to vet the systems as carefully as possible and have humans there to override them if we see evidence of a bug.
 
it is no longer constrained by the limits of human knowledge. Instead, it is able to learn tabula rasa from the strongest player in the world: AlphaGo itself.

. . .

Welp. Let me just say that I for one welcome our new robot overlords. I can be quite useful in rounding up humans, or "biological batteries."
 