
(Wired) Watch Google's DeepMind play Montezuma's Revenge


Kinitari

Black Canada Mafia
http://www.wired.co.uk/article/google-ai-montezuma-revenge
https://youtu.be/0yI2wJ6F8r0
Google's DeepMind has learned how to play yet another game - this time because it had been 'incentivised' to want to win.

"Intrinsic rewards" meant the AI obtained "significantly improved exploration in a number of hard games, including the infamously difficult Montezuma's Revenge", wrote Google researchers in a paper.

Intrinsic motivation (IM) algorithms typically use signals to make the AI more 'curious' and are inspired by classic, human-based psychological ideas.

Montezuma's Revenge was a 1984 platform game for the Atari 2600 in which a character navigates a series of complex rooms in an underground Aztec temple.

The model, which had inbuilt rewards, explored 15 rooms out of a potential 24 – the old model, which was not incentivised, explored only two.

Really interesting stuff, and these are the sort of improvements that will let it play things like 3D adventure games, like GTA or something. It's stark seeing the difference in how many rooms were explored before and after.
 

Complete

Banned
It's funny how so many people have no idea how close AI is to reaching human intelligence. Stuff like this is only peanuts compared to what's going to follow over the coming years.
 
Am I missing something? This seems dumb; it didn't complete anything, and by the 100th frame it was still pretty terrible at the game
 
It's hard to explain why this is amazing if you don't appreciate how far AI has come in the last half decade. This is really impressive.

I do, but a) the title says it completed the game (it didn't), and b) the article says it completed it in just 4 tries (it dies 20+ times just in the video)

Maybe they've done something impressive, but it certainly isn't illustrated in that video. Did they post the wrong one?
 

Complete

Banned
I assume what makes the one in the OP more complex is that it actually went and explored rooms that weren't crucial?
Right, they used the concept of intrinsic rewards to get the program to explore more than was necessary. In other words, exploring was its own reward, and so the AI went further than it otherwise would have with no incentive to explore.

Which is pretty crazy if you think about it. Here we have an (albeit basic) AI program using the power of intrinsic motivation to guide it into doing things it otherwise wouldn't. Just think about how that could be applied in the future in all sorts of endeavors that are non-essential but worth doing.
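To make that concrete, here's a minimal sketch of a count-based exploration bonus (my own illustration, not DeepMind's actual code; the 0.1 weight is a made-up placeholder). States the agent has rarely visited pay out extra reward, so poking into new rooms is literally worth points to it:

    from collections import defaultdict
    import math

    visit_counts = defaultdict(int)
    BETA = 0.1  # bonus weight, placeholder value for illustration

    def shaped_reward(state_key, env_reward):
        # The novelty bonus shrinks as a state is revisited, so the agent
        # keeps getting paid for finding rooms it hasn't seen before.
        visit_counts[state_key] += 1
        return env_reward + BETA / math.sqrt(visit_counts[state_key])

The actual paper derives "pseudo-counts" from a learned density model rather than raw tallies (you can't meaningfully count raw Atari frames), but the shape of the idea is the same.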
 

Krejlooc

Banned
Calling Montezuma's Revenge an Atari 2600 game is like calling Resident Evil 4 a PS2 game. It was developed originally for the Atari 8-bit. The 2600 version is a port.
 
Right, they used the concept of intrinsic rewards to get the program to explore more than was necessary. In other words, exploring was its own reward, and so the AI went further than it otherwise would have with no incentive to explore.

Which is pretty crazy if you think about it. Here we have an (albeit basic) AI program using the power of intrinsic motivation to guide it into doing things it otherwise wouldn't. Just think about how that could be applied in the future in all sorts of endeavors that are non-essential but worth doing.

Of course it did, it's practically a tautology. If you've got one system that always turns left, but it turns out that left is very difficult, then it will never get far. If you tell it to go right and left, now all of a sudden it's "getting farther". Of course, if exploration is incentivized then the end result will be a more fully explored world. How could you possibly expect any other outcome?

Don't get me wrong, AlphaGo and DeepMind are amazing, but this is not even close to being on the same level
 

Grinchy

Banned
When it talks about "100 million training frames", does that mean the program has attempted the screens 100 million times?
 
How lightly?


The AlphaGo implementation is easier to develop than the DeepMind implementation of Montezuma's Revenge. That's what I'm saying.

First of all, lmao.

Second of all, I wasn't talking about how hard it is to develop an AI for a non-discrete search space, but rather the specific technique of incentivizing an AI to explore. It's trivial and it's used all the time in AI, pervasively even.
 

Dodecagon

works for a research lab making 6 figures
It's funny how so many people have no idea how close AI is to reaching human intelligence. Stuff like this is only peanuts compared to what's going to follow over the coming years.

To some extent, having a deep appreciation for the current state of the art in AI gives a greater appreciation for how far away AI is from reaching human intelligence. I've seen pop-sci articles claim the opposite, but to some extent a lot of this material is old and enabled by the ridiculous compute capabilities offered by modern GPUs. At the end of the day, a lot of what DeepMind has accomplished can be seen as a way of brute-forcing solutions (and more specifically policies) to optimize complex problem spaces via Q-learning.
 

Kinitari

Black Canada Mafia
Uhh. DeepMind is the name of a company. What are you even talking about?

And: yes, to put it lightly

DeepMind is also the name of their original algorithm. The algorithm they trained on games. AlphaGo was based on that algorithm. This is also based on that algorithm.

Training in intrinsic motivation has been done before, but it's done very well here in combination with what is probably the best general-purpose algorithm known.
 

nOoblet16

Member
You have it backwards. AlphaGo is a lot easier to develop than DeepMind.

Do you have a cursory understanding of AI?

1) AlphaGo is more complex. DeepMind wrote a neural network that learned to play Atari games in 2015; AlphaGo was built upon that, and this one is merely an extension of the former.
2) I am doing a PhD in the exact same field, so similar in fact that I had to change my focus to something else, because otherwise I would have ended up competing with Google, which I couldn't possibly have done.
 

Aselith

Member
Google is just making next gen twitch streamers. Call me when they perfect the accidentally caught masturbating algorithm.
 

nOoblet16

Member
DeepMind is also the name of their original algorithm. The algorithm they trained on games. AlphaGo was based on that algorithm. This is also based on that algorithm.

Training in intrinsic motivation has been done before, but it's done very well here in combination with what is probably the best general-purpose algorithm known.

The algorithm is called Deep Q-Learning (DQN); it's basically reinforcement learning with a deep neural network.
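For the curious, the core update is easy to write down. A toy sketch, with a linear function standing in for the deep network (feature size, discount, and learning rate are illustrative values only):

    import numpy as np

    n_features, n_actions = 4, 18  # 18 = size of the Atari joystick action set
    W = np.zeros((n_actions, n_features))  # Q(s, a) = W[a] @ s
    GAMMA, LR = 0.99, 0.01

    def q_values(state):
        # state is a 1-D numpy feature vector; returns one Q estimate per action
        return W @ state

    def td_update(state, action, reward, next_state, done):
        # Bootstrapped Q-learning target: r + gamma * max_a' Q(s', a'),
        # with no future value once the episode ends
        target = reward + (0.0 if done else GAMMA * q_values(next_state).max())
        td_error = target - q_values(state)[action]
        W[action] += LR * td_error * state  # gradient step on the squared TD error

The real DQN swaps the linear map for a convolutional network reading raw pixels, plus tricks like experience replay and a frozen target network to keep training stable.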
 
1) AlphaGo is more complex, Deepmind wrote a neural network that learned to play Atari games in 2015. AlphaGo was built upon that, this one is merely an extension of the former.
2) I am doing a PhD in the exact same field, so similar infact that I had to change my focus to something else because I would have ended up competing with google otherwise which I couldn't possibly have done.
1. When you say it was built upon that, what exactly do you mean?

EDIT: NVM, I see.
 

Aureon

Please do not let me serve on a jury. I am actually a crazy person.
Intrinsic motivation is all any AI has, though? Does this mean they're using designer cues (sparks, exploration) that have no gameplay relevance but influence human behavior in the algorithm?
also, AlphaGo IS DeepMind.
 

nOoblet16

Member
1. When you say it was built upon that, what exactly do you mean?

EDIT: NVM, I see.

Basically it is pretty much the same algorithm; the biggest change is that Go has a tremendous number of permutations, leading to a humongous search tree. The problem here was finding a way to do the tree search faster.

They did so by implementing a Monte Carlo tree search guided by what they call a value network (which estimates the value of the current state of the game, i.e. the probability of a given player winning from that state) and a policy network (which suggests which action to choose in the current state), both of which use deep learning.
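If it helps, the selection rule the search applies at each tree node can be sketched in a few lines (a simplification of the formula in the AlphaGo paper; the exploration constant is a placeholder):

    import math

    def puct_score(q, prior, parent_visits, child_visits, c_puct=1.0):
        # Value estimate plus an exploration bonus weighted by the policy
        # network's prior for this move; the bonus decays as the move is visited.
        return q + c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)

The search repeatedly descends to the child maximising this score, and leaf positions get their value from the value network (blended with fast rollouts in the original paper).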
 

Dodecagon

works for a research lab making 6 figures
Intrinsic motivation is all any AI has, though? Does this mean they're using designer cues (sparks, exploration) that have no gameplay relevance but influence human behavior in the algorithm?
also, AlphaGo IS DeepMind.

So start with the reward function they attempt to optimize against in 'vanilla' reinforcement learning. Now add a modifier to that equation so it rewards not just performance but also exploring something 'new', then learn against that.
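In symbols (my notation, not necessarily the paper's): instead of learning against the environment reward r(s, a) alone, you learn against r(s, a) + beta * bonus(s), where bonus(s) is large for states the agent has effectively never seen and decays toward zero as they become familiar.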
 
Basically it is pretty much the same algorithm; the biggest change is that Go has a tremendous number of permutations, leading to a humongous search tree. The problem here was finding a way to do the tree search faster.

They did so by implementing a Monte Carlo tree search guided by what they call a value network (which estimates the value of the current state of the game, i.e. the probability of a given player winning from that state) and a policy network (which suggests which action to choose in the current state), both of which use deep learning.

Why is AlphaGo more complex?

EDIT: In fact, I'm confused. They're both the same implementation of the same algorithm. How are they different? What makes AlphaGo more complex?
 

nOoblet16

Member
Intrinsic motivation is all any AI has, though? Does this mean they're using designer cues (sparks, exploration) that have no gameplay relevance but influence human behavior in the algorithm?
also, AlphaGo IS DeepMind.

If I understand your question right, you are asking if it uses exploration?
Exploration vs exploitation is a big thing in the field of deep reinforcement learning... most of the time it's just a random move that the neural network would play to test the waters, but it's interesting to see it even consider it in the first place.
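That "random move to test the waters" is usually just epsilon-greedy action selection, something like this toy sketch (the 0.05 rate is a placeholder):

    import random

    def epsilon_greedy(q_row, epsilon=0.05):
        # With probability epsilon, ignore the network's preference and try
        # a random action; otherwise exploit the best-looking one.
        if random.random() < epsilon:
            return random.randrange(len(q_row))
        return max(range(len(q_row)), key=lambda a: q_row[a])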

I don't know anything about the game of Go, but when I attended the presentation by Marc Lanctot he mentioned that AlphaGo came up with a novel move that no one expected, and it surprised even Lee Sedol.
 

nOoblet16

Member
Their StarCraft algorithm could be about deep reinforcement learning in a multi-agent predator-prey environment, where the AI has to learn to compete and/or cooperate. Pretty much zero research exists in this area atm.

Why is AlphaGo more complex?

EDIT: In fact, I'm confused. They're both the same implementation of the same algorithm. How are they different? What makes AlphaGo more complex?

Because of the huge tree, which is so big because of the sheer number of permutations available in the game of Go.
The algorithm that played Atari didn't have to do this.
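(For a rough sense of scale: Go offers on the order of 250 legal moves per position over games of roughly 150 moves, i.e. something like 250^150 possible lines of play, whereas the Atari agent just picks one of 18 joystick actions each frame.)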

Just think for a second: DeepMind made the Atari AI first and then spent the next 8-10 months making AlphaGo... not the other way around, nor did they do both simultaneously.
 

Timedog

good credit (by proxy)
I mean this Montezuma Revenge AI is nowhere near being on the same level as AlphaGo. Anybody with a cursory understanding of AI can (and probably has) written AIs that do this

I'm sure a group of some of the smartest people on earth are implementing the same basic-level stuff that you or I could do for their ground-breaking AI. Because that makes sense.
 

Dodecagon

works for a research lab making 6 figures
I love these threads because it's all so interesting and I understand NONE of it.

For those interested in learning more about machine learning and, more topically, deep learning, this reference is excellent:

http://www.deeplearningbook.org/

I'd say a basic undergraduate understanding of calculus, statistics, and linear algebra is enough to understand the material in its entirety due to the extensive introduction.
 
I'm sure a group of some of the smartest people on earth are implementing the same basic-level stuff that you or I could do for their ground-breaking AI. Because that makes sense.

v
Second of all, I wasn't talking about how hard it is to develop an AI for a non-discrete search space, but rather the specific technique of incentivizing an AI to explore. It's trivial and it's used all the time in AI, pervasively even.
^
 

Complete

Banned
Why would advanced AI prevent tournaments? People don't go there to play the single player mode.
Because sufficiently advanced AI will start getting rights of its own, and at a certain point there will be enough AI around that we can't just say "no, you're not allowed, this tournament is HUMANS ONLY". And then the AI will have well and truly won.

Granted, we're talking fairly far into the future - like at least two decades from now. But it'll happen. (I'm not sure we'll care at that point, however.)

To some extent, having a deep appreciation for the current state of the art in AI gives a greater appreciation for how far away AI is from reaching human intelligence. I've seen pop-sci articles claim the opposite, but to some extent a lot of this material is old and enabled by the ridiculous compute capabilities offered by modern GPUs. At the end of the day, a lot of what DeepMind has accomplished can be seen as a way of brute-forcing solutions (and more specifically policies) to optimize complex problem spaces via Q-learning.
Well, that's the thing - the only thing stopping it from going much further is the lack of hardware with which to make more calculations.

The thing that's most interesting to me is the point of vertical growth wherein AI starts taking over materials research and dramatically increases advancements in computing hardware - assuming, of course, we don't hit some kind of hard physical barrier that makes that impossible. I suppose we won't know until we get there, however.
 
Oh wow.
It learned the speedrunning art of the death warp.
It was killing itself after getting the key to get back to the locked door on the first screen faster...
 

nOoblet16

Member
Because sufficiently advanced AI will start getting rights of its own, and at a certain point there will be enough AI around that we can't just say "no, you're not allowed, this tournament is HUMANS ONLY". And then the AI will have well and truly won.

Granted, we're talking fairly far into the future - like at least two decades from now. But it'll happen. (I'm not sure we'll care at that point, however.)

You've been watching too much sci-fi. :p
 