
(Wired) Watch Google's DeepMind play Montezuma's Revenge


Kinitari

Black Canada Mafia
http://www.wired.co.uk/article/google-ai-montezuma-revenge
https://youtu.be/0yI2wJ6F8r0
Google's DeepMind has learned how to play yet another game - this time because it had been 'incentivised' to want to win.

"Intrinsic rewards" meant the AI obtained "significantly improved exploration in a number of hard games, including the infamously difficult Montezuma's Revenge", wrote Google researchers in a paper.

Intrinsic motivation (IM) algorithms typically use signals to make the AI more 'curious' and are inspired by classic, human-based psychological ideas.

Montezuma's Revenge was a 1984 platform game for the Atari 2600 in which a character navigates a series of complex rooms in an underground Aztec temple.

The model, which had inbuilt rewards, explored 15 rooms out of a potential 24 – the old model, which was not incentivised, explored only two.

Really interesting stuff, and these are the sort of improvements that will let it play things like 3D adventure games, like GTA or something. It's stark seeing the difference in how many rooms were explored before and after.
 

Complete

Banned
It's funny how so many people have no idea how close AI is to reaching human intelligence. Stuff like this is only peanuts compared to what's going to follow over the coming years.
 
Am I missing something? This seems dumb; it didn't complete anything, and by the 100th frame it was still pretty terrible at the game
 
It's hard to explain why this is amazing if you don't appreciate how far AI has come in the last half decade. This is really impressive.

I do, but a) the title says it completed the game (it didn't), and b) the article says it completed it in just 4 tries (it dies 20+ times just in the video)

Maybe they've done something impressive, but it certainly isn't illustrated in that video. Did they post the wrong one?
 

Complete

Banned
I assume what makes the one in the OP more complex is that it actually went and explored rooms that weren't crucial?
Right, they used the concept of intrinsic rewards to get the program to explore more than was necessary. In other words, exploring was its own reward, and so the AI went further than it otherwise would have with no incentive to explore.

Which is pretty crazy if you think about it. Here we have an (albeit basic) AI program using the power of intrinsic motivation to guide it into doing things it otherwise wouldn't. Just think about how that could be applied in the future in all sorts of endeavors that are non-essential but worth doing.
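To make that concrete, here's a minimal sketch of a count-based exploration bonus (my own illustration, not DeepMind's actual code; the 0.1 weight is a made-up placeholder). States the agent has rarely visited pay out extra reward, so poking into new rooms is literally worth points to it:

    from collections import defaultdict
    import math

    visit_counts = defaultdict(int)
    BETA = 0.1  # bonus weight, placeholder value for illustration

    def shaped_reward(state_key, env_reward):
        # The novelty bonus shrinks as a state is revisited, so the agent
        # keeps getting paid for finding rooms it hasn't seen before.
        visit_counts[state_key] += 1
        return env_reward + BETA / math.sqrt(visit_counts[state_key])

The actual paper derives "pseudo-counts" from a learned density model rather than raw tallies (you can't meaningfully count raw Atari frames), but the shape of the idea is the same.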
 

Krejlooc

Banned
Calling Montezuma's Revenge an Atari 2600 game is like calling Resident Evil 4 a PS2 game. It was developed originally for the Atari 8-bit. The 2600 version is a port.
 
Right, they used the concept of intrinsic rewards to get the program to explore more than was necessary. In other words, exploring was its own reward, and so the AI went further than it otherwise would have with no incentive to explore.

Which is pretty crazy if you think about it. Here we have an (albeit basic) AI program using the power of intrinsic motivation to guide it into doing things it otherwise wouldn't. Just think about how that could be applied in the future in all sorts of endeavors that are non-essential but worth doing.

Of course it did, it's practically a tautology. If you've got one system that always turns left, but it turns out that left is very difficult, then it will never get far. If you tell it to go right and left, now all of a sudden it's "getting farther". Of course, if exploration is incentivized then the end result will be a more fully explored world. How could you possibly expect any other outcome?

Don't get me wrong, AlphaGo and DeepMind are amazing, but this is not even close to being on the same level
 

Grinchy

Banned
When it talks about "100 million training frames", does that mean the program has attempted the screens 100 million times?
 
How lightly?


The AlphaGo implementation is easier to develop than the DeepMind implementation of Montezuma's Revenge. That's what I'm saying.

First of all, lmao.

Second of all, I wasn't talking about how hard it is to develop an AI for a non-discrete search space, but rather the specific technique of incentivizing an AI to explore. It's trivial and it's used all the time in AI, pervasively even.
 

Dodecagon

works for a research lab making 6 figures
It's funny how so many people have no idea how close AI is to reaching human intelligence. Stuff like this is only peanuts compared to what's going to follow over the coming years.

To some extent, having a deep appreciation for the current state of the art in AI gives a greater appreciation for how far away AI is from reaching human intelligence. I've seen pop-sci articles claim the opposite, but to some extent a lot of this material is old and enabled by the ridiculous compute capabilities offered by modern GPUs. At the end of the day, a lot of what DeepMind has accomplished can be seen as a way of brute-forcing solutions (and more specifically policies) to optimize complex problem spaces via Q-learning.
 

Kinitari

Black Canada Mafia
Uhh. DeepMind is the name of a company. What are you even talking about?

And: yes, to put it lightly

DeepMind is also the name of their original algorithm. The algorithm they trained on games. AlphaGo was based on that algorithm. This is also based on that algorithm.

Training in intrinsic motivation has been done before, but it's done very well here in combination with what is probably the best general-purpose algorithm known.
 

nOoblet16

Member
You have it backwards. AlphaGo is a lot easier to develop than DeepMind.

Do you have a cursory understanding of AI?

1) AlphaGo is more complex. DeepMind wrote a neural network that learned to play Atari games in 2015; AlphaGo was built upon that, and this one is merely an extension of the former.
2) I am doing a PhD in the exact same field, so similar in fact that I had to change my focus to something else, because otherwise I would have ended up competing with Google, which I couldn't possibly have done.
 

Aselith

Member
Google is just making next gen twitch streamers. Call me when they perfect the accidentally caught masturbating algorithm.
 

nOoblet16

Member
DeepMind is also the name of their original algorithm. The algorithm they trained on games. AlphaGo was based on that algorithm. This is also based on that algorithm.

Training in intrinsic motivation has been done before, but it's done very well here in combination with what is probably the best general-purpose algorithm known.

The algorithm is called Deep Q-Learning (DQN); it's basically reinforcement learning with a deep neural network.
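For the curious, the core update is easy to write down. A toy sketch, with a linear function standing in for the deep network (feature size, discount, and learning rate are illustrative values only):

    import numpy as np

    n_features, n_actions = 4, 18  # 18 = size of the Atari joystick action set
    W = np.zeros((n_actions, n_features))  # Q(s, a) = W[a] @ s
    GAMMA, LR = 0.99, 0.01

    def q_values(state):
        # state is a 1-D numpy feature vector; returns one Q estimate per action
        return W @ state

    def td_update(state, action, reward, next_state, done):
        # Bootstrapped Q-learning target: r + gamma * max_a' Q(s', a'),
        # with no future value once the episode ends
        target = reward + (0.0 if done else GAMMA * q_values(next_state).max())
        td_error = target - q_values(state)[action]
        W[action] += LR * td_error * state  # gradient step on the squared TD error

The real DQN swaps the linear map for a convolutional network reading raw pixels, plus tricks like experience replay and a frozen target network to keep training stable.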
 
1) AlphaGo is more complex, Deepmind wrote a neural network that learned to play Atari games in 2015. AlphaGo was built upon that, this one is merely an extension of the former.
2) I am doing a PhD in the exact same field, so similar infact that I had to change my focus to something else because I would have ended up competing with google otherwise which I couldn't possibly have done.
1. When you say it was built upon that, what exactly do you mean?

EDIT: NVM, I see.
 

Aureon

Please do not let me serve on a jury. I am actually a crazy person.
Intrinsic motivation is all any AI has, though? Does this mean they're using designer cues (sparks, exploration) that have no gameplay relevance but influence human behavior in the algorithm?
also, AlphaGo IS DeepMind.
 

nOoblet16

Member
1. When you say it was built upon that, what exactly do you mean?

EDIT: NVM, I see.

Basically it is pretty much the same algorithm; the biggest change is that Go has a tremendous number of permutations, leading to a humongous search tree. The problem here was finding a way to do the tree search faster.

They did so by implementing a Monte Carlo tree search guided by what they call a value network (which estimates the value of the current state of the game, i.e. the probability of a given player winning from that state) and a policy network (which suggests which action to choose in the current state), both of which use deep learning.
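If it helps, the selection rule the search applies at each tree node can be sketched in a few lines (a simplification of the formula in the AlphaGo paper; the exploration constant is a placeholder):

    import math

    def puct_score(q, prior, parent_visits, child_visits, c_puct=1.0):
        # Value estimate plus an exploration bonus weighted by the policy
        # network's prior for this move; the bonus decays as the move is visited.
        return q + c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)

The search repeatedly descends to the child maximising this score, and leaf positions get their value from the value network (blended with fast rollouts in the original paper).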
 

Dodecagon

works for a research lab making 6 figures
Intrinsic motivation is all any AI has, though? Does this mean they're using designer cues (sparks, exploration) that have no gameplay relevance but influence human behavior in the algorithm?
also, AlphaGo IS DeepMind.

So start with the reward function they attempt to optimize against in 'vanilla' reinforcement learning. Now add a modifier to that equation so it rewards not just performance but also exploring something 'new', then learn against that.
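In symbols (my notation, not necessarily the paper's): instead of learning against the environment reward r(s, a) alone, you learn against r(s, a) + beta * bonus(s), where bonus(s) is large for states the agent has effectively never seen and decays toward zero as they become familiar.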
 
Basically it is pretty much the same algorithm; the biggest change is that Go has a tremendous number of permutations, leading to a humongous search tree. The problem here was finding a way to do the tree search faster.

They did so by implementing a Monte Carlo tree search guided by what they call a value network (which estimates the value of the current state of the game, i.e. the probability of a given player winning from that state) and a policy network (which suggests which action to choose in the current state), both of which use deep learning.

Why is AlphaGo more complex?

EDIT: In fact, I'm confused. They're both the same implementation of the same algorithm. How are they different? What makes AlphaGo more complex?
 

nOoblet16

Member
Intrinsic motivation is all any AI has, though? Does this mean they're using designer cues (sparks, exploration) that have no gameplay relevance but influence human behavior in the algorithm?
also, AlphaGo IS DeepMind.

If I understand your question right, you are asking if it uses exploration?
Exploration vs exploitation is a big thing in the field of deep reinforcement learning... most of the time it's just a random move that the neural network would play to test the waters, but it's interesting to see it even consider it in the first place.
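That "random move to test the waters" is usually just epsilon-greedy action selection, something like this toy sketch (the 0.05 rate is a placeholder):

    import random

    def epsilon_greedy(q_row, epsilon=0.05):
        # With probability epsilon, ignore the network's preference and try
        # a random action; otherwise exploit the best-looking one.
        if random.random() < epsilon:
            return random.randrange(len(q_row))
        return max(range(len(q_row)), key=lambda a: q_row[a])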

I don't know anything about the game of Go, but when I attended the presentation by Marc Lanctot he mentioned that AlphaGo came up with a novel move that no one expected, and it surprised even Lee Sedol.
 

nOoblet16

Member
Their StarCraft algorithm could be about deep reinforcement learning in a multi-agent predator-prey environment, where the AI has to learn to compete and/or cooperate. Pretty much zero research exists in this area atm.

Why is AlphaGo more complex?

EDIT: In fact, I'm confused. They're both the same implementation of the same algorithm. How are they different? What makes AlphaGo more complex?

Because of the huge tree, which is so big because of the sheer number of permutations available in the game of Go.
The algorithm that played Atari didn't have to do this.
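(For a rough sense of scale: Go offers on the order of 250 legal moves per position over games of roughly 150 moves, i.e. something like 250^150 possible lines of play, whereas the Atari agent just picks one of 18 joystick actions each frame.)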

Just think for a second: DeepMind made the Atari AI first and then spent the next 8-10 months making AlphaGo... not the other way around, nor did they do both simultaneously.
 

Timedog

good credit (by proxy)
I mean this Montezuma Revenge AI is nowhere near being on the same level as AlphaGo. Anybody with a cursory understanding of AI can (and probably has) written AIs that do this

I'm sure a group of some of the smartest people on earth are implementing the same basic-level stuff that you or I could do for their ground-breaking AI. Because that makes sense.
 

Dodecagon

works for a research lab making 6 figures
I love these threads because it's all so interesting and I understand NONE of it.

For those interested in learning more about machine learning and, more topically, deep learning, this reference is excellent:

http://www.deeplearningbook.org/

I'd say a basic undergraduate understanding of calculus, statistics, and linear algebra is enough to understand the material in its entirety due to the extensive introduction.
 
I'm sure a group of some of the smartest people on earth are implementing the same basic-level stuff that you or I could do for their ground-breaking AI. Because that makes sense.

v
Second of all, I wasn't talking about how hard it is to develop an AI for a non-discrete search space, but rather the specific technique of incentivizing an AI to explore. It's trivial and it's used all the time in AI, pervasively even.
^
 

Complete

Banned
Why would advanced AI prevent tournaments? People don't go there to play the single player mode.
Because sufficiently advanced AI will start getting rights of its own, and at a certain point there will be enough AI around that we can't just say "no, you're not allowed, this tournament is HUMANS ONLY". And then the AI will have well and truly won.

Granted, we're talking fairly far into the future - like at least two decades from now. But it'll happen. (I'm not sure we'll care at that point, however.)

To some extent, having a deep appreciation for the current state of the art in AI gives a greater appreciation for how far away AI is from reaching human intelligence. I've seen pop-sci articles claim the opposite, but to some extent a lot of this material is old and enabled by the ridiculous compute capabilities offered by modern GPUs. At the end of the day, a lot of what DeepMind has accomplished can be seen as a way of brute-forcing solutions (and more specifically policies) to optimize complex problem spaces via Q-learning.
Well, that's the thing - the only thing stopping it from going much further is the lack of hardware with which to make more calculations.

The thing that's most interesting to me is the point of vertical growth wherein AI starts taking over materials research and dramatically increases advancements in computing hardware - assuming, of course, we don't hit some kind of hard physical barrier that makes that impossible. I suppose we won't know until we get there, however.
 
Oh wow.
It learned the speedrunning art of the death warp.
It was killing itself after getting the key to get back to the locked door on the first screen faster...
 

nOoblet16

Member
Because sufficiently advanced AI will start getting rights of its own, and at a certain point there will be enough AI around that we can't just say "no, you're not allowed, this tournament is HUMANS ONLY". And then the AI will have well and truly won.

Granted, we're talking fairly far into the future - like at least two decades from now. But it'll happen. (I'm not sure we'll care at that point, however.)

You've been watching too much sci-fi. :p
 