
HoloLens Minecraft Demo

Mihos

Gold Member
I like the AR idea for a lot of productivity type things... but I really can't think of a single game I would want it for outside of just a private screen. I guess some people could be into the table top game stuff, but I don't even like those in real life. Just give me better pass-through cameras on my VR headset and I am good.
 

Archaix

Drunky McMurder
It's obvious they are liars, paid off, or just ignorant of all that we obviously know is the truth.


It's mostly because the technology is cool and unlike anything they've seen before. It's also partly because they assumed the things Microsoft lies about in its stage demos would actually be fixed in the final product, which they were later told was not a correct assumption.
 

watership

Member
It's mostly because the technology is cool and unlike anything they've seen before. It's also partly because they assumed the things Microsoft lies about in its stage demos would actually be fixed in the final product, which they were later told was not a correct assumption.

So the FOV is impossible to fix? That seems to be the only limitation I see. Outside of that, the tech works as shown. That's not faked, as much as people want it to be. I honestly don't know if FOV can be fixed, though, because looking straight ahead is easy, since the image is a flat plane, but toward the periphery of human vision the image warps. The human brain adjusts, but how do you 'display' that? It's going to be a huge challenge.
 

Archaix

Drunky McMurder
So the FOV is impossible to fix? That seems to be the only limitation I see. Outside of that, the tech works as shown. That's not faked, as much as people want it to be. I honestly don't know if FOV can be fixed, though, because looking straight ahead is easy, since the image is a flat plane, but toward the periphery of human vision the image warps. The human brain adjusts, but how do you 'display' that? It's going to be a huge challenge.


It's not impossible to fix, but Phil Spencer said in an interview later in the day that it won't be changed significantly for the retail release.

I don't think Hololens is bad, I think the fact that they are being deceptive with their stage demos is bad. It's not only a Microsoft problem, it's the same reason I didn't actually bother paying attention to any of the games that Ubisoft showed during their conference. But it's annoying when people refuse to accept that what they were shown is not the actual experience, and more annoying when Microsoft goes out of their way to make it seem like it is the exact same.
 

Tfault

Member
I know this is about a Halo 5 application (well worth a read for Halo 5 fans), but it does mention the FOV problem and its impact.

At 6 feet 3 inches, I'm tall, but I'd still have a hard time measuring up to Spartan 117's 7-plus-foot frame. Even though I stand in his shadow, though, once the hologram switched to a 3D overview of the map where I'd play Halo 5's new multiplayer mode, Warzone, HoloLens' magic shattered a bit.

Between the wall behind me and the planning table there was about six feet; the recommended standing distance was somewhere in the middle. For me to see the entire map, with its central tower jutting into the air, I had to have my back against the wall. From there the view was fine and getting mission objectives and mission-critical locations pointed out to me was really slick. But if I leaned too close to see a highlighted spot on the map, I couldn't see everything at once given HoloLens' relatively narrow field of view.

I got Holo-briefed on 'Halo 5'
 

Rembrandt

Banned
It's not impossible to fix, but Phil Spencer said in an interview later in the day that it won't be changed significantly for the retail release.

I don't think Hololens is bad, I think the fact that they are being deceptive with their stage demos is bad. It's not only a Microsoft problem, it's the same reason I didn't actually bother paying attention to any of the games that Ubisoft showed during their conference. But it's annoying when people refuse to accept that what they were shown is not the actual experience, and more annoying when Microsoft goes out of their way to make it seem like it is the exact same.

Who's doing that last part? What's annoying are the numerous people outright stating that it's all smoke and mirrors.
 
The moving head is already tracked, since it's the basis of what makes AR possible. And the dynamic depth of the scene is raw data that is constantly measured and doesn't need additional tracking or analysis. It's as I described it: the AR part has already computed the depth of the virtual scene relative to the head. Compare its depth buffer to the raw output of the depth sensors, and you have all the pixels where you should not overlay, all in a single operation.
If it's working as you describe, then why isn't it working as it should? If they really are mapping everything in real time, then it seems that your hand should just automatically occlude the virtual object — along with producing the offset artifacts you mentioned earlier — but that isn't happening. So it seems that whatever is actually happening, it's not what you believe it to be.


I will say that I'm eating crow after saying Minecraft just wouldn't work with Hololens. I did not think they'd be capable of the scrolling action they showed, though I'm still sceptical of how useable it'd all be in actual practice.
As far as we know, they still aren't. The stage demo was choreographed, with the game responding to scrolling commands the guy hadn't even issued yet. The demo that GiantBomb saw was completely hands-off, with no gesture controls, or interactivity of any kind. Link

Either way, it's yet more proof that you should not judge the potential of something based solely on your own limited imagination!
Nah, it sounds like you're* still better off trusting your instincts.

*Advice may not apply to all gaffers.


True, they said it wouldn't be hugely different, but Jeff was still entirely positive about it.
Meanwhile, Brad was shocked at how limited it was.

Numerous previews have mentioned it so I doubt people are going to buy it and be taken aback by it not being shown as intended.
Most people won't be reading these hands-on impressions. They'll just be watching the BS stage demos.

I didn't notice him mention the lack of occlusion in the interview.
See above. The demo was non-interactive, so there was no opportunity for occlusion. He probably had his hands in his lap the entire time.

that just seems overly negative since it's extremely amazing tech that works completely as intended but just has one drawback.
I'm not convinced they've actually solved the mapping of the environment either. Actually, it's sounding more and more like they haven't.


So reviewing the previous HoloLens demos, I do see evidence of scene object occlusion, i.e. an awareness of the stationary, basic objects in the room (cubes, tables, etc.). Now whether this is "learned" when the user first walks into the room and emits the "pulse", or whether it was pre-programmed and loaded as a bit of trickery for the live demo, it's hard to say.

But I have still not seen any evidence of occlusion for moving objects (hands, heads or other body parts) or non-basic stationary shapes (lamps, plants, etc.). And I only see evidence of issues in the earliest demos.
Exactly. It's sounding more and more like the room itself has been pre-mapped. The on-board Kinect is likely just looking for known landmarks that it can use for snapping virtual objects and for positional tracking. Another possibility is that the demo rig actually uses Oculus-style tracking, with IR LEDs, and then they just use the pre-generated map of the room to decide where to render things and when to occlude.

Regardless, the lack of occlusion for dynamic, physical objects would seem to indicate this isn't entirely real-time as we're being led to believe.


Was this impression posted? Gizmodo - I Played Minecraft With Microsoft's HoloLens, And It Was Pretty Awesome

He also of course mentions earlier in the article the limited FOV, that it's a little front-heavy, and that you see some "weird ghostly rainbowing effects in the corners of your vision". He doesn't mention anything about occlusion.
He was using a controller for that demo, so no real opportunity for occlusion there.
 

Alx

Member
If it's working as you describe, then why isn't it working as it should? If they really are mapping everything in real time, then it seems that your hand should just automatically occlude the virtual object — along with producing the offset artifacts you mentioned earlier — but that isn't happening. So it seems that whatever is actually happening, it's not what you believe it to be.

I'm not saying it was running in the demo we saw, I'm saying handling such occlusions at least at a basic level isn't hard at all, since anybody who has ever worked with a depth camera could have thought of the solution I suggested. I can't speak for MS as to why they preferred to have no occlusion handling at all; maybe they preferred polishing the rendering instead of showing a demo with front objects having fuzzy outlines...
Anyway I'm not the least bit worried about that issue. We even know the hands are in range of the depth cameras since the standard UI is based on "tapping" on tiles with your fingers.
As for technical limitations, I think things like battery life, handling of fast motions, IPD configuration and multi-user calibration are much more troublesome than occlusion handling.

Anyway, as an additional comment to everything that was shown, I think it's important to remember that MS isn't only showcasing Hololens here, but its whole AR framework. The fact that the headset wearer and the moving camera are seeing the same elements with good synchronisation is an important achievement too. If we remember the reveal of last January, their concept illustration showed the same AR scene seen from a Hololens, an Xbox and a tablet. Basically it's an AR solution for anything with a depth sensor and a display (assuming the tablet was meant to have one). I wouldn't be surprised if we saw many different devices supporting similar demos, Hololens being of course the most complex.
 
I'm not saying it was running in the demo we saw, I'm saying handling such occlusions at least at a basic level isn't hard at all, since anybody who has ever worked with a depth camera could have thought of the solution I suggested. I can't speak for MS as to why they preferred to have no occlusion handling at all; maybe they preferred polishing the rendering instead of showing a demo with front objects having fuzzy outlines...
That was sorta my point though. It seems like if they really are mapping in real time, the occlusion of virtual objects by dynamic objects should be automatic and mostly unavoidable, actually. The fact that only scene objects cause occlusion seems to indicate the environment has been pre-mapped.

Anyway I'm not the least bit worried about that issue. We even know the hands are in range of the depth cameras since the standard UI is based on "tapping" on tiles with your fingers.
As for technical limitations, I think things like battery life, handling of fast motions, IPD configuration and multi-user calibration are much more troublesome than occlusion handling.
Actually, mapping the environment always struck me as the most difficult problem to solve, and I'm not seeing anything to indicate they've actually done so.
 

Alx

Member
I don't know why you're talking about environment mapping; all you need to handle occlusion is a depth map, and the embedded sensors are measuring one at least 30 times per second. It's raw data; there is no geometry to process.
 
Sorry, I'm talking about the map of the environment that's being created with the depth data. Yes, the Kinect just feeds a bunch of raw depth data. Then, the system needs to process that data to determine what it's actually looking at. Once they've identified the various objects, they then need to combine what's seen from that angle with all of the data they've captured from other angles to build up a full 3D map of the environment. A single scan doesn't really provide a great deal of useful information. You need multiple scans from multiple angles to really determine what's present in the environment, and where you are in relation to that stuff. Data from a depth camera is more useful than a standard video feed, but this is still machine vision, at its core. It's hard enough to teach a computer to recognize known objects, but making sense of truly arbitrary input is far harder still.
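
To give a sense of what "combining scans" involves, here's a crude sketch of the easy half of it, back-projecting each depth image into one shared world-space point cloud. It's plain numpy rather than anything HoloLens-specific, and it assumes you somehow already know the headset's pose for every frame, which is itself a big chunk of the hard problem:

import numpy as np

def backproject(depth, fx, fy, cx, cy):
    # Turn an HxW depth image (metres) into an Nx3 point cloud in camera coordinates.
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    pts = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]                  # drop pixels with no depth reading

def accumulate(frames, poses, fx, fy, cx, cy):
    # Merge several depth frames into one world-space cloud.
    # frames : list of HxW depth images
    # poses  : list of 4x4 camera-to-world matrices (getting these in real time is the hard part)
    world = []
    for depth, T in zip(frames, poses):
        pts = backproject(depth, fx, fy, cx, cy)
        pts_h = np.c_[pts, np.ones(len(pts))]  # homogeneous coordinates
        world.append((pts_h @ T.T)[:, :3])
    return np.vstack(world)

Turning that cloud into surfaces you can snap holograms to, and estimating those poses while you're still building the map, is exactly the part I'm saying we haven't seen demonstrated live.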

Now, if they were doing all of that stuff "live," then you're right, and occlusion from your hand would be automatic, and largely unavoidable. However, the fact that your hand doesn't cause occlusion would seem to indicate that they are in fact not mapping the environment in real time, and the furniture and stuff has all been pre-mapped. The Gizmodo guy said he went "off script" and put the game map on the floor, but if they'd pre-mapped the room, the floor would be a part of that, and he wasn't really off script at all. If things were truly dynamic and real time, he should be able to place an object in the palm of his hand or on a sheet of paper, for example. AFAIK, they haven't demonstrated anything like that.
 
Fair enough. The clip in the OP was cut off a bit, so I didn't see how it was introduced. Did they say something like, "We'd like to show you a little play that gives you an idea of what we might be able to achieve in the future," or did they say something like, "And we're going to demonstrate it for you right now"? I didn't see any kind of "Product Vision" disclaimer in the clip.
Just saw the whole clip. The chick introduces it by saying, "We're always trying to harness new technology and create new ways of playing the game. With that in mind, please let me welcome Saxs Persson, from Microsoft Studios, and we're excited to be able to show a new version of Minecraft built specifically for Microsoft Hololens." So, no real disclaimer there.

Then the dude takes the stage and says, "Thank you, Lydia. To show our demo today, we're using a special camera. This display technique is like putting a Hololens right on the camera itself, allowing the entire audience to see the hologram. Now, this is a live demo, with real working code. Let's show what it can do." So, not only did he misrepresent what was actually happening, he actually went out of his way to lie about it, claiming it was done live with real code, when it was clearly choreographed, just like the Kinect Star Wars stuff.
 
Just saw the whole clip. The chick introduces it by saying, "We're always trying to harness new technology and create new ways of playing the game. With that in mind, please let me welcome Saxs Persson, from Microsoft Studios, and we're excited to be able to show a new version of Minecraft built specifically for Microsoft Hololens." So, no real disclaimer there.

Then the dude takes the stage and says, "Thank you, Lydia. To show our demo today, we're using a special camera. This display technique is like putting a Hololens right on the camera itself, allowing the entire audience to see the hologram. Now, this is a live demo, with real working code. Let's show what it can do." So, not only did he misrepresent what was actually happening, he actually went out of his way to lie about it, claiming it was done live with real code, when it was clearly choreographed, just like the Kinect Star Wars stuff.

Those monsters.

I know you're trying to accomplish something here, but I'm not sure what it is.

Are you trying to force us to feel less excited about new tech?
 
Those monsters.

I know you're trying to accomplish something here, but I'm not sure what it is.

Are you trying to force us to feel less excited about new tech?
I'm trying to help people to be appropriately excited. I think variable swords are pretty exciting tech, but I don't harbor any delusions about them being available in the near future. Microsoft went out of their way to blur the lines between reality and fantasy — again — and so now I'm trying to clean up the mess they made in the process. There are a lot of people who think the stuff they showed is real, but it's anything but.

More to the point, I was responding directly to you, who said you didn't really think they lied about it, but he straight up did. Dude said, "This is totally not fake," and it totally was. If you want to live in Fantasyland, that's your decision, but I try to assume the best of people. Therefore, I assumed you'd prefer to be grounded in reality. If I'm mistaken, I apologize for having offended you.
 

Alx

Member
Sorry, I'm talking about the map of the environment that's being created with the depth data. Yes, the Kinect just feeds a bunch of raw depth data. Then, the system needs to process that data to determine what it's actually looking at. Once they've identified the various objects, they then need to combine what's seen from that angle with all of the data they've captured from other angles to build up a full 3D map of the environment. A single scan doesn't really provide a great deal of useful information. You need multiple scans from multiple angles to really determine what's present in the environment, and where you are in relation to that stuff. Data from a depth camera is more useful than a standard video feed, but this is still machine vision, at its core. It's hard enough to teach a computer to recognize known objects, but making sense of truly arbitrary input is far harder still.

I don't think I've made myself clear enough. Yes, for the whole AR rendering you need a 3D model of the environment, and that may not be updated in real time. But once you've decided the geometry of what you want to display, handling occlusions doesn't need the model of the room any more, or even any understanding of what's there. All it needs is a depth image of what's in front of the headset, to compare those depths against those of the virtual objects. And that depth image is the raw output of the sensors.
If I raise my hands in front of my eyes/the headset, the "kinects" will see pixels at a distance of something like 30 cm, and others further away. The system also knows it wants to display a Minecraft cube 1 m away from me. All it has to do is disable rendering on all those pixels measured at 30 cm, since it knows "there's something closer than the cube there", and you'll see your hand instead of the render. It doesn't need to know it's a hand; it can be anything, actually, at any distance. It's just comparing two pixel depths.
Like I said, if you do it without any geometry correction you will see artifacts at the edges of your hand, like when you superimpose raw RGB and depth images from a kinect1. But it's already a good approximation, and MS has already developed the geometry correction thing for kinect anyway. I guess the fact that there are multiple kinects to merge adds another task of "stitching" the depth images together, but that's probably something they're already doing for gesture tracking anyway.
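To make that concrete, here's a rough sketch of the per-pixel test I mean, in plain numpy (not actual HoloLens or Kinect SDK code), assuming the sensor's depth image and the render's depth buffer are already registered to the same viewpoint and resolution:

import numpy as np

def occlusion_mask(sensor_depth, virtual_depth):
    # sensor_depth  : HxW measured distances in metres, 0 where the sensor got no reading.
    # virtual_depth : HxW depth buffer of the rendered hologram, np.inf where nothing is drawn.
    # Returns True wherever a real surface sits in front of the virtual one,
    # i.e. where the render should be suppressed so reality shows through.
    valid = sensor_depth > 0
    return valid & (sensor_depth < virtual_depth)

# Toy example: a hologram 1 m away, a hand at ~30 cm covering part of the frame.
sensor = np.full((480, 640), 2.5)       # background wall at 2.5 m
sensor[200:300, 250:400] = 0.3          # hand in front of the headset
virtual = np.full((480, 640), np.inf)
virtual[150:350, 200:450] = 1.0         # rendered Minecraft cube at 1 m
mask = occlusion_mask(sensor, virtual)  # don't draw the hologram on these pixels

Without registration/geometry correction between the depth camera and the display you get the offset, fuzzy-edge artifacts I mentioned, but the core test really is that cheap.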

Then the dude takes the stage and says, "Thank you, Lydia. To show our demo today, we're using a special camera. This display technique is like putting a Hololens right on the camera itself, allowing the entire audience to see the hologram. Now, this is a live demo, with real working code. Let's show what it can do." So, not only did he misrepresent what was actually happening, he actually went out of his way to lie about it, claiming it was done live with real code, when it was clearly choreographed, just like the Kinect Star Wars stuff.

It's not like they were running a video and pretending it's what the camera was seeing. You can't fake the AR part, with the camera guy moving around and the presenter himself appearing in the picture. So all that part was indeed a live demo with real running code. Of course the demo was rehearsed, and probably some of the things were scripted, like the voice and hand controls. It's not as if there was any doubt that gesture and voice control were possible.
 
I don't think I've made myself clear enough. Yes, for the whole AR rendering you need a 3D model of the environment, and that may not be updated in real time. But once you've decided the geometry of what you want to display, handling occlusions doesn't need the model of the room any more, or even any understanding of what's there. All it needs is a depth image of what's in front of the headset, to compare those depths against those of the virtual objects. And that depth image is the raw output of the sensors.
If I raise my hands in front of my eyes/the headset, the "kinects" will see pixels at a distance of something like 30 cm, and others further away. The system also knows it wants to display a Minecraft cube 1 m away from me. All it has to do is disable rendering on all those pixels measured at 30 cm, since it knows "there's something closer than the cube there", and you'll see your hand instead of the render. It doesn't need to know it's a hand; it can be anything, actually, at any distance. It's just comparing two pixel depths.
Sure, it doesn't need to know it's a hand — although, knowing it's a hand is in the list of claimed functionality — it just needs to know that "something" is in the way. As you say, this should be entirely automatic, yet the system seems to be blissfully unaware of the fact that you've put your hand between your eyes and the virtual object. Now, if you get some furniture in the way — like, if you lie on the floor and look up at the underside of the coffee table — then occlusion does occur, so it's not like they're ignoring occlusion completely. As you said, it works automatically, as long as we're talking about an object MS had a chance to map ahead of time. See what I mean?

That would seem to indicate the Kinect isn't really actively scanning the room at all during these demos. Rather, it seems like they're tracking the position of the headset via unknown means, and the map of the room is baked in rather than being produced on-the-fly by the system. Only being able to deal with pre-mapped objects is a pretty major hole in the overall functionality of the device. That's why I keep asking if any of the demo subjects tried rearranging the furniture mid-demo; I'm trying to determine the actual capabilities of the system as demonstrated.

Like I said if you do it without any geometry correction you will see artifacts at the edges of your hand, like when you superimpose raw RGB and depth images from a kinect1. But it's already a good approximation, and MS has already developed the geometry correction thing for kinect anyway. I guess the fact that there are multiple kinects to merge adds another task of "stitching" the depth images together, but that's probably something they're already doing for gesture tracking anyway.
Do you happen to have a source for that, and/or know anything about how they're configured? The only source I've seen only mentioned a single Kinect. When you first brought up the offset issue, it occurred to me that you could solve it by having one Kinect above your eyes and another below your eyes, but I don't really know how they've set it up, or if in fact there's a second Kinect at all.

It's not like they were running a video and pretending it's what the camera was seeing. You can't fake the AR part, with the camera guy moving around and the presenter himself appearing in the picture. So all that part was indeed a live demo with real running code. Of course the demo was rehearsed, and probably some of the things were scripted, like the voice and hand controls. It's not as if there was any doubt that gesture and voice control were possible.
Yet, the gesture controls clearly were faked. So what gives? Plus, the real point was, the guy went out of his way to say the demo was totally real, and your comeback is, "Well, part of it was real, at least." I'm not saying none of this has any basis in reality. I'm saying it doesn't work like they claim.
 

Alx

Member
The dual kinect can be seen in the teardown animation they showed at Build. Anyway, they say the depth sensing has a FOV of 120x120, and it's continuously active since it's used for gesture tracking, and certainly for environment perception (it's one thing to have created a 3D map of the environment, but to locate yourself in that map you still need instant depth information).
The use of that map may indeed allow handling of occlusions by static objects, but that's not what we're discussing; it's all about filtering dynamic objects, and that you can do without the map. It may not be active in the current demos, but there's absolutely no reason it can't be done; it's something anybody can already do with a kinect and literally three lines of code (as a matter of fact it's similar to one of the earliest tests I did with the hacked SDK for kinect1).
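If anyone doubts the "three lines" part, a toy numpy version looks something like this (the resolution and values are just placeholders, not real Kinect SDK output):

import numpy as np

# Placeholder data for a single frame:
depth_mm      = np.full((424, 512), 2500, dtype=np.uint16)  # live depth frame, in millimetres
depth_mm[180:260, 200:330] = 300                            # a hand ~30 cm from the sensor
holo_depth_mm = np.full((424, 512), 1000, dtype=np.uint16)  # hologram rendered at 1 m
holo_rgba     = np.zeros((424, 512, 4), dtype=np.uint8)     # rendered overlay with an alpha channel

# The three lines: wherever reality is measured closer than the hologram, don't draw it.
valid    = depth_mm > 0
occluded = valid & (depth_mm < holo_depth_mm)
holo_rgba[occluded, 3] = 0   # those overlay pixels go transparent, so the real hand shows through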


Yet, the gesture controls clearly were faked. So what gives? Plus, the real point was, the guy went out of his way to say the demo was totally real, and your comeback is, "Well, part of it was real, at least." I'm not saying none of this has any basis in reality. I'm saying it doesn't work like they claim.

They wanted to do an AR live demo and they did an AR live demo. Press people got to try it later and said it did work like they claim. It's good enough for me.
 

JaggedSac

Member
Sorry, I'm talking about the map of the environment that's being created with the depth data. Yes, the Kinect just feeds a bunch of raw depth data. Then, the system needs to process that data to determine what it's actually looking at. Once they've identified the various objects, they then need to combine what's seen from that angle with all of the data they've captured from other angles to build up a full 3D map of the environment. A single scan doesn't really provide a great deal of useful information. You need multiple scans from multiple angles to really determine what's present in the environment, and where you are in relation to that stuff. Data from a depth camera is more useful than a standard video feed, but this is still machine vision, at its core. It's hard enough to teach a computer to recognize known objects, but making sense of truly arbitrary input is far harder still.

Now, if they were doing all of that stuff "live," then you're right, and occlusion from your hand would be automatic, and largely unavoidable. However, the fact that your hand doesn't cause occlusion would seem to indicate that they are in fact not mapping the environment in real time, and the furniture and stuff has all been pre-mapped. The Gizmodo guy said he went "off script" and put the game map on the floor, but if they'd pre-mapped the room, the floor would be a part of that, and he wasn't really off script at all. If things were truly dynamic and real time, he should be able to place an object in the palm of his hand or on a sheet of paper, for example. AFAIK, they haven't demonstrated anything like that.

There are already libraries available for Kinect to produce a model of the environment as the camera moves around. That problem is solved already.
 
I'm trying to help people to be appropriately excited. I think variable swords are pretty exciting tech, but I don't harbor any delusions about them being available in the near future. Microsoft went out of their way to blur the lines between reality and fantasy — again — and so now I'm trying to clean up the mess they made in the process. There are a lot of people who think the stuff they showed is real, but it's anything but.

More to the point, I was responding directly to you, who said you didn't really think they lied about it, but he straight up did. Dude said, "This is totally not fake," and it totally was. If you want to live in Fantasyland, that's your decision, but I try to assume the best of people. Therefore, I assumed you'd prefer to be grounded in reality. If I'm mistaken, I apologize for having offended you.

You didn't offend me at all. I just think you're trying way too hard to rain on the parade.
 

Grinchy

Banned

I'm less bothered by the low FOV than I am by the false presentation of what the technology is. It's one of those things I'll have to try out for myself. It's just so annoying that they show it on stage in a way the tech can't actually achieve, to get people believing it's better than it really is. It's what they do with everything new they show off, and it's not cool.
 

Soi-Fong

Member
Here's the direct quote about the FOV. Seriously, all MS is doing at this point is basically lying to the consumer. The way they presented it on stage was as if you had this wide FOV, when in reality, it's small.

Question:


"is the plan to have full peripheral on that?"

Answer:


"Certainly the hardware isn't final...but I think, you're never going to get to full peripheral, but certainly the hardware we have now the field of view isn't exactly final but I wouldn't say it's going to be hugely noticeably different either."
This is stuff they should be upfront about. With a limited FOV, this tech is immediately limited and less immersive than what they were making it out to be.

It's Kinect and Milo all over again. This is no different than lying to the consumer. Seriously, MS shouldn't get away with this stuff.
 
Jeff Gerstmann's impressions of HoloLens have me pretty excited. I mean, when was the last time he was excited about anything? Trackmania?
 

th4tguy

Member
I would enjoy the use of AR as an expanded HUD. Imagine sitting on a couch and playing Halo. All of the HUD that would be in the Chief's helmet is now shown through the glasses. Glance at the coffee table in front of you for a large map display, or an AI (Cortana?) figure discussing your current objectives when prompted.

Imagine what Kojima could have done with AR and Silent Hills.
 

AndyD

aka andydumi
I think you guys are focusing on the wrong sentence in that statement. When they said "This display technique is like putting a Hololens right on the camera itself, allowing the entire audience to see the hologram" and the camera view did not show the actual FOV of what the guy was seeing, that's the most misleading part to me, not the pre-scripted part. We all know they rehearse and script things.

I would enjoy the use of AR as an expanded HUD. Imagine sitting on a couch and playing Halo. All of the HUD that would be in the Chief's helmet is now shown through the glasses. Glance at the coffee table in front of you for a large map display, or an AI (Cortana?) figure discussing your current objectives when prompted.

Imagine what Kojima could have done with AR and Silent Hills.
I fully agree. But that requires imagery in the edges and peripheral vision, not the center, which is exactly where it is not. I think we may get the situation you describe with the fully-VR method they showed, where it streams the flat game in the center and adds stuff at the edges.
 