Hi guys, interesting thread, thanks! This will be a long post, but hopefully helpful.
Some background
When we first implemented headset mode we wanted existing scenes to sort-of work straight away. `ScreenTransform` (`Z.screen` is an instance of that) is an orthographic projection (meaning the size of objects is the same regardless of distance). When switching to stereo, the way to maintain the concept of “Z doesn’t matter” is to essentially scale all of the z coordinates to 0, so everything relative to screen is rendered on one flat plane. Then, so it works in stereo, that plane needs to be placed some distance in front of the user (and scaled in x/y so it all fits in the eye viewports).
We considered allowing the user to set that effective distance, but in the current implementation it is fixed. In practice I wouldn’t really recommend using `Z.screen` for any headset experiences, as there’s quite some magic involved in the above and the usual screen space properties don’t apply (eye viewports are not square, so e.g. objects positioned outside the -1 to 1 range in y may still be visible, or things in the corners may not be).
So that brings us to `CameraTransform` (`Z.camera` is an instance of that). In non-stereo mode the focal length of the camera for rendering content relativeTo `Z.camera` is chosen based on the focal length (another way of describing field of view) of the device camera, so that AR content lines up correctly. In headset mode the meaning changes, and the camera setup used for rendering is based on what is required for each eye viewport. That comes from the headset parameters (lens distortion details, lens spacing, distance between lenses and screen) rather than the camera ones.
[As an aside, the fact that the `CameraTransform` field of view is now based on the headset parameters rather than the camera means the `focalLength` parameter doesn’t have a simple equivalence in headset mode, which is why it currently doesn’t do anything useful. We do have some thoughts about how we could map it in a vaguely equivalent way, but no concrete proposals yet.]
Differences between mono and stereo
When rendering a view from effectively a single viewpoint, you can get away with all sorts of things. You can easily mix and match orthographic and perspective transforms, you can have different focal lengths in different perspective views, and you don’t have to worry about “real depth”, as you can arrange things to render in the order you want using `overlay` layer mode and hierarchy ordering, for example. These are all perfectly useful and valid techniques when putting together a single-view full-screen experience, but in stereo experiences they can lead to problems.
Another interesting property of rendering a single perspective view is that the units are essentially arbitrary - if you double the scale of an object and also double its distance from the camera then it will look exactly the same. Our `Target` coordinate system goes from -1 to 1 in y, and the actual physical size of the target doesn’t usually matter. You can design an experience on a postcard-size print and then view it poster-size, and if you stand a bit further back it will look exactly the same.
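To make that scale/distance trade-off concrete, here’s a quick sketch of the pinhole projection maths (plain TypeScript, purely illustrative - the focal length value is an arbitrary assumption):

```typescript
// In a perspective (pinhole) projection, on-screen size is proportional
// to physical size divided by distance from the camera.
function projectedSize(size: number, distance: number, focalLength = 1): number {
  return (focalLength * size) / distance;
}

projectedSize(1, 2); // 0.5
projectedSize(2, 4); // 0.5 - double the size *and* the distance: identical
```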
Units do suddenly become non-arbitrary when rendering in headset mode. That’s because the camera is shifted slightly left from the origin to render the left eye, and slightly right to render the right one. The actual shift in “units” between the two is the distance between the headset lenses, measured in metres. This means that the units for anything rendered relative to `Z.camera` are effectively “metres” as far as depth perception goes.
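For a feel for why that matters, here’s a rough disparity calculation (the 63mm lens spacing and the focal length are illustrative assumptions, not values from any particular headset):

```typescript
// Horizontal stereo disparity of a point straight ahead at a given depth,
// for two virtual cameras separated by the lens spacing. It falls off as
// 1/depth, so the cue is only strong for nearby content.
function disparity(lensSpacing: number, depth: number, focalLength = 1): number {
  return (focalLength * lensSpacing) / depth;
}

disparity(0.063, 0.5); // 0.126  - a strong cue at half a metre
disparity(0.063, 10);  // 0.0063 - nearly nothing at 10m
```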
Conflicting depth cues
There are a few different cues humans use to determine how far away objects are:
- Apparent size - if it’s an object that I know should be a certain size (e.g. a car) then I can tell how far away it is by how big it appears.
- Ordering constraints - if I see something solid that obscures part of the car, I assume it must be in front of it.
- Focus cues - if stuff is blurred it’s at a different depth to where I’m currently looking. (NB: usual VR displays can’t accurately recreate this. Magic Leap claim their “light field display” can do so; I’m looking forward to trying that.)
- Stereo disparity - how the object appears in a different place in left and right views.
The “disparity” one is the one most people think of when thinking of stereo vision, but the others are all important too (stereo disparity is only really significant within a few metres of the viewer).
Now imagine in ZapWorks Studio you position a car model at [0,0,-3] relative to `Z.camera`. In headset mode that will appear 3m away in terms of stereo disparity. You’ll need to ensure the scale of the model is real-world too, to avoid a conflict between the apparent size cue and the disparity cue.
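In a Studio script that might look something like this (a minimal sketch - the `car` node name and the model’s native length are assumptions for illustration):

```typescript
const car = symbol.nodes.car; // assumed node name

// 3m in front of the user in terms of stereo disparity.
car.position([0, 0, -3]);

// If the imported model is, say, 2 units long but a real car is ~4.5m,
// scale it so the apparent-size cue agrees with the disparity cue.
const realLength = 4.5; // metres
const modelLength = 2;  // the model's native units (an assumption)
const s = realLength / modelLength;
car.scale([s, s, s]);
```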
After that, let’s say you have a reticule positioned at [0,0,-5] relative to `Z.camera`, and you make sure it’s drawn on top (placing it later in the hierarchy and setting it to `overlay` layer mode). Now you’ve generated a conflict between the disparity and ordering cues. The disparity cue tells us the reticule is behind the car, but the ordering cue tells us it must be in front. Generally the brain is pretty sure cars are solid, and so it assumes the two reticule images are actually separate objects in front of the car. That’s why you perceive the reticule “splitting in two”.
The reticule problem
So clearly conflicting cues are a bad thing for the reticule. However, even just ensuring it’s “in front of” the thing you are trying to get the user to look at is not enough. If the reticule is really close to the user and the thing they’re looking at is far away, then it’s not possible to view both the reticule and the object of interest at the same time.
This is true in the real world too - make a reticule out of your thumb and forefinger, hold it 20cm or so from your face, and look through it at something a couple of metres away. Note that you can only look at either your hand or the thing in the distance. With one eye closed it’s much less disturbing (which is how you can get away with all this stuff in full-screen single-viewpoint experiences), but as soon as you have two eyes open it’s basically impossible to attend to both things at the same time.
In headset experiences, all the content is always in focus regardless of distance, which makes this even more disturbing than in real life (if you focus on the thing in the distance through your hand then even though you see two images of the hand at least they’re a bit blurred so less disturbing).
So here’s the conclusion: for stereo experiences with a reticule, and where there is content at different distances from the camera, it’s going to be necessary to move the reticule as the user looks around.
Recommendations
So my first recommendation for headset experiences would be to only use perspective camera transforms (so `CameraTransform` nodes). As you’ve discovered, `focalLength` should also always be -1 - so you could just use `Z.camera` rather than your own `CameraTransform` nodes, but if it helps scene structure or the headset / non-headset switch then feel free to structure your scene however you want.
Secondly, remember that units effectively become metres in headset mode. Ensure everything is scaled to reasonable real-world values, and that the scales / positions of all your objects are consistent with each other. As discussed, this suddenly becomes way more important in a stereo experience than in a full-screen one. When importing 3D models, you might want to disable the auto-scale option so they keep their native scales (if the models were created in some real-world units).
One way you can measure things is just to add a plane relative to `Z.camera`, set it to `full3d` layer mode, and position it so it intersects the thing you want to measure. Then you can adjust the scale of the plane to measure it. Remember a plane goes from -1 to 1, so at scale [1,1,1] it’s actually 2m x 2m in headset mode; at scale [0.1, 0.1, 0.1] it would be 20cm x 20cm, etc.
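Since the plane spans 2 units at scale 1, converting between a desired size in metres and a scale value is just a divide-by-two (a trivial helper, shown for clarity):

```typescript
// A default plane spans -1..1, i.e. 2 units across, so a uniform scale s
// gives a plane (2 * s) metres across in headset mode.
function planeScaleForSize(sizeInMetres: number): number {
  return sizeInMetres / 2;
}

planeScaleForSize(2);   // 1   - the default plane is 2m x 2m
planeScaleForSize(0.2); // 0.1 - a 20cm x 20cm plane
```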
How to adjust reticule position
The reticule is usually right in front of the user. As we need it to work in stereo we avoid `Z.screen`, and instead position it at [0, 0, -z] relative to `Z.camera`.
The easiest way to adapt the z value as the user looks around is to use a `Raycaster`. Raycasters already point down their -Z axis, so a Raycaster relative to `Z.camera` will do the trick. If you set `colliderLimit` to 1 it will only return intersections for the nearest object to the raycaster (the objects need to have a tag that matches the `colliderTag` set on the Raycaster). You can prefix the tag with `global:` to use it between subsymbols - see the tag docs.
Then I’d use a `raycaster.on("intersections", ...);` event handler to get the array of objects currently intersected by the raycaster in each frame. If the array is empty, just use a default value (1 or 2 metres is probably sensible, but it will depend on your scene). If there is an intersection, you can get the distance from the camera to the object from the `IntersectionEvent` reported.
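Something along these lines (a sketch, assuming a Raycaster node named `raycaster0` in the symbol, and assuming each entry in the intersections array exposes a `distance` - do check the IntersectionEvent docs for the exact property name):

```typescript
const raycaster = symbol.nodes.raycaster0; // assumed node name

const DEFAULT_DISTANCE = 2; // metres, used when nothing is hit

raycaster.on("intersections", (intersections) => {
  // Take the nearest hit (colliderLimit = 1 means at most one entry).
  const distance = intersections.length > 0
    ? intersections[0].distance // assumed property - verify against the docs
    : DEFAULT_DISTANCE;

  placeReticule(distance); // defined in the next snippet
});
```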
As we’ve measured the distance to the intersection from the camera, the maths is pretty simple if we want to keep the reticule the same size on screen - position it at `[0, 0, -1 * distance]` and set the scale to `[scale * distance, scale * distance, scale * distance]`, where `scale` is just a factor that controls its effective size (essentially how big you want it to appear from 1m away, if you want to think of it like that).
You might also want to adjust the distance a bit - e.g. have the reticule slightly in front of the reported distance (either a fixed offset of a few cm, or some factor of the distance - feel free to play around to see what works well for your scene).
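Putting the positioning maths together (again a sketch - `reticule` is an assumed node name, and the constants are just starting points to tweak):

```typescript
const reticule = symbol.nodes.reticule; // assumed node name

const RETICULE_SCALE = 0.05; // how big it appears from 1m away
const FORWARD_FACTOR = 0.95; // bring it ~5% closer than the reported hit
// (alternatively, subtract a fixed few cm from the distance instead)

function placeReticule(distance: number): void {
  const d = distance * FORWARD_FACTOR;
  reticule.position([0, 0, -d]);

  // Scale with distance so the reticule's apparent size stays constant.
  const s = RETICULE_SCALE * d;
  reticule.scale([s, s, s]);
}
```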
Example project
I’ve attached a quick sample zpp showing that technique. There’s a couple of planes in an `attitudeOrient` group at different distances, and a reticule that will automatically move between them. When not over either plane it will be placed at 2m (at the moment the camera image is rendered at z=-5 in headset mode if there are no tracking images in the scene).
Note I’ve also used a larger plane for the raycaster intersection region for the frontmost object so the reticule is moved forward before any part of it overlaps the front image.
HeadsetReticuleDemoFixed.zpp (38.2 KB)
Hope it helps, and good luck with your projects!