Thanks, @simon, that’s really useful to know.
So there are two different tracking paradigms in use in Zap: the Zapcode tracking is more akin to the approach used by QR codes—shape/timing-based—whereas the image tracking isn't interested in shapes/geometry at all; instead it takes an approach similar to traditional movie-making camera tracking, recognising enough known markers/feature blocks to be able to solve for the relative camera position.
That explains very neatly why we can’t currently track, say, a single-flat-colour logotype (a pure shape) against an arbitrary background.
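Just to check my own understanding of the feature-based side (this is textbook computer vision, nothing Zappar-specific, and all the names below are my own): once enough features in the camera frame are matched to known positions on the flat target, the plane-to-plane mapping can be recovered with a direct linear transform (DLT), and the camera pose follows from that plus the lens intrinsics. A minimal numpy sketch of the DLT step:

```python
import numpy as np

def fit_homography(src, dst):
    """Estimate the 3x3 homography H mapping src -> dst via DLT.

    src, dst: (N, 2) arrays of matched 2D points, N >= 4.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    A = np.array(rows)
    # The right singular vector for the smallest singular value of A
    # is the least-squares solution for the 9 entries of H.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # normalise so H[2, 2] == 1

def apply_homography(H, pts):
    """Project (N, 2) points through H, returning (N, 2) points."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    proj = homog @ H.T
    return proj[:, :2] / proj[:, 2:3]
```

With four or more clean correspondences this recovers the mapping exactly; a real tracker presumably does it robustly (RANSAC over many noisy matches) on every frame, which is where the per-frame CPU cost comes from.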
There are a few inconsistencies in the docs regarding ideal tracking images. Could you throw any light on these Qs?
Ideal resolution of the training image:
- If you were exporting a target image from Photoshop / Illustrator etc—you can pick an arbitrary size, of course—what resolution would you specify? 500px max axis, or larger?
The docs are a little inconsistent in places. The Studio docs tell us to stick to standard sizes (examples given: 1920x1080, 1024x1024 - a mix of video and GPU standards) and to keep between 1:4 and 4:1 aspect ratios; we’re told to avoid non-standard sizes (which standard, though?) and/or odd-numbered pixel counts on either axis. The CLI docs tell us training works best on images between 200 and 500px on both axes, and that larger images will be resized to 500px max.
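In the meantime, here’s the heuristic I’ve settled on when pre-sizing exports - purely my own reading of the two doc pages (500px max axis from the CLI docs; even pixel counts and the 1:4 to 4:1 aspect limit from the Studio docs), not anything official:

```python
def training_size(width, height, max_axis=500):
    """Suggest export dimensions for a tracking-target image.

    Clamps the longest axis to max_axis (per the CLI docs), preserves
    aspect ratio, and rounds both axes to even pixel counts (per the
    Studio docs' "avoid odd-numbered pixel counts" advice).
    Raises if the aspect ratio falls outside 1:4 .. 4:1.
    """
    aspect = width / height
    if not (0.25 <= aspect <= 4.0):
        raise ValueError(f"aspect ratio {aspect:.2f} outside 1:4..4:1")
    scale = min(1.0, max_axis / max(width, height))

    def even(v):
        # Round to the nearest even pixel count, minimum 2.
        return max(2, int(round(v / 2)) * 2)

    return even(width * scale), even(height * scale)
```

So a 1920x1080 export would come out at 500x282, and anything already within 500px on both axes is left alone (apart from the even-pixel rounding).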
Different-sized versions of the same training image produce different-sized .zpt files - I’m guessing that’s partly down to the preview bitmap being stored in the file too. But given we don’t usually use the preview bitmap within the published experience:
- Does a larger tracking image increase the final download size of the project?
- If so, how about monochrome vs colour—can we reduce the zpt footprint by desaturating the target image first?
Contrast / levels:
I’m guessing that most of us use print-ready digital artwork as training images, rather than taking a photo of a printed version to use—but a photo of a print would arguably be closer to what the camera would see. With that in mind:
- Is there likely to be any benefit in reducing contrast / brightness of target images before training, to better match what the camera feed would see? Conversely, given you must already compensate for this, would it help if we did the opposite?
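For anyone else who wants to experiment with this while we wait for an answer: the “knock the contrast back toward what a camera sees” idea is just a levels remap pulling values toward a mid grey. A quick sketch of what I mean (my own experiment - I have no idea yet whether it actually helps the trainer):

```python
def flatten_contrast(pixels, factor=0.7, mid=128):
    """Pull 0-255 pixel values toward a midtone.

    factor < 1 reduces contrast (closer to a washed-out camera view);
    factor > 1 exaggerates it instead. Works on any flat iterable of
    greyscale values, e.g. one channel of a decoded image.
    """
    def remap(v):
        out = mid + (v - mid) * factor
        return int(max(0, min(255, round(out))))

    return [remap(v) for v in pixels]
```

Run the target through this at, say, factor=0.7 before training, re-train, and compare tracking stability against the untouched version.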
Upright vs flat training:
The training process asks us to choose between upright vs flat target generation. The only difference I can see in how the target may solve is that an upright target (a poster on a wall, say) is likely to have less camera roll (the Y+ axis—relative to the image, not the real world—is likely to be closer to “up”). A target on a desk will have slightly more freedom on all axes.
- Is this the reason for the upright/flat question? Is it less about the training algorithm, and more about optimising/hinting the tracking algorithm? (If so, it could be good to move that question/option out of the training process and make it a checkbox within Studio - our target/AR experience, for example, will be used as a poster and as a postcard: I can tell the experience which one to expect through a query-string param if it optimises the tracking)
Debugging and diagnostics:
- Is there any way we can see what features have been identified across our image? Or rather, see where in the image features have been chosen? Some cheeky little CLI app, perhaps?
That’d be incredibly useful. It’d mean we could see, for example, whether all the identified features ended up clustered in one area, or weighted too centrally in the image. Ideally we want the corners/edges to be well represented for a nice stable solve, with a reasonable spread across the rest of the image to help when part of the target’s cut off in the camera view.
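Even a raw debug dump of the chosen (x, y) feature coordinates would do - checking the spread from there is trivial. A sketch of the kind of diagnostic I mean, assuming we had such a hypothetical feature list:

```python
def coverage_report(features, width, height, grid=3):
    """Bucket feature points into a grid x grid map of counts.

    features: iterable of (x, y) pixel coordinates.
    Returns rows top-to-bottom, so you can eyeball whether the corners
    and edges are represented or everything clusters centrally.
    """
    counts = [[0] * grid for _ in range(grid)]
    for x, y in features:
        col = min(grid - 1, int(x * grid / width))
        row = min(grid - 1, int(y * grid / height))
        counts[row][col] += 1
    return counts

def corners_covered(counts):
    """True if all four corner cells hold at least one feature."""
    return all(counts[r][c] > 0 for r in (0, -1) for c in (0, -1))
```

A quick pass over a dump like that would flag a badly-biased target before we ever got to on-device testing.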
It’d also mean we could experiment with biased training, too; if our art has to have lots of detail in the middle and less at the edges (blame the client), we could potentially knock back the high-frequency detail in the centre, forcing features round the edges to be chosen - but only in the copy used for training.
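That centre knock-back is just a radially weighted blend between the sharp art and a blurred copy. A rough numpy sketch of the idea (the crude 5-tap, wrap-around blur here is a stand-in for a proper Gaussian; this is my own experiment, not anything from the Zap toolchain):

```python
import numpy as np

def centre_knockback(img, strength=1.0):
    """Blur the centre of a 2D greyscale image, leaving edges sharp.

    img: 2D float array. strength: 0 = no effect, 1 = fully blurred
    at the exact centre. The blur is a crude 5-tap cross average
    (np.roll wraps at the borders - fine for a sketch).
    """
    blurred = img.copy()
    for axis in (0, 1):
        for shift in (-1, 1):
            blurred = blurred + np.roll(img, shift, axis=axis)
    blurred /= 5.0

    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    # Radial weight: `strength` at the centre, falling to 0 at the corners.
    dist = np.hypot(yy - (h - 1) / 2, xx - (w - 1) / 2)
    weight = strength * (1.0 - dist / dist.max())
    return weight * blurred + (1.0 - weight) * img
```

Train on the knocked-back copy, print the untouched art, and (in theory) the trainer is nudged toward edge features while the printed piece stays exactly as the client signed it off.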
- How many features do you try to identify for initial acquisition?
- Do you weight them? Eg corners first, then mid-edges, then more central? If you’re confident you’ve identified the corners of the image, do you stop looking for interior features?
- Is there any super-secret CLI-flag hackery possible for reducing/increasing the feature count for tuning toward less powerful / more powerful devices? (with the obvious caveat that we take responsibility for the speed/stability/partial-target tradeoffs involved each way?)
That last one’s an interesting point: I’m fighting for CPU cycles on older devices (too many particle systems, d’oh), so if I could identify that it’s a slower device, it’d be great to be able to notify the tracking engine that it should only look for the corner features, giving up a bit of stability if the user pushes in too close in exchange for better graphics performance. That could make for a better experience overall. Could be moot if you’re already doing this, i.e. if the algorithm is already adaptive enough.
- Given that you already have a shape/timing-based recognition algorithm working (for Zap codes), it could be really interesting to see if you could allow for arbitrary-shape (1-bit image) recognition, then solve for camera position from it. Then we could do things like logo recognition. It’d take a significant R&D effort, though. Maybe add it to the wish-list?
Well, that got a bit long-winded. But honestly, the more information you can provide (without giving away the farm… gotta protect yer IP, I guess), the more efficient we can be in creating target images and optimising the whole thing. Plus it’s fun experimenting and hacking to see how fast we can make things.
And I’m constantly impressed at how well Zap maintains the tracking, keeping the solve accurate even if I push the camera right into the centre of our postcard. It’s very clever.
Cheers ; )