Allow transparent areas of trained image to be ignored when tracking

howiemnet · November 26, 2021, 5:21pm

If we could use the transparent areas of a PNG as a way to tell Zap to ignore those regions/pixels, we could do some really funky stuff.

It’d break us free of rectangular targets. (Yeah, you can kinda do round things like badges/buttons at the moment, but some fabrics behind the badge trip up the tracker because you can’t get it to ignore the corners)

For example: we could use our logo as an image target - set against a transparent background - and then anywhere in the world the logo appears, whatever the background, we could have it animate and come to life. A shop could, for example, have a QR code on the door as a trigger, then the big logo over the shopfront could animate, etc etc.

No idea what kinda clever vision-hashing-pattern-recognition voodoo goes on behind the scenes to make the tracking work, but removing the “has to be rectangular” limitation would open all sorts of interesting doors if it’s possible.

For our live shows, it’d be great to be able to say “if you scan this QR code, all our logos round the venue will come to life”

Anyhoo, just a random Friday wish

simon · November 28, 2021, 11:07am

Hi @howiemnet,

We do have this one implemented on a branch at the moment, so it’s something we’ll likely roll out soon (either with or shortly after the update to support cylindrical and other curved targets).

For a brief description of the “voodoo” - image tracking relies on finding multiple distinct “features” in the camera view - each feature has a footprint of 15x15 pixels in our current implementation, and so valid features must be at least half that distance away from any border (and it’s the distance in our relatively low-res camera feed that matters here, so with targets that are also small in the camera view this can be significant).
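To make the footprint rule concrete, here is a minimal sketch in plain Python. It is only one reading of the description above, not Zappar's actual code: it treats a pixel as a usable feature centre when the whole 15x15 window around it is opaque and inside the image. The function names and the 0/1 alpha grid are assumptions made for illustration.

```python
FOOTPRINT = 15          # feature footprint in pixels, as quoted above
HALF = FOOTPRINT // 2   # 7 pixels either side of the centre pixel


def valid_feature_centres(alpha):
    """Return a mask of pixels whose full footprint is opaque.

    alpha is a list of rows holding 1 (opaque) or 0 (transparent).
    This is an illustrative model, not the real tracker.
    """
    height = len(alpha)
    width = len(alpha[0]) if height else 0
    mask = [[0] * width for _ in range(height)]
    for y in range(HALF, height - HALF):
        for x in range(HALF, width - HALF):
            window_opaque = all(
                alpha[yy][xx]
                for yy in range(y - HALF, y + HALF + 1)
                for xx in range(x - HALF, x + HALF + 1)
            )
            if window_opaque:
                mask[y][x] = 1
    return mask


def count_valid(alpha):
    """Count the pixels that could host a feature centre."""
    return sum(sum(row) for row in valid_feature_centres(alpha))


if __name__ == "__main__":
    solid = [[1] * 40 for _ in range(40)]
    # A 5-pixel-wide vertical stroke on a transparent background.
    stroke = [[1 if 18 <= x < 23 else 0 for x in range(40)] for _ in range(40)]
    print("solid square:", count_valid(solid))
    print("thin stroke:", count_valid(stroke))
```

Under this model a shape narrower than the footprint leaves no room for any feature at all, however long it is.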

What that means is simply training with transparency won’t magically make really complex, thin structures able to track well on arbitrary backgrounds, and won’t help with stuff like monochrome patterns on transparent backgrounds (things like snowflake decorations - they’d just have no features at all). Where it does have the potential to help is for cases like circular targets, where it should bring a bit of an improvement vs using a square internal crop, or using a solid-colour background over the full target.
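As a rough back-of-the-envelope check on the circular case, the sketch below compares the area available for feature centres in a full circle against a square crop inscribed in it, keeping half a footprint (7.5 pixels) clear of every border. The geometry is an assumption about how the margin applies, the function names are made up for illustration, and usable area is only a proxy for tracking quality.

```python
import math

FOOTPRINT = 15.0        # feature footprint in camera pixels, as quoted above
MARGIN = FOOTPRINT / 2  # features must stay this far from any border


def usable_area_circle(diameter):
    """Area where feature centres can sit inside a circular target."""
    radius = diameter / 2 - MARGIN
    return math.pi * radius * radius if radius > 0 else 0.0


def usable_area_inscribed_square(diameter):
    """Same, for the largest square crop that fits inside the circle."""
    side = diameter / math.sqrt(2) - 2 * MARGIN
    return side * side if side > 0 else 0.0


if __name__ == "__main__":
    # Diameters are the size of the target in the camera feed, in pixels.
    for diameter in (40, 100, 200):
        circle = usable_area_circle(diameter)
        square = usable_area_inscribed_square(diameter)
        print(diameter, round(circle), round(square))
```

The gap widens as the target gets smaller in the camera view, which fits the point above that the border margin matters most for small targets.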

stevesanerd · November 28, 2021, 6:33pm

I can’t wait for this!!!
I could use it right now for a project I’m working on!!
How soon, @simon, and will it be WebAR-ready as well?

Steve

howiemnet · November 29, 2021, 11:21am

Thanks, @simon, that’s really useful to know.

So there are two different tracking paradigms in use in Zap: the Zapcode tracking is more akin to the approach used by QR codes—shape/timings-based—whereas the image tracking is not interested in shapes/geometry, but uses a similar approach to traditional movie-making camera tracking, trying to recognise enough known markers/feature-blocks to be able to solve for the relative camera position.

That explains very neatly why we can’t currently track, say, a single-flat-colour logotype (a pure shape) against an arbitrary background.

There are a few inconsistencies in the docs regarding ideal tracking images. Could you throw any light on these Qs?

Ideal resolution of the training image:

If you were exporting a target image from Photoshop / Illustrator etc—you can pick an arbitrary size, of course—what resolution would you specify? 500px max axis, or larger?

The docs are a little inconsistent in places. On the Studio docs we’re told to stick to standard sizes (examples given: 1920x1080, 1024x1024 - a mix of video and GPU standards) and keep between 1:4 and 4:1 aspect ratios; we’re told to avoid non-standard (which standard tho?) sizes, and/or odd-numbered pixel counts on either axis. On the CLI docs we’re told training works best on images between 200 and 500px on both axes, and that larger images will be resized to 500px max.
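For anyone wanting to sanity-check an export size against the numbers quoted above, here is a small sketch. The thresholds are simply the ones mentioned in this post; how the trainer actually resizes is not documented here, so the aspect-preserving scale with rounding is an assumption, and both function names are hypothetical.

```python
def scaled_to_max(width, height, max_side=500):
    """Scale so the longer axis is at most max_side, keeping aspect ratio.

    Assumption: the real resize may round or filter differently.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def guideline_notes(width, height):
    """List which of the quoted guidelines a given size falls outside."""
    notes = []
    ratio = width / height
    if ratio > 4 or ratio < 0.25:
        notes.append("aspect ratio outside 1:4 to 4:1")
    if width % 2 or height % 2:
        notes.append("odd pixel count on at least one axis")
    if max(width, height) > 500:
        notes.append("over 500px: expect a resize to about %dx%d"
                     % scaled_to_max(width, height))
    if min(width, height) < 200:
        notes.append("an axis is under 200px")
    return notes


if __name__ == "__main__":
    for size in ((1920, 1080), (1024, 1024), (400, 300), (1000, 200)):
        print(size, guideline_notes(*size))
```

Note that under these assumptions a 1920x1080 export, one of the "standard" sizes, would come out at an odd height once scaled to 500px, which is part of why the two sets of advice feel inconsistent.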

Different sized versions of the same training image produce different sized .zpt files - I’m guessing that’s partly down to the preview bitmap being stored in the file too: but given we don’t usually use the preview bitmap within the published experience:

Does a larger tracking image increase the final download size of the project?
If so, how about monochrome vs colour—can we reduce the zpt footprint by desaturating the target image first?

Contrast / levels:
I’m guessing that most of us use print-ready digital artwork as training images, rather than taking a photo of a printed version to use—but a photo of a print would arguably be closer to what the camera would see. With that in mind:

Is there likely to be any benefit in reducing contrast / brightness of target images before training, to better match what the camera feed would see? Conversely, given you must already compensate for this, would it help if we did the opposite?

Upright vs flat training:
The training process asks us to choose between upright vs flat target generation. The only difference I can see in how the target may solve is that an upright target (poster on a wall, say), is likely to have less camera roll (Y+ axis—relative to the image, not the real world—is likely to be closer to “up”). A target on a desk will have slightly more freedom in all the axes.

Is this the reason for the upright/flat question? Is it less about the training algorithm and more about optimising/hinting the tracking algorithm? (If so, it could be good to move that question/option out of the training process and make it a checkbox within Studio - our target/AR experience, for example, will be used as a poster and as a postcard: I can tell the experience which one to expect through a query-string param if it optimises the tracking.)

Debugging and diagnostics:

Is there any way we can see what features have been identified across our image? Well, see where in our image features have been chosen? Some cheeky little CLI app perhaps?

That’d be incredibly useful to see. It’d mean we could see, for example, if all the features identified ended up clustered in one area, or weighted too heavily toward the centre of the image. Ideally we want the corners/edges to be well represented for a nice stable solve, with a reasonable spread across the rest of the image to help when part of the target’s cut off in the camera view.

It’d also mean we could experiment with biased training, too; if our art has to have lots of detail in the middle and less at the edges (blame the client), we could potentially knock back the high-frequency detail in the centre, forcing features round the edges to be chosen - but just for the training.
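To show the kind of spread check I mean, here is a sketch that buckets feature positions into a 3x3 grid and reports the empty cells. It is purely hypothetical: nothing in this thread says the trainer exposes feature positions, so the (x, y) point list, the function names and the grid size are all assumptions.

```python
def grid_coverage(points, width, height, cells=3):
    """Count feature points per cell of a cells x cells grid.

    points is a list of (x, y) pixel positions within the image.
    """
    counts = [[0] * cells for _ in range(cells)]
    for x, y in points:
        col = min(int(x * cells / width), cells - 1)
        row = min(int(y * cells / height), cells - 1)
        counts[row][col] += 1
    return counts


def empty_cells(counts):
    """Return (row, col) for each grid cell with no features."""
    return [
        (r, c)
        for r, row in enumerate(counts)
        for c, n in enumerate(row)
        if n == 0
    ]


if __name__ == "__main__":
    # Made-up positions clustered in the middle of a 300x300 image.
    clustered = [(150, 150), (140, 160), (160, 145)]
    counts = grid_coverage(clustered, 300, 300)
    print(counts)
    print("cells with no features:", empty_cells(counts))
```

A result with empty corner cells would be the cue to try the biased-training trick and see whether features move outward.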

Pure curiosity:

How many features do you try to identify for initial acquisition?
Do you weight them? E.g. corners first, then mid-edges, then more central? If you’re confident you’ve identified the corners of the image, do you stop looking for interior features?
Is there any super-secret CLI-flag hackery possible for reducing/increasing the feature count for tuning toward less powerful / more powerful devices (with the obvious caveat that we take responsibility for the speed/stability/partial-target tradeoffs involved each way)?

That last one’s an interesting point: I’m fighting for CPU cycles on older devices (too many particle systems, d’oh) so if I could identify that it’s a slower device, it’d be great to be able to tell the tracking engine that it should only look for the corner features, giving up a bit of stability if the user pushes in too close, in exchange for better graphics performance. That could make for a better experience overall. Could be moot if you’re already doing this and your algorithm is already adaptive enough.

And finally:

Given that you already have a shape/timing-based recognition algorithm working (for Zapcodes), it could be really interesting to see if you can allow for arbitrary shape (1-bit image) recognition, then solve for camera position from it. Then we could do things like logo-recognition. It’d take a significant R&D effort, though. Mebbe add it to the wish-list?

Well that got a bit long-winded. But honestly, the more information you can provide (without giving away the farm … gotta protect yer IP I guess) the more efficient we can be in creating target images and optimising the whole thing. Plus it’s fun experimenting and hacking to see how fast we can make things.

And I’m constantly impressed at how well Zap maintains the tracking, keeping the solve accurate even if I push the camera right into the centre of our postcard. It’s very clever.

Cheers ; )