Hand tracking with Zappar!

Just experimenting with Zappar Three.js and hand tracking ML models. What do you guys think? :raised_hand_with_fingers_splayed:


NICE!!!
Now just need it in Unity :slight_smile:

Steve


Nice work @mihir!

Is this the hand tracking model from MediaPipe, or something else?

Just wondering what benefits you’re getting from using the Zappar Universal AR SDK in this example - it doesn’t seem to be making use of any of our in-built tracking types? Our SDK also doesn’t currently expose an API for plumbing user-provided computer vision code through the rest of our computer vision pipeline to keep everything in sync. That is something we may expose in future if there’s sufficient demand, so I’d be interested to hear more about your use case.

Hand tracking in general is something we’re looking at, initially targeting use in ZapBox although we’re also keen to see what we can do on the web for mobile devices, and whether we can achieve smoother frame-rates on par with our other tracking types. The ML models tend to be more complex than face tracking ones as hands don’t have as many distinctive features, and have more potential configurations, which usually results in pretty slow frame-rates on mobile hardware as in your example.

Nice to see it working though, and an interesting experiment for sure :+1:


Hello Simon,
Thank you for your response. You guessed correctly: it is MediaPipe, and as you said, the frame rate drops a lot due to the tracking method.
As for Zappar, I am not touching the in-built tracking types at all. I am using Zappar just for the AR camera access (setting the scene background), while the tracking is handled by MediaPipe and Three.js is used for rendering the 3D model. I could achieve the same result by accessing the camera directly and rendering the camera feed onto a canvas, but I wanted to try hand tracking with Zappar, and since Zappar makes it easy to declare elements like the Three.js renderer, scene, and camera, the implementation ends up very similar to a plain Three.js implementation for MediaPipe.
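
Roughly, the camera-only part of the setup looks like this (a minimal sketch, assuming the standard `@zappar/zappar-threejs` API; the MediaPipe hand tracking and model rendering are left out):

```js
import * as THREE from 'three';
import * as ZapparThree from '@zappar/zappar-threejs';

// Standard Three.js renderer; Zappar needs access to its GL context.
const renderer = new THREE.WebGLRenderer();
document.body.appendChild(renderer.domElement);
ZapparThree.glContextSet(renderer.getContext());

// Zappar's camera handles device camera access and provides the
// live camera feed as a texture for the scene background.
const camera = new ZapparThree.Camera();
const scene = new THREE.Scene();
scene.background = camera.backgroundTexture;

// Request camera permission, then start the stream.
ZapparThree.permissionRequestUI().then((granted) => {
  if (granted) camera.start();
  else ZapparThree.permissionDeniedUI();
});

function animate() {
  requestAnimationFrame(animate);
  camera.updateFrame(renderer); // keep the background texture current
  // ...run MediaPipe on the camera frame and update the 3D model here...
  renderer.render(scene, camera);
}
animate();
```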

Wow, this is amazing! Mind sharing some tips on how to develop it? :smiley: I tried before but I kept failing :sweat:

If you can DM me, I can guide you through all the resources and the approach I chose.

Hello @mihir, is it possible to use this to attach a model, maybe something like a ring, to the tracked hand and keep that ring on the finger?

We are working on a prototype for exactly that use case and will let you know if we are successful, but theoretically it is possible. You can integrate different ML models as well to achieve similar results. For your use case you will need specific target points to render the image or 3D model of the ring on the finger, so I would recommend researching hand tracking ML models that provide specific landmark points you can render content on.


Yeah, thank you. Is it possible to share the source code of this project with me?

And one more question: the ML model has points for fingers like the index finger or the ring finger, and each finger has multiple points. The camera tracks those points and updates their Vector3 values. Is it possible to get the vector of one of the model's points and attach the ring to it, so the ring keeps changing position along with that point?

I cannot share the exact source code, but I can certainly provide you with some references to help you recreate this project or something similar.
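
The rough idea, though, is just to query the hand-tracking model every frame and copy a chosen landmark into the ring mesh's position. A minimal sketch (assuming the TensorFlow.js handpose model; the landmark index, the fixed depth, and the `ring` mesh are illustrative assumptions, not code from this project):

```js
// Assumes: `model` is a loaded handpose model, `video` is the camera
// <video> element, `camera` is the Three.js (or Zappar) camera, and
// `ring` is a THREE.Mesh already added to the scene.
async function updateRing() {
  const predictions = await model.estimateHands(video);
  if (predictions.length > 0) {
    // Landmark 13 is the base of the ring finger, roughly where a
    // ring sits; any of the 21 points can be used the same way.
    const [x, y] = predictions[0].landmarks[13];

    // Map pixel coordinates to normalized device coordinates and
    // unproject into the scene. The handpose depth is only relative,
    // so a fixed NDC depth is assumed here.
    const point = new THREE.Vector3(
      (x / video.videoWidth) * 2 - 1,
      -(y / video.videoHeight) * 2 + 1,
      0.5
    ).unproject(camera);

    ring.position.copy(point);
    ring.visible = true;
  } else {
    ring.visible = false;
  }
  requestAnimationFrame(updateRing);
}
updateRing();
```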


Yes please, it would be great if you could give me sources or references.
Thank you.


Hi @mihir, MediaPipe only provides 3D landmarks. How do you get the hand model?

I used Three.js to render the 3D model at those points.

The x/y coordinates of the landmarks are screen coordinates and the depth is only relative. How do you obtain the landmarks in 3D space?

I am using the handpose model:

```html
<script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/handpose@0.0.4/dist/handpose.min.js"></script>
```

which returns hand predictions; each hand object contains a `landmarks` property, which is an array of 21 3D landmarks.
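
A minimal usage sketch (assuming a playing `<video>` element; the landmark layout is as documented for handpose):

```js
// Load the model once, then query it per frame.
const model = await handpose.load();
const predictions = await model.estimateHands(videoElement);

if (predictions.length > 0) {
  // 21 landmarks as [x, y, z]: x/y in pixels relative to the video
  // frame, z a relative depth rather than a metric distance.
  const [x, y, z] = predictions[0].landmarks[8]; // index fingertip
  console.log('Index fingertip:', x, y, z);

  // `annotations` groups the same points by finger, e.g.
  // predictions[0].annotations.ringFinger is an array of 4 points.
}
```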