Hand Gesture Recognition with MediaPipe
MediaPipe is a library developed by Google that allows features such as hands, body, and face to be tracked in real time using a simple webcam. The library uses machine learning to analyze the image and generate the specific coordinates of the tracked features.
To start using MediaPipe within Cables, it is necessary to load the “MediaPipe” extension containing the operators needed to interact with the library.
In this article we will cover the steps required to create an interactive application through hand tracking.
First, insert a WebcamTexture node to access the webcam and connect its texture output to a FullScreenRectangle to display it in the canvas. In the settings of the latter operator, set Scale to stretch. At this point the webcam image is displayed.
Now you need to send the webcam video stream to MediaPipe. Add the MpHandTracking operator and connect it to the CSS Element output of WebcamTexture to receive the tracking coordinates of both hands. Then select one of the two hands through the MpHand operator. Its output is a "points" array containing the coordinates of all the tracking points of the selected hand. To extract the position of a specific point, use the MpHandCoordinate node and choose one of the available points in the "Joint" menu.
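As a rough illustration of what picking a joint out of the "points" array amounts to, here is a minimal Python sketch. The landmark numbers (4 for the thumb tip, 8 for the index fingertip) come from the MediaPipe hand model, which defines 21 landmarks per hand. The flat [x0, y0, z0, x1, ...] layout, the stride of 3, the joint_xy helper and the sample data are assumptions made for this example, not the documented format of the Cables operator.

```python
# Landmark indices from the MediaPipe hand model (21 landmarks per hand).
THUMB_TIP = 4
INDEX_FINGER_TIP = 8


def joint_xy(points, joint, stride=3):
    """Return (x, y) of one joint from a flat [x0, y0, z0, x1, ...] list.

    The flat layout and the stride of 3 are assumptions for illustration.
    """
    base = joint * stride
    return points[base], points[base + 1]


# Hypothetical data: landmark i has x = i * 0.01, y = i * 0.02, z = 0.
points = []
for i in range(21):
    points.extend([i * 0.01, i * 0.02, 0.0])

index_x, index_y = joint_xy(points, INDEX_FINGER_TIP)
thumb_x, thumb_y = joint_xy(points, THUMB_TIP)
```

In the patch itself none of this has to be written by hand: choosing an entry in the "Joint" menu plays the role of the joint argument.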
Before you can see anything move, you need to add an element to move. Immediately after FullScreenRectangle connect a PixelProjection operator. Cables uses a different coordinate system than MediaPipe, so we will adapt the coordinates of our canvas to those received from MediaPipe. In the settings of PixelProjection set Size to Manual and enter a Width and Height of 2. Then center the coordinate origin by setting "Position 0,0" to Center.
At this point connect a BasicMaterial and a Circle. The canvas now shows the webcam image and a colored circle in the center of the screen. Set a radius of 0.1 to make the circle smaller.
Most likely the circle will look “squashed”. This is because the proportions of the shape are tied to those of the canvas, so resizing the canvas also changes the aspect ratio of the circle. To prevent this, you can link the canvas's Aspect Ratio to the circle's scale via the CanvasInfo and Scale operators.
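The arithmetic behind this correction can be sketched in a few lines of Python. The sketch assumes the 2 x 2 projection is stretched over the whole canvas and that the horizontal axis is the one being compensated. The function names are hypothetical, and the actual patch may wire the aspect ratio differently.

```python
def compensate_scale(canvas_width, canvas_height):
    """Return (scale_x, scale_y) that keeps a circle round on a stretched canvas.

    Assumes a 2 x 2 unit projection stretched over the full canvas.
    """
    aspect = canvas_width / canvas_height
    return 1.0 / aspect, 1.0


def on_screen_radii(radius, canvas_width, canvas_height, scale_x, scale_y):
    """Pixel radii of the circle along x and y after scaling."""
    # 2 units span the full canvas, so 1 unit is half the canvas size.
    rx = radius * scale_x * canvas_width / 2
    ry = radius * scale_y * canvas_height / 2
    return rx, ry


sx, sy = compensate_scale(1920, 1080)
rx, ry = on_screen_radii(0.1, 1920, 1080, sx, sy)
```

With the compensation applied, the horizontal and vertical pixel radii match, so the shape is drawn as a circle whatever the canvas proportions.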
Now we need to make the circle move. To do this, link the coordinates provided by MpHandCoordinate to the circle through a Transform. Connect the X and Y outputs of MpHandCoordinate to the respective posX and posY inputs of the Transform node.
Play with MediaPipe
Once you become familiar with how MediaPipe works, the possibilities for artistic development are limitless. The key is to experiment. By modifying patches even slightly you can create new combinations all the time.
For example, we can further develop the patch just seen to add other types of interaction and elements. In this case, instead of the circle, we will attach a letter of the alphabet to the position of the index finger and thumb. Moving the fingers apart or closer together changes both the size of the element and the letter, from A when the fingers touch to Z at maximum extension.
Although it may seem complex at first glance, it is only a small step up from the previous patch. First we need to add a second MpHandCoordinate to track the thumb position as well. Replace the Circle with a TextMesh, for now leaving it on the letter “A”.
Our “A”, however, will continue to follow the index finger to which it was previously connected. We need to find the midpoint between the two fingers by averaging the respective coordinates:
posX = (X(index) + X(thumb)) / 2, posY = (Y(index) + Y(thumb)) / 2
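The same averaging, written as a small Python sketch. The midpoint function and the sample coordinates are made up for illustration. In the patch the calculation would be built from math operators rather than code.

```python
def midpoint(index_xy, thumb_xy):
    """Average two (x, y) points to get the point halfway between them."""
    index_x, index_y = index_xy
    thumb_x, thumb_y = thumb_xy
    return (index_x + thumb_x) / 2, (index_y + thumb_y) / 2


# Hypothetical fingertip positions in the -1..1 canvas space.
pos_x, pos_y = midpoint((0.2, 0.4), (0.6, -0.2))
```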
Now our letter will be centered between the two fingers. The next step is to scale it relative to the distance between the two fingers. Fortunately, Cables provides a Distance2D operator. Connecting the X and Y coordinates of both points to it outputs the distance between the two. We then connect the output of Distance2D to the Scale input of the Transform node.
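For reference, the value used here is presumably the ordinary Euclidean distance between two points. A minimal Python sketch follows. The distance_2d name is hypothetical, and using the result directly as the scale mirrors the wiring described above.

```python
import math


def distance_2d(x1, y1, x2, y2):
    """Euclidean distance between (x1, y1) and (x2, y2)."""
    return math.hypot(x2 - x1, y2 - y1)


# Fingers touching give a distance of 0, so the letter shrinks to nothing.
touching = distance_2d(0.3, 0.3, 0.3, 0.3)

# A classic 3-4-5 triangle as a sanity check.
scale = distance_2d(0.0, 0.0, 3.0, 4.0)
```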
Bringing the index finger and thumb closer together will scale the letter accordingly. Now we need to link the distance between the fingers to the letters of the alphabet. Conceptually, we create an array containing all 26 letters (in the case of the English alphabet, but any alphabet can be used) and map the distance between the two fingers to an index in the array, from 0 to 25.
First add a StringToArray operator and, in its text input, enter all the letters of the alphabet, one per line. In the operator settings, uncheck Numbers and turn on Split Lines. Next, add an ArrayGetString to select a single letter by index and link it to the text input of the TextMesh operator.
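In code terms, these two steps amount to splitting a multi-line string into a list and reading one element by index. The Python sketch below shows the idea. The get_string helper is hypothetical, and returning an empty string for an out-of-range index is an assumption for the example, not documented operator behavior.

```python
import string

# One letter per line, as it would be typed into the text input.
alphabet_text = "\n".join(string.ascii_uppercase)

# Equivalent of splitting on lines and keeping the entries as strings.
letters = alphabet_text.splitlines()


def get_string(array, index):
    """Return the element at index, or "" if the index is out of range.

    The empty-string fallback is an assumption made for this sketch.
    """
    if 0 <= index < len(array):
        return array[index]
    return ""


first = get_string(letters, 0)
last = get_string(letters, 25)
```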
All that remains at this point is to link the Distance2D operator to the index input of ArrayGetString, but to do this we need to remap the output of the first node. The value we receive from Distance2D ranges from about 0 to 1, while the index of the letter array is between 0 and 25. We can use the MapRange operator and set it with the following values: oldMin = 0.1, oldMax = 0.7, newMin = 0, newMax = 25. Then add a Round operator to round the decimal value to an integer (the index of an array is always an integer).
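The remapping and rounding can be sketched in Python as a linear interpolation followed by rounding to the nearest integer. The function names are hypothetical. Clamping the input to the old range is an assumption made here so that the index can never leave 0 to 25. Rounding halves up is also a choice made for this sketch, since Python's built-in round() sends halves to the nearest even number.

```python
import math


def map_range(value, old_min, old_max, new_min, new_max):
    """Linearly remap value from [old_min, old_max] to [new_min, new_max].

    The input is clamped to the old range first (an assumption for this sketch).
    """
    clamped = min(max(value, old_min), old_max)
    t = (clamped - old_min) / (old_max - old_min)
    return new_min + t * (new_max - new_min)


def letter_index(distance):
    """Turn a finger distance into an array index between 0 and 25."""
    mapped = map_range(distance, 0.1, 0.7, 0, 25)
    # Round half up instead of using round(), which rounds halves to even.
    return math.floor(mapped + 0.5)


closed = letter_index(0.05)  # below oldMin: first letter
open_hand = letter_index(0.9)  # above oldMax: last letter
quarter = letter_index(0.25)  # a quarter of the way through the range
```

Anything below 0.1 stays on the first letter and anything above 0.7 stays on the last, which leaves some slack at both ends of the gesture.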