A wide range of real-world applications, including computational photography (glint reflection) and augmented reality effects (virtual avatars), rely on accurately tracking the iris within an eye. This is a challenging task to solve on mobile devices, due to limited computing resources, variable lighting conditions, and the presence of occlusions, such as hair or squinting. Iris tracking can also be utilized to determine the metric distance from the camera to the user. This can improve a variety of use cases, ranging from virtual try-on of properly sized glasses and hats to accessibility features that adapt the font size depending on the viewer's distance. Often, sophisticated specialized hardware is employed to compute the metric distance, limiting the range of devices on which the solution can be applied.
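The distance estimate works because the horizontal diameter of the human iris varies remarkably little across people (roughly 11.7 mm on average), so the pinhole camera model relates its apparent size in pixels to metric distance. A minimal sketch of that calculation (the function name and the use of a single averaged diameter are our own simplifications, not the solution's actual code):

```typescript
// Assumed population-average horizontal iris diameter; individual eyes
// vary only slightly, which is what makes this estimate workable.
const IRIS_DIAMETER_MM = 11.7;

/**
 * Camera-to-eye distance in millimeters via the pinhole camera model:
 * realSize / distance = pixelSize / focalLength.
 * @param focalLengthPx  camera focal length expressed in pixels
 * @param irisDiameterPx iris diameter as measured in the image, in pixels
 */
function distanceToEyeMm(focalLengthPx: number, irisDiameterPx: number): number {
  return (focalLengthPx * IRIS_DIAMETER_MM) / irisDiameterPx;
}

// Example: with a 1000 px focal length, an iris spanning 23.4 px
// reads as roughly half a meter away.
console.log(distanceToEyeMm(1000, 23.4)); // ≈ 500 mm
```

The focal length in pixels is a camera intrinsic; on the web it can be derived from the camera's field of view and the frame width.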
MediaPipe Objectron is a mobile real-time 3D object detection solution for everyday objects. It detects objects in 2D images and estimates their poses through a machine learning (ML) model trained on a newly created 3D dataset. Object detection is an extensively studied computer vision problem, but most of the research has focused on 2D object prediction. While 2D prediction only provides 2D bounding boxes, extending prediction to 3D captures an object's size, position, and orientation in the world, leading to a variety of applications in robotics, self-driving vehicles, image retrieval, and augmented reality. Although 2D object detection is relatively mature and has been widely used in industry, 3D object detection from 2D imagery is a challenging problem, due to the lack of data and the diversity of appearances and shapes of objects within a category.
Face and hand tracking in the browser with MediaPipe and TensorFlow.js - Originally published by Ann Yuan and Andrey Vakunov, Software Engineers at Google, at blog.tensorflow.org. Today we're excited to release two new packages: facemesh and handpose, for tracking key landmarks on faces and hands respectively. This release has been a collaborative effort between the MediaPipe and TensorFlow.js teams.
The model is designed for front-facing cameras on mobile devices, where faces in view tend to occupy a relatively large fraction of the canvas. MediaPipe Facemesh may struggle to identify far-away faces. Check out our demo, which uses the model to detect facial landmarks in a live video stream. This model is also available as part of MediaPipe, a framework for building multimodal applied ML pipelines.
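As a sketch of how the facemesh package is used on a video stream (the `facemesh.load()` and `estimateFaces()` calls follow the published `@tensorflow-models/facemesh` API; the `landmarkBounds` helper is our own illustration, not part of the package):

```typescript
// In a real app you would import the package:
//   import * as facemesh from '@tensorflow-models/facemesh';
// Declared here so the sketch type-checks standalone.
declare const facemesh: { load(): Promise<any> };

// Illustrative helper (not part of the package): axis-aligned bounds of
// the [x, y, z] landmark mesh returned for each detected face.
function landmarkBounds(mesh: Array<[number, number, number]>) {
  const xs = mesh.map((p) => p[0]);
  const ys = mesh.map((p) => p[1]);
  return {
    left: Math.min(...xs),
    right: Math.max(...xs),
    top: Math.min(...ys),
    bottom: Math.max(...ys),
  };
}

async function trackFace(video: HTMLVideoElement): Promise<void> {
  const model = await facemesh.load();            // downloads model weights
  const faces = await model.estimateFaces(video); // one entry per detected face
  for (const face of faces) {
    // face.scaledMesh holds landmark coordinates in input-image space.
    console.log(landmarkBounds(face.scaledMesh));
  }
}
```

In practice `trackFace` would be called once per animation frame to track landmarks across a live video stream.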
Google recently announced ways to blur and replace the background in Google Meet for better focus on the person rather than their surroundings. The new features are powered by cutting-edge web machine learning (ML) technologies built with MediaPipe that work directly in the browser, without extra steps such as installing additional software. One of the main motivations for developing these features was to provide real-time, in-browser performance on almost all modern devices, which is accomplished by combining efficient on-device ML models, WebGL-based rendering, and web-based ML inference via XNNPACK and TFLite. The new features of Meet are developed with MediaPipe, Google's open-source framework, which helps build multimodal (for example, video, audio, or any time-series data), cross-platform (i.e., Android, iOS, web, edge devices) applied ML pipelines.
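Background blur ultimately comes down to a per-pixel blend: a segmentation model produces a person mask, and the renderer keeps masked pixels sharp while substituting a blurred copy of the frame elsewhere. A minimal sketch of that compositing step (the buffer layout and function name are our own illustration; Meet's actual renderer is WebGL-based):

```typescript
// Compositing step of a background-blur pipeline (illustrative, not
// Meet's actual code). `sharp` and `blurred` are RGBA pixel buffers of
// the same frame; `mask` holds one person probability (0..1) per pixel,
// as a segmentation model would output.
function compositeBlur(
  sharp: Uint8ClampedArray,
  blurred: Uint8ClampedArray,
  mask: Float32Array,
): Uint8ClampedArray {
  const out = new Uint8ClampedArray(sharp.length);
  for (let i = 0; i < mask.length; i++) {
    const m = mask[i]; // 1 = person (keep sharp), 0 = background (blur)
    for (let c = 0; c < 4; c++) {
      out[4 * i + c] = m * sharp[4 * i + c] + (1 - m) * blurred[4 * i + c];
    }
  }
  return out;
}
```

In the browser, `blurred` could be produced by drawing the frame to a canvas with `ctx.filter = 'blur(...)'` and reading the pixels back, while the soft mask edges keep the transition between person and background smooth.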