Announcing the Objectron Dataset

#artificialintelligence

The state of the art in machine learning (ML) has achieved exceptional accuracy on many computer vision tasks solely by training models on photos. Building on these successes, advancing 3D object understanding has great potential to power a wider range of applications, such as augmented reality, robotics, autonomy, and image retrieval. For example, earlier this year we released MediaPipe Objectron, a set of real-time 3D object detection models designed for mobile devices that predict objects' 3D bounding boxes, trained on a fully annotated, real-world 3D dataset. Yet understanding objects in 3D remains challenging because large real-world datasets are scarce compared to those for 2D tasks (e.g., ImageNet, COCO, and Open Images). To empower the research community to keep advancing 3D object understanding, there is a strong need for object-centric video datasets, which capture more of an object's 3D structure while matching the data format used for many vision tasks (i.e., video or camera streams), to aid in training and benchmarking machine learning models.
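
For readers who want to try the released models, here is a minimal sketch using the MediaPipe Objectron Python solution; the API follows the mediapipe package, and the input image path is hypothetical.

```python
# Minimal sketch: run MediaPipe Objectron on a single image.
import cv2
import mediapipe as mp

mp_objectron = mp.solutions.objectron

image = cv2.imread('chair.jpg')  # hypothetical input image
with mp_objectron.Objectron(static_image_mode=True,
                            max_num_objects=5,
                            min_detection_confidence=0.5,
                            model_name='Chair') as objectron:
    # MediaPipe expects RGB input; OpenCV loads images as BGR.
    results = objectron.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

for obj in results.detected_objects or []:
    # Each detection carries the 3D box pose: rotation (3x3) and translation (3,).
    print(obj.rotation, obj.translation)
```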


Google's Objectron uses AI to track 3D objects in 2D video

#artificialintelligence

Coinciding with the kickoff of the 2020 TensorFlow Developer Summit, Google today published a pipeline, Objectron, that spots objects in 2D images and estimates their poses and sizes through an AI model. The company says it has implications for robotics, self-driving vehicles, image retrieval, and augmented reality; for instance, it could help a factory-floor robot avoid obstacles in real time. Tracking 3D objects is a tricky prospect, particularly with limited compute resources (like a smartphone system-on-chip), and it becomes tougher still when the only available imagery (usually video) is 2D, owing to the scarcity of 3D training data and the diversity of object appearances and shapes. The Google team behind Objectron therefore developed a toolset that lets annotators label 3D bounding boxes (i.e., cuboids) for objects, using a split-screen view to display 2D video frames.
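
To make the 2D/3D relationship concrete, here is a minimal sketch (not Objectron's actual annotation code) of projecting a labeled 3D box into a 2D frame, assuming a pinhole camera with intrinsic matrix K; the focal length and principal point below are hypothetical.

```python
# Sketch: project the 8 corners of a 3D bounding box (camera coordinates)
# into 2D pixel coordinates with a pinhole camera model.
import numpy as np

def project_box(corners_3d: np.ndarray, K: np.ndarray) -> np.ndarray:
    """corners_3d: (8, 3) box corners; K: (3, 3) intrinsics. Returns (8, 2) pixels."""
    pts = corners_3d @ K.T            # apply camera intrinsics
    return pts[:, :2] / pts[:, 2:3]   # perspective divide by depth

# Example intrinsics (hypothetical focal length and principal point).
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
```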


Google claims TensorFlow's OpenCL can double inference performance

#artificialintelligence

Google today announced the launch of an OpenCL-based mobile GPU inference engine for its TensorFlow framework on Android. It's available now in the latest version of the TensorFlow Lite library, and the company claims it offers a 2x speedup over the existing OpenGL backend with "reasonably sized" AI models. OpenGL, which is nearly three decades old, is a platform-agnostic API for rendering 2D and 3D vector graphics. Compute shaders were added in OpenGL ES 3.1, but the TensorFlow team says backward-compatible design decisions kept them from reaching device GPUs' full potential. OpenCL, by contrast, was designed from the start for computation across a variety of accelerators, and is thus a better fit for mobile GPU inference.
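
As a rough sketch of how a GPU delegate is attached to a TensorFlow Lite interpreter from Python: the shared-library name below is platform- and build-specific and is an assumption, not the documented path.

```python
# Sketch: run a TFLite model with a GPU delegate.
# 'libtensorflowlite_gpu_delegate.so' is an assumed library name; it varies by platform.
import tensorflow as tf

delegate = tf.lite.experimental.load_delegate('libtensorflowlite_gpu_delegate.so')
interpreter = tf.lite.Interpreter(model_path='model.tflite',
                                  experimental_delegates=[delegate])
interpreter.allocate_tensors()
```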


Applying Machine Learning on Mobile Devices

#artificialintelligence

Machine learning is now used in many fields: image classification, consumer-demand forecasting, personalized film and music recommendations, and clustering. At the same time, for fairly large models, computing a result (and, to a much greater degree, training the model) can be resource-intensive. To make trained models usable on devices other than the most powerful ones, Google introduced its TensorFlow Lite framework. To work with it, you train a model built with the TensorFlow framework (not Lite!) and then convert it to the TensorFlow Lite format. After that, the model can easily be used on embedded or mobile devices.
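
That train-then-convert workflow looks roughly like this, using the standard TFLiteConverter API; the SavedModel directory name is a placeholder.

```python
# Sketch: convert a trained TensorFlow SavedModel to TensorFlow Lite format.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model('saved_model_dir')  # placeholder path
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional: default size/latency optimizations
tflite_model = converter.convert()

with open('model.tflite', 'wb') as f:
    f.write(tflite_model)  # ship this file to the mobile or embedded device
```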


Face and hand tracking in the browser with MediaPipe and TensorFlow.js

#artificialintelligence

Today we're excited to release two new packages: facemesh and handpose, for tracking key landmarks on faces and hands, respectively. This release has been a collaborative effort between the MediaPipe and TensorFlow.js teams. Originally published by Ann Yuan and Andrey Vakunov, Software Engineers at Google, at blog.tensorflow.org.
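
The packages themselves target the browser via TensorFlow.js; as a hedged sketch, the same underlying MediaPipe face and hand landmark models are also exposed through MediaPipe's Python solutions, which look like this (image path is hypothetical).

```python
# Sketch: the analogous MediaPipe Python solutions for face and hand landmarks
# (the release above targets the browser; this is the Python counterpart).
import cv2
import mediapipe as mp

image = cv2.cvtColor(cv2.imread('selfie.jpg'), cv2.COLOR_BGR2RGB)  # hypothetical image

with mp.solutions.face_mesh.FaceMesh(static_image_mode=True, max_num_faces=1) as face_mesh:
    faces = face_mesh.process(image).multi_face_landmarks  # 468 landmarks per face

with mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=2) as hands:
    hand_landmarks = hands.process(image).multi_hand_landmarks  # 21 landmarks per hand
```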