Google's Objectron uses AI to track 3D objects in 2D video

#artificialintelligence

Coinciding with the kickoff of the 2020 TensorFlow Developer Summit, Google today published a pipeline -- Objectron -- that spots objects in 2D images and estimates their poses and sizes through an AI model. The company says it has implications for robotics, self-driving vehicles, image retrieval, and augmented reality; for instance, it could help a factory floor robot avoid obstacles in real time. Tracking 3D objects is a tricky prospect, particularly with limited compute resources (like a smartphone system-on-chip), and it becomes tougher still when the only available imagery (usually video) is 2D, because 3D training data is scarce and objects vary widely in appearance and shape. To build a data set, the Google team behind Objectron developed a tool that lets annotators label 3D bounding boxes (i.e., cuboids) for objects, using a split-screen view of 2D video frames.
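
MediaPipe has since exposed Objectron as a ready-made Python solution; a minimal sketch of running it over a video, assuming the mediapipe and opencv-python packages (the input file name is a placeholder):

```python
# Minimal sketch: MediaPipe's Objectron solution over a video stream.
import cv2
import mediapipe as mp

objectron = mp.solutions.objectron.Objectron(
    static_image_mode=False,       # treat input as a video, enabling tracking
    max_num_objects=5,
    min_detection_confidence=0.5,
    model_name='Shoe')             # pretrained categories: Shoe, Chair, Cup, Camera

cap = cv2.VideoCapture('input.mp4')  # placeholder path
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # The solution expects RGB input; OpenCV decodes frames as BGR.
    results = objectron.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    for obj in results.detected_objects or []:
        # Each detection carries the projected 3D bounding-box landmarks
        # plus the estimated rotation, translation, and scale.
        print(obj.rotation, obj.translation, obj.scale)
cap.release()
```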


Face and hand tracking in the browser with MediaPipe and TensorFlow.js

#artificialintelligence

Originally published by Ann Yuan and Andrey Vakunov, software engineers at Google, at blog.tensorflow.org: Today we're excited to release two new packages: facemesh and handpose, for tracking key landmarks on faces and hands respectively. This release has been a collaborative effort between the MediaPipe and TensorFlow.js teams.
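
The facemesh and handpose packages themselves run in the browser via TensorFlow.js; to keep a single language for the examples on this page, here is a rough Python analogue using MediaPipe's FaceMesh and Hands solutions, which wrap the same underlying landmark models (a sketch, assuming the mediapipe and opencv-python packages):

```python
# Rough Python analogue of the facemesh/handpose browser packages.
import cv2
import mediapipe as mp

face_mesh = mp.solutions.face_mesh.FaceMesh(static_image_mode=True)
hands = mp.solutions.hands.Hands(static_image_mode=True)

# Placeholder image path; the solutions expect RGB input.
image = cv2.cvtColor(cv2.imread('photo.jpg'), cv2.COLOR_BGR2RGB)

faces = face_mesh.process(image)
if faces.multi_face_landmarks:
    # facemesh-style output: 468 3D landmarks per detected face.
    print(len(faces.multi_face_landmarks[0].landmark))

palms = hands.process(image)
if palms.multi_hand_landmarks:
    # handpose-style output: 21 3D landmarks per detected hand.
    print(len(palms.multi_hand_landmarks[0].landmark))
```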


Applying Machine Learning on Mobile Devices

#artificialintelligence

Machine learning is now used in many fields: image classification, consumer demand forecasting, personalized film and music recommendations, clustering. At the same time, for fairly large models, computing a result (and, to a much greater degree, training the model) can be resource-intensive. To make trained models usable on devices other than the most powerful ones, Google introduced its TensorFlow Lite framework. To work with it, you train a model built with the full TensorFlow framework (not Lite!) and then convert it to the TensorFlow Lite format. After that, the model can easily be used on embedded or mobile devices.
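
A minimal sketch of that train-then-convert workflow, assuming TensorFlow 2.x and a stand-in Keras model:

```python
# Train (or load) a regular TensorFlow model, then convert it for TF Lite.
import tensorflow as tf

# Tiny stand-in model; any trained TensorFlow/Keras model converts the same way.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation='softmax', input_shape=(784,)),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
# ... model.fit(x_train, y_train) with real data ...

# Convert the trained model to the TensorFlow Lite flatbuffer format.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

with open('model.tflite', 'wb') as f:
    f.write(tflite_model)
```

The resulting .tflite file is what the TensorFlow Lite interpreter loads on the mobile or embedded device.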


Mozilla updates DeepSpeech with an English language model that runs 'faster than real time'

#artificialintelligence

DeepSpeech, an open source speech-to-text engine maintained by Mozilla's Machine Learning Group, this morning received an update (to version 0.6) that incorporates one of the fastest open source speech recognition models to date. In a blog post, senior research engineer Reuben Morais lays out what's new and enhanced, as well as other spotlight features coming down the pipeline. The latest version of DeepSpeech adds support for TensorFlow Lite, a version of Google's TensorFlow machine learning framework optimized for compute-constrained mobile and embedded devices. The switch has reduced DeepSpeech's package size from 98MB to 3.7MB and shrunk its built-in English model -- which has a 7.5% word error rate on a popular benchmark and was trained on 5,516 hours of transcribed audio from WAMU (NPR), LibriSpeech, Fisher, Switchboard, and Mozilla's Common Voice English data sets -- from 188MB to 47MB. It has also cut DeepSpeech's memory consumption by a factor of 22 and boosted its startup speed by more than 500 times.
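
As a usage sketch, transcribing a WAV file with the DeepSpeech Python package looks roughly like this; the constructor changed between releases, so the beam-width argument and model file name below are assumptions to check against the 0.6 docs:

```python
# Sketch: offline transcription with the DeepSpeech Python bindings.
import wave
import numpy as np
from deepspeech import Model

# Model path and beam width are assumptions based on the 0.6-era API.
ds = Model('deepspeech-0.6.0-models/output_graph.tflite', 500)

# DeepSpeech expects 16 kHz, mono, 16-bit PCM audio.
with wave.open('audio.wav', 'rb') as wav:
    audio = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)

print(ds.stt(audio))  # prints the transcribed text
```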