TensorFlow recently launched its latest pose detection model, MoveNet, with a new pose-detection API in TensorFlow.js. MoveNet is a very fast and accurate model that detects 17 keypoints of a body. The model is offered in two variants, called Lightning and Thunder. Both models run faster than real time (i.e., 30+ frames per second) on most modern desktops, laptops, and phones. The models run entirely on the client side, in the browser, using TensorFlow.js. Human pose estimation has advanced considerably; however, it has not surfaced in many applications, mainly because more focus has been placed on making pose models larger and more accurate than on making them faster and easily deployable everywhere.
TensorFlow Lite is an open source deep learning framework for on-device inference. Therefore, we need to convert our trained .pb ... TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state of the art in machine learning. In addition, it helps developers easily build and deploy applications powered by machine learning.
Ahead of Google I/O, Google Research launched a new pose detection model in TensorFlow.js called MoveNet. This ultra-fast and accurate model can detect 17 keypoints on the human body. MoveNet is currently available on TF Hub in two variants, Lightning and Thunder. Lightning is intended for latency-critical applications, while Thunder is for applications that call for higher accuracy. Both models are claimed to run faster than real time (30 frames per second, FPS) on most personal computers, laptops, and phones.
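As a rough illustration of what "17 keypoints" means in practice, the sketch below decodes a MoveNet-style result. It assumes the single-pose output layout described on TF Hub: a [1, 1, 17, 3] array of (y, x, score) values, with y and x normalized to [0, 1] and keypoints in COCO order. The helper name `decode_keypoints`, the 0.3 score threshold, and the synthetic input values are illustrative assumptions, not part of the announcement.

```python
# COCO keypoint order, as assumed here for MoveNet's 17 keypoints.
KEYPOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]


def decode_keypoints(output, width, height, min_score=0.3):
    """Map a nested-list output of shape [1, 1, 17, 3] to pixel coordinates.

    Each keypoint is (y, x, score) with y and x normalized to [0, 1].
    Keypoints scoring below min_score (a hypothetical threshold) are dropped.
    """
    keypoints = output[0][0]
    if len(keypoints) != len(KEYPOINT_NAMES):
        raise ValueError("expected 17 keypoints")
    decoded = {}
    for name, (y, x, score) in zip(KEYPOINT_NAMES, keypoints):
        if score >= min_score:
            decoded[name] = (x * width, y * height, score)
    return decoded


# Synthetic input for illustration only: every keypoint sits at the image
# centre with low confidence, except a confident nose.
fake_output = [[[[0.5, 0.5, 0.1] for _ in KEYPOINT_NAMES]]]
fake_output[0][0][0] = [0.25, 0.5, 0.9]

pose = decode_keypoints(fake_output, width=640, height=480)
print(pose)
```

In a real pipeline the nested list would come from the model's output tensor; only the indexing and scaling shown here would carry over.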
Every nine minutes a person is diagnosed with Parkinson's Disease (PD) in the United States. However, studies have shown that between 25% and 80% of individuals with PD remain undiagnosed. An online, in-the-wild audio recording application has the potential to help screen for the disease if risk can be accurately assessed. In this paper, we collect data from 726 unique subjects (262 PD and 464 non-PD) uttering the sentence "quick brown fox jumps over the lazy dog ...." to conduct automated PD assessment. We extract both standard acoustic features and deep-learning-based embedding features from the speech data and train several machine learning algorithms on them. Our best model, XGBoost trained on the standard acoustic features, achieves 0.75 AUC. We also provide an explanation of the model's decisions and show that it focuses mostly on the widely used MFCC features and a subset of dysphonia features previously used for detecting PD from verbal phonation tasks.
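To make the reported 0.75 AUC concrete: AUC can be read as the probability that a randomly chosen PD subject receives a higher risk score than a randomly chosen non-PD subject. A minimal sketch of that pairwise definition follows. The function name `roc_auc` and the toy labels and scores are illustrative assumptions and are not data from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a pair.
    This O(n_pos * n_neg) form is for illustration, not for large datasets.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data made up for illustration: 1 = PD, 0 = non-PD.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.2]
print(roc_auc(toy_labels, toy_scores))  # 0.75
```

In the toy data, three of the four PD/non-PD pairs are ordered correctly, which gives 0.75. A score of 0.5 would correspond to random ranking and 1.0 to perfect separation.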
This work studies the effects of adding pitch and voice quality features, such as jitter and shimmer, to a state-of-the-art CNN model for Automatic Speech Recognition. Pitch features have previously been used to improve classical HMM and DNN baselines, while jitter and shimmer parameters have proven useful for tasks such as speaker or emotion recognition. To our knowledge, this is the first work combining such pitch and voice quality features with modern convolutional architectures, showing improvements of up to 2% absolute WER on the publicly available Spanish Common Voice dataset. In particular, our work combines these features with mel-frequency spectral coefficients (MFSCs) to train a convolutional architecture with Gated Linear Units (Conv GLUs). Such models have been shown to yield small word error rates while being well suited to parallel processing for online streaming recognition use cases. We have added pitch and voice quality functionality to Facebook's wav2letter speech recognition framework, and we provide the code and recipes to the community to support further experiments. In addition, to the best of our knowledge, our Spanish Common Voice recipe is the first public Spanish recipe for wav2letter.
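For readers unfamiliar with jitter and shimmer, the sketch below computes one common variant of each: "local" jitter as the mean absolute difference between consecutive pitch periods divided by the mean period, and "local" shimmer as the same ratio over per-cycle peak amplitudes. The abstract does not say which variants the authors use, so these definitions, the function names, and the toy values are assumptions for illustration only.

```python
def relative_perturbation(values):
    """Mean absolute difference of consecutive values over the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_diff / mean_value


def local_jitter(periods):
    """Cycle-to-cycle variation of pitch period durations (in seconds)."""
    return relative_perturbation(periods)


def local_shimmer(amplitudes):
    """Cycle-to-cycle variation of peak amplitudes (any linear unit)."""
    return relative_perturbation(amplitudes)


# Toy glottal cycles, made up for illustration: periods alternate between
# 10 ms and 11 ms, amplitudes between 1.0 and 0.9.
toy_periods = [0.010, 0.011, 0.010, 0.011]
toy_amplitudes = [1.0, 0.9, 1.0, 0.9]

jitter = local_jitter(toy_periods)
shimmer = local_shimmer(toy_amplitudes)
print(f"jitter={jitter:.4f} shimmer={shimmer:.4f}")
```

Both measures are dimensionless ratios, which is what lets them be appended to per-frame spectral features such as MFSCs once they have been computed over matching analysis windows.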