Pushing the Envelope for Depth-Based Semi-Supervised 3D Hand Pose Estimation with Consistency Training

Rezaei, Mohammad; Farahanipad, Farnaz; Dillhoff, Alex; Athitsos, Vassilis

arXiv.org Artificial Intelligence 

The hands are the primary means by which humans interact with the outside world. As such, accurate hand pose estimation is a requirement for many vision-based systems and enables applications in areas such as augmented reality (AR), virtual reality (VR), and gesture recognition. Recently, the availability of more accurate and affordable commodity depth cameras, coupled with the success of deep neural networks (DNNs), has led to significant progress in depth-based 3D hand pose estimation and segmentation [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]. Despite these advancements, one major remaining challenge is that DNN-based methods require large amounts of annotated training data to realize their full potential. A straightforward way to mitigate this requirement is to use synthesized training data with accurate annotations [11], which can be generated with minimal human effort. However, models trained on synthesized data generalize poorly to real-world data because of the significant domain gap between the two. A popular alternative is semi-supervised learning (SSL) [12], which leverages unlabeled data alongside labeled data and thereby reduces the amount of labeled data required for training. Most recent advances in SSL methods have focused on image classification [13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23].
