Beyond Temporal Pooling: Recurrence and Temporal Convolutions for Gesture Recognition in Video
Pigou, Lionel; Oord, Aäron van den; Dieleman, Sander; Van Herreweghe, Mieke; Dambre, Joni
arXiv.org Artificial Intelligence
Gesture recognition is one of the core components in the thriving research field of human-computer interaction. The recognition of distinct hand and arm motions is becoming increasingly important, as it enables smart interactions with electronic devices. Furthermore, gesture identification in video can be seen as a first step towards sign language recognition, where even subtle differences in motion can play an important role. Several factors complicate the identification of gestures: changes in background and lighting due to the varying environment, variations in the performance and speed of the gestures, different clothes worn by the performers, and different positioning relative to the camera. Moreover, regular hand motion or out-of-vocabulary gestures should not be confused with one of the target gestures. Convolutional neural networks (CNNs) (LeCun et al., 1998) are the de facto standard approach in computer vision. CNNs have the ability to learn complex hierarchies with increasing levels of abstraction while being end-to-end trainable. Their success has had a huge impact on vision-based applications like image classification (Krizhevsky et al., 2012), object detection
Feb-10-2016