MARS: Motion-Augmented RGB Stream for Action Recognition - Naver Labs Europe
This blog post presents our CVPR'19 paper, "MARS: Motion-Augmented RGB Stream for Action Recognition", done with the Thoth team at Inria. The code and trained models are available here. Action recognition in videos requires processing both spatial and temporal information. Although CNNs have been quite successful at modeling spatial information, they have been less effective at modeling temporal information. Current state-of-the-art techniques use two-stream architectures based on 3D CNNs trained on a large dataset: one stream processes appearance information from RGB frames, while the other handles motion information from optical flow. However, computing optical flow adds latency at recognition time, which limits the use of these methods in real-time applications.
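To make the two-stream idea concrete, here is a minimal sketch of how the outputs of an appearance stream and a motion stream might be combined. It assumes late fusion by a weighted average of softmax scores, which is a common choice but not necessarily the one used in the paper. The function names, the `flow_weight` parameter and the example scores are all hypothetical, and the networks themselves are left out.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_two_stream(rgb_logits, flow_logits, flow_weight=0.5):
    """Late fusion (assumed): weighted average of per-stream probabilities."""
    rgb_probs = softmax(rgb_logits)
    flow_probs = softmax(flow_logits)
    return [
        (1.0 - flow_weight) * r + flow_weight * f
        for r, f in zip(rgb_probs, flow_probs)
    ]


# Hypothetical scores for three action classes from each stream.
rgb_logits = [2.0, 1.0, 0.1]   # appearance stream (RGB frames)
flow_logits = [0.5, 2.5, 0.3]  # motion stream (requires optical flow)

fused = fuse_two_stream(rgb_logits, flow_logits)
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

In this setup the motion stream cannot produce `flow_logits` until optical flow has been computed for the clip, which is where the latency mentioned above comes from.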