Sports Video Classification from Multimodal Information Using Deep Neural Networks
Sachan, Devendra Singh (Indian Institute of Technology, Guwahati) | Tekwani, Umesh (Indian Institute of Technology, Guwahati) | Sethi, Amit (Indian Institute of Technology, Guwahati)
This work presents a methodology for classifying sports videos from both audio and visual information using deep learning algorithms. We show how to combine multiple deep learning architectures through their higher layers. Our method learns two separate models, one trained on the audio part of the data and one on the visual part. For the audio part of the multimedia input, we train two stacked layers of CRBMs forming a CDBN. For the video part, we train a two-layer ISA network to extract features. We then train a deep stacked autoencoder over both audio and visual features with discriminative fine-tuning. Our results show that combining audio and visual features gives better accuracy than using a single type of feature.
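The fusion step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the audio and visual feature vectors are random placeholders standing in for CDBN and ISA outputs, the weights are untrained, and every name and dimension (`fuse_and_classify`, `random_layer`, 6 audio features, 8 visual features, 3 classes) is hypothetical. It only shows the data flow, in which the two feature vectors are concatenated, passed through stacked encoder layers, and classified by a softmax layer.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    # Subtract the maximum for numerical stability.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def dense(x, w, b, act):
    # x has n_in entries, w is n_in rows of n_out weights, b has n_out entries.
    return [
        act(sum(xi * w[i][j] for i, xi in enumerate(x)) + b[j])
        for j in range(len(b))
    ]


def random_layer(rng, n_in, n_out):
    # Hypothetical untrained layer; a real model would learn these weights.
    w = [[rng.gauss(0.0, 0.1) for _ in range(n_out)] for _ in range(n_in)]
    return w, [0.0] * n_out


def fuse_and_classify(audio_feat, visual_feat, encoder_layers, output_layer):
    # Fusion: concatenate the two modality-specific feature vectors.
    h = list(audio_feat) + list(visual_feat)
    # Stacked encoder layers (the encoder halves of the autoencoders).
    for w, b in encoder_layers:
        h = dense(h, w, b, sigmoid)
    # Softmax output layer, as would be used for discriminative fine-tuning.
    w, b = output_layer
    return softmax(dense(h, w, b, lambda z: z))


rng = random.Random(0)
d_audio, d_visual, n_classes = 6, 8, 3  # assumed sizes, for illustration only
audio = [rng.gauss(0.0, 1.0) for _ in range(d_audio)]    # placeholder audio features
visual = [rng.gauss(0.0, 1.0) for _ in range(d_visual)]  # placeholder visual features

sizes = [d_audio + d_visual, 10, 5]
encoder_layers = [random_layer(rng, a, b) for a, b in zip(sizes[:-1], sizes[1:])]
output_layer = random_layer(rng, sizes[-1], n_classes)

probs = fuse_and_classify(audio, visual, encoder_layers, output_layer)
```

Here `probs` holds one probability per class for a single clip. In a trained system, the encoder weights would first be learned by reconstructing the concatenated features and then adjusted together with the output layer on labeled clips.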
Nov-14-2013