Video description: A comprehensive survey of deep learning approaches - Artificial Intelligence Review
Video description refers to understanding visual content and transforming that understanding into an automatic textual narration. Deep learning-based approaches to video description have achieved better results than conventional approaches. However, the current literature lacks a thorough interpretation of the sequence-to-sequence techniques recently developed and employed for video description. This paper fills that gap by focusing mainly on deep learning-enabled approaches to automatic caption generation. Sequence-to-sequence models follow an Encoder–Decoder architecture in which the encoder and decoder blocks are built from a specific combination of CNNs, RNNs, or the RNN variants LSTM and GRU.
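The Encoder–Decoder pipeline described above can be sketched in plain Python. This is a minimal, untrained illustration under stated assumptions, not the method of any surveyed paper. Per-frame feature vectors stand in for the output of a pretrained CNN (here they are random). A GRU cell (biases omitted) encodes the frame sequence into a context vector, and a second GRU cell decodes that vector greedily into words from a tiny vocabulary. `GRUCell`, `encode`, `decode`, `caption_video`, and `VOCAB` are hypothetical names chosen for illustration. Because the weights are random, the generated caption is meaningless; only the data flow is of interest.

```python
import math
import random

# Hypothetical toy vocabulary: index 0 is the start token, index 1 the end token.
VOCAB = ["<bos>", "<eos>", "a", "man", "is", "playing", "guitar"]


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng):
    """Random weights in [-0.5, 0.5]; a stand-in for trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class GRUCell:
    """Minimal GRU cell without biases: h' = (1 - z) * n + z * h."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        self.W = {g: rand_matrix(hidden_size, input_size, rng) for g in "zrn"}
        self.U = {g: rand_matrix(hidden_size, hidden_size, rng) for g in "zrn"}

    def _gate(self, g, x, h, act):
        return [act(a + b) for a, b in zip(matvec(self.W[g], x), matvec(self.U[g], h))]

    def step(self, x, h):
        z = self._gate("z", x, h, sigmoid)      # update gate
        r = self._gate("r", x, h, sigmoid)      # reset gate
        rh = [ri * hi for ri, hi in zip(r, h)]  # reset-scaled previous state
        n = self._gate("n", x, rh, math.tanh)   # candidate state
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def encode(frames, cell):
    """Encoder: run the GRU over per-frame features; the final state is the video context."""
    h = [0.0] * cell.hidden_size
    for x in frames:
        h = cell.step(x, h)
    return h


def decode(context, cell, embed, out_proj, max_len=10):
    """Decoder: start from the context vector and greedily emit tokens until <eos>."""
    h = context
    token = 0  # <bos>
    words = []
    for _ in range(max_len):
        h = cell.step(embed[token], h)
        logits = matvec(out_proj, h)
        token = max(range(len(logits)), key=logits.__getitem__)  # greedy choice
        if VOCAB[token] == "<eos>":
            break
        words.append(VOCAB[token])
    return words


def caption_video(frames, seed=0, feat_dim=8, hidden=16, embed_dim=6, max_len=10):
    """Build a random (untrained) encoder and decoder, then caption the frames."""
    rng = random.Random(seed)
    encoder = GRUCell(feat_dim, hidden, rng)
    decoder = GRUCell(embed_dim, hidden, rng)
    embed = rand_matrix(len(VOCAB), embed_dim, rng)   # token embeddings
    out_proj = rand_matrix(len(VOCAB), hidden, rng)   # hidden state -> vocabulary logits
    context = encode(frames, encoder)
    return decode(context, decoder, embed, out_proj, max_len)


if __name__ == "__main__":
    frame_rng = random.Random(42)
    # Five frames, each an 8-dimensional stand-in for CNN features.
    frames = [[frame_rng.gauss(0, 1) for _ in range(8)] for _ in range(5)]
    print(caption_video(frames))
```

In a trained system, the weights would be learned from paired videos and captions, and greedy decoding is often replaced by beam search. LSTM cells could be swapped in for the GRU cells without changing the overall encoder and decoder structure.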
Apr-11-2023, 16:55:51 GMT