Find location in video matching a sentence with TAN
Temporal Alignment Networks for Long-term Video
arXiv paper abstract: https://arxiv.org/abs/2204.02968
arXiv PDF paper: https://arxiv.org/pdf/2204.02968.pdf
The objective ... is a temporal alignment network that ingests long term video sequences, and associated text sentences, in order to: (1) determine if a sentence is alignable with the video; and (2) if it is alignable, then determine its alignment. The challenge is to train such networks from ...
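As a rough illustration of the two outputs named in the abstract (an alignability decision, then an alignment), here is a minimal toy sketch in Python. It is not the paper's network: it assumes the sentence and each video time step have already been embedded as plain vectors, and it substitutes cosine similarity with a fixed cutoff for the learned model. The function name `align_sentence`, the `threshold` value, and the example vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def align_sentence(sentence_vec, frame_vecs, threshold=0.5):
    """Toy stand-in for the two-step task, not the TAN architecture.

    Step 1: call the sentence alignable if its best similarity to any
            time step reaches `threshold` (an assumed, hand-picked cutoff).
    Step 2: if alignable, return the index of the best-matching time step.
    Returns (alignable, best_index_or_None, per_step_scores).
    """
    if not frame_vecs:
        return False, None, []
    scores = [cosine(sentence_vec, f) for f in frame_vecs]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    alignable = scores[best_index] >= threshold
    return alignable, (best_index if alignable else None), scores


# Hypothetical 2-D embeddings for three video time steps.
frames = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]

print(align_sentence([0.0, 2.0], frames))    # sentence that matches a time step
print(align_sentence([-1.0, -1.0], frames))  # sentence unrelated to the video
```

Returning `None` for the index when the sentence is judged not alignable keeps the two decisions separate, mirroring points (1) and (2) above.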
Apr-7-2022, 14:20:20 GMT