TokenLearner: Adaptive Space-Time Tokenization for Videos

Michael S. Ryoo

Neural Information Processing Systems 

This makes it possible to find a few important visual tokens efficiently and effectively, and to model pairwise attention between those tokens over a longer temporal horizon in videos or over the spatial content of image frames.
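The idea in the sentence above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the text here does not say how tokens are selected, so the sketch assumes a simple softmax-weighted pooling over all space-time positions. The names `learn_tokens`, `pairwise_attention`, and `proj`, along with the tensor sizes, are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def learn_tokens(features, proj):
    """Pool N position features into S tokens (assumed weighting scheme).

    features: (N, C) array, one row per space-time position.
    proj:     (C, S) hypothetical projection; column s scores every
              position for token s.
    returns:  (S, C) array of tokens.
    """
    scores = features @ proj            # (N, S)
    weights = softmax(scores, axis=0)   # each column sums to 1 over positions
    return weights.T @ features         # (S, C) weighted averages


def pairwise_attention(tokens):
    """Plain scaled dot-product self-attention over the S tokens."""
    d = tokens.shape[-1]
    attn = softmax(tokens @ tokens.T / np.sqrt(d), axis=-1)  # (S, S)
    return attn @ tokens


rng = np.random.default_rng(0)
T, H, W, C, S = 4, 8, 8, 16, 8           # illustrative sizes only
video_features = rng.normal(size=(T, H, W, C))
flat = video_features.reshape(-1, C)     # (T*H*W, C) = (256, 16)
proj = rng.normal(size=(C, S))           # random stand-in for learned weights

tokens = learn_tokens(flat, proj)        # (8, 16)
mixed = pairwise_attention(tokens)       # (8, 16)

# Pairwise attention now compares S*S token pairs instead of N*N positions.
pairs_before = flat.shape[0] ** 2
pairs_after = tokens.shape[0] ** 2
```

With these sizes, attention over the 8 pooled tokens involves 64 pairs, whereas attention over all 256 positions would involve 65,536. That gap is what makes longer temporal horizons affordable in this sketch.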
