Don't Look Twice: Faster Video Transformers with Run-Length Tokenization

Rohan Choudhury

Neural Information Processing Systems 

RLT efficiently finds and removes 'runs' of patches that are repeated over time prior to model inference, then replaces each run with a single patch and a positional encoding that represents the resulting token's new length.
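The idea of collapsing temporally repeated patches into single tokens can be sketched as follows. This is a minimal illustration, not the paper's implementation: the comparison metric (mean absolute difference), the threshold `tau`, and the function name are all assumptions made for clarity.

```python
# Hypothetical sketch of run-length tokenization: collapse temporal
# runs of (near-)identical patches into one token plus a run length.
import numpy as np

def run_length_tokenize(patches: np.ndarray, tau: float = 0.1):
    """patches: (T, N, D) array of T frames with N patches of dim D each.

    A patch is dropped when it is nearly identical to the patch at the
    same spatial location in the previous frame; the surviving patch at
    the start of the run records how many frames the run covers.
    """
    T, N, D = patches.shape
    kept, lengths = [], []
    run_idx = np.full(N, -1, dtype=int)  # index into `kept` of each location's active run
    for t in range(T):
        for n in range(N):
            same = t > 0 and np.abs(patches[t, n] - patches[t - 1, n]).mean() < tau
            if same:
                lengths[run_idx[n]] += 1   # extend the existing run
            else:
                run_idx[n] = len(kept)     # start a new run at this patch
                kept.append(patches[t, n])
                lengths.append(1)
    return np.stack(kept), np.array(lengths)
```

On a perfectly static clip every spatial location collapses to one token whose length equals the number of frames, so the token count drops from `T * N` to `N`; the run lengths would then feed the length-aware positional encoding described above.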
