HENASY: Learning to Assemble Scene-Entities for Interpretable Egocentric Video-Language Model

Neural Information Processing Systems 

In this paper, we take inspiration from human perception and explore a compositional approach for egocentric video representation.
