31fb284a0aaaad837d2930a610cd5e50-Supplemental-Conference.pdf
Neural Information Processing Systems
In our work, we study video-language pretraining in a specific yet significant domain, the first-person view, which is motivated by the release of the Ego4D dataset. The varying clip frequencies depend mainly on the manual narrations, which are annotated based on the video scenarios and activities. There are on average 13.4 clips per minute of video, with a maximum of 175.8. Fig. 6(b) displays the distribution of clip duration. In Figure 1(c), we present the distribution of narration length in words.
Feb-8-2026, 05:24:11 GMT