00482b9bed15a272730fcb590ffebddd-Supplemental.pdf

Apr-30-2026, 19:37:13 GMT–Neural Information Processing Systems

A.1 Training dataset pre-processing

We used 40,000 publicly available YouTube videos, each available at a spatial resolution of at least 1920 × 1080 pixels. To avoid skewing the distribution of content too far from what may inform biological representation learning, we excluded most artificial content, such as screenshots and videos of computer games. To reduce video compression artifacts and prevent systematic downsampling artifacts, each segment was then spatially downsampled to a randomized height between 128 and 160 pixels. Each segment was then separated into 15 pairs of neighboring frames, and a randomly placed but spatially colocated patch of 64 × 64 pixels was cropped out of each frame pair. The order of the frame pairs was then randomized in a running buffer, and all RGB pixel values were normalized to the range between 0 and 1 before being fed into the model.
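The per-segment steps described above can be sketched roughly as follows. This is only an illustrative sketch, not the authors' code. It assumes a segment arrives as a `uint8` array of 30 frames (so that 15 non-overlapping neighboring pairs can be formed), that the width is scaled to preserve the aspect ratio, and that nearest-neighbor resampling stands in for the resampling filter, which the text does not specify. The names `downsample_nearest` and `segment_to_patch_pairs` are hypothetical, and the running shuffle buffer is omitted.

```python
import numpy as np

PATCH = 64               # patch side length in pixels (from the text)
PAIRS_PER_SEGMENT = 15   # frame pairs per segment (from the text)


def downsample_nearest(frames, new_h, new_w):
    """Nearest-neighbor resize of a (T, H, W, 3) array.

    Assumption: the original resampling filter is not stated, so this
    is only a simple stand-in.
    """
    _, h, w, _ = frames.shape
    rows = np.arange(new_h) * h // new_h   # source row for each output row
    cols = np.arange(new_w) * w // new_w   # source column for each output column
    return frames[:, rows][:, :, cols]


def segment_to_patch_pairs(segment, rng):
    """Turn one (30, H, W, 3) uint8 segment into 15 normalized patch pairs.

    Returns a float32 array of shape (15, 2, 64, 64, 3) with values in [0, 1].
    Assumption: pairs are non-overlapping (frames 0-1, 2-3, ...).
    """
    t, h, w, _ = segment.shape
    assert t == 2 * PAIRS_PER_SEGMENT, "sketch assumes 30 frames per segment"

    # Randomized target height between 128 and 160 (inclusive here).
    new_h = int(rng.integers(128, 161))
    new_w = round(w * new_h / h)           # keep aspect ratio (assumption)
    small = downsample_nearest(segment, new_h, new_w)

    # Group consecutive frames into pairs: (15, 2, new_h, new_w, 3).
    pairs = small.reshape(PAIRS_PER_SEGMENT, 2, new_h, new_w, 3)

    out = np.empty((PAIRS_PER_SEGMENT, 2, PATCH, PATCH, 3), dtype=np.float32)
    for i, pair in enumerate(pairs):
        # One random location per pair, shared by both frames (colocated).
        y = int(rng.integers(0, new_h - PATCH + 1))
        x = int(rng.integers(0, new_w - PATCH + 1))
        # Scale 8-bit RGB values to the range [0, 1].
        out[i] = pair[:, y:y + PATCH, x:x + PATCH] / 255.0
    return out
```

Calling `segment_to_patch_pairs(segment, np.random.default_rng(0))` on a 30-frame array yields 15 patch pairs; pooling the pairs from many segments and permuting them would approximate the randomized ordering.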



