Reviews: Unsupervised learning of object structure and dynamics from videos
–Neural Information Processing Systems
Originality: The paper's main contribution is a structured representation for video prediction models based on keypoints extracted from images. Models that extract keypoints from images have been proposed before; here the authors extend those ideas to video. The paper also includes experiments that empirically analyze this representation, something often lacking in other video prediction papers even though learning representations is one of the main motivations for video prediction.

Clarity: The paper is well organized and clearly written.

Quality and significance: The experiments are sound and properly assess some of the points made by the authors. I believe there are some issues/typos with the model formulation.
Jan-27-2025, 10:08:15 GMT