Deep Graph Pose: a semi-supervised deep graphical model for improved animal pose tracking

Neural Information Processing Systems

Noninvasive behavioral tracking of animals is crucial for many scientific investigations. Recent transfer learning approaches for behavioral tracking have considerably advanced the state of the art. Typically these methods treat each video frame and each object to be tracked independently. In this work, we improve on these methods (particularly in the regime of few training labels) by leveraging the rich spatiotemporal structures pervasive in behavioral video: specifically, the spatial statistics imposed by physical constraints (e.g., paw to elbow distance), and the temporal statistics imposed by smoothness from frame to frame. We propose a probabilistic graphical model built on top of deep neural networks, Deep Graph Pose (DGP), to leverage these useful spatial and temporal constraints, and develop an efficient structured variational approach to perform inference in this model. The resulting semi-supervised model exploits both labeled and unlabeled frames to achieve significantly more accurate and robust tracking while requiring users to label fewer training frames. In turn, these tracking improvements enhance performance on downstream applications, including robust unsupervised segmentation of behavioral "syllables," and estimation of interpretable "disentangled" low-dimensional representations of the full behavioral video.
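The spatial and temporal constraints described in the abstract can be pictured as penalty terms on a predicted pose trajectory. The following is a minimal sketch of that idea, not the authors' implementation: it assumes simple quadratic penalties, and the function names, the `edges` list, and the weights `w_temporal` and `w_spatial` are all hypothetical.

```python
import numpy as np


def temporal_penalty(poses):
    """Sum of squared frame-to-frame displacements.

    poses: array of shape (T, J, 2) holding (x, y) for J body parts over T frames.
    """
    diffs = np.diff(poses, axis=0)  # shape (T - 1, J, 2)
    return float(np.sum(diffs ** 2))


def spatial_penalty(poses, edges):
    """Sum of squared distances between connected body parts, over all frames.

    edges: list of (i, j) index pairs, e.g. a hypothetical (paw, elbow) pair.
    """
    total = 0.0
    for i, j in edges:
        total += float(np.sum((poses[:, i] - poses[:, j]) ** 2))
    return total


def structure_penalty(poses, edges, w_temporal=1.0, w_spatial=1.0):
    """Weighted sum of the two penalties (weights are illustrative assumptions)."""
    return (w_temporal * temporal_penalty(poses)
            + w_spatial * spatial_penalty(poses, edges))


# Toy trajectory: two body parts moving one pixel per frame, one pixel apart.
toy_poses = np.array([
    [[0.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [1.0, 1.0]],
    [[2.0, 0.0], [2.0, 1.0]],
])
toy_edges = [(0, 1)]
toy_total = structure_penalty(toy_poses, toy_edges)
```

In a semi-supervised setting, a term like this can be evaluated on unlabeled frames, since it depends only on the predicted coordinates and not on ground-truth labels.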


Review for NeurIPS paper: Deep Graph Pose: a semi-supervised deep graphical model for improved animal pose tracking

Weaknesses: - How much do the spatial and temporal potentials matter? The paper reports experiments on DLC semi (supervised Gaussian regularization) and DGP; however, the influence of the spatial and temporal potentials is not evaluated independently. This seems like an informative ablation study to do, especially since the paper claims that its difference from prior work is that prior work does not consider temporal and spatial priors. There is a recent work, OptiFlex by Liu et al., which also uses temporal information; this should be cited. The method is only compared against a fully supervised method (DLC) and a baseline semi-supervised method that is an ablated version of the proposed approach (no temporal and structural priors).


Review for NeurIPS paper: Deep Graph Pose: a semi-supervised deep graphical model for improved animal pose tracking

This submission proposes a method for animal 2D pose estimation and tracking given limited amounts of ground truth annotations. It initially received four reviews with diverging scores (5, 6, 7, 4), which remained unchanged after the rebuttal. The reviewers appreciated the importance of the application, the solid empirical performance compared to DeepLabCut (including tests on downstream tasks), and the insightful analysis of the learned representations. At the same time, the reviewers' main concerns were limited methodological novelty beyond applying known methods to the new domain of animal tracking, as well as limitations in the empirical studies. This case was further discussed between the AC and the SAC, who arrived at the conclusion that the merits of this submission in advancing animal tracking outweigh its limitations. The final recommendation is to accept as a poster.

