Bregler, Christoph
Catching Out-of-Context Misinformation with Self-supervised Learning
Aneja, Shivangi, Bregler, Christoph, Nießner, Matthias
Despite the recent attention to DeepFakes and other forms of image manipulation, one of the most prevalent ways to mislead audiences is the use of unaltered images in a new but false context. To address these challenges and support fact-checkers, we propose a new method that automatically detects out-of-context image and text pairs. Our core idea is a self-supervised training strategy where we only need images with matching (and non-matching) captions from different sources. At train time, our method learns to selectively align individual objects in an image with textual claims, without explicit supervision. At test time, we check, for a given pair of captions, whether both correspond to the same object(s) in the image while semantically conveying different descriptions, which allows us to make fairly accurate out-of-context predictions. Our method achieves 82% out-of-context detection accuracy. To facilitate training our method, we created a large-scale dataset of 203,570 images which we match with 456,305 textual captions from a variety of news websites, blogs, and social media posts; i.e., for each image, we obtained several captions.
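The abstract's test-time decision rule can be sketched as follows. This is a minimal illustration, not the paper's implementation: the alignment scores, embedding vectors, and thresholds are all hypothetical stand-ins for what the trained model would produce.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_out_of_context(align_1, align_2, cap_emb_1, cap_emb_2,
                      align_thresh=0.5, sim_thresh=0.5):
    """Flag a caption pair as out-of-context when both captions ground in
    the same image object (both alignment scores high) yet their text
    embeddings are semantically dissimilar. Threshold values here are
    illustrative, not taken from the paper."""
    same_object = align_1 > align_thresh and align_2 > align_thresh
    different_meaning = cosine(cap_emb_1, cap_emb_2) < sim_thresh
    return same_object and different_meaning
```

Captions that ground in different objects yield no verdict, mirroring the paper's requirement that both texts refer to the same object before a conflict is declared.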
Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation
Tompson, Jonathan J., Jain, Arjun, LeCun, Yann, Bregler, Christoph
This paper proposes a new hybrid architecture that consists of a deep Convolutional Network and a Markov Random Field. We show how this architecture is successfully applied to the challenging problem of articulated human pose estimation in monocular images. The architecture can exploit structural domain constraints such as geometric relationships between body joint locations. We show that joint training of these two model paradigms improves performance and allows us to significantly outperform existing state-of-the-art techniques. Papers published at the Neural Information Processing Systems Conference.
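The coupling between the ConvNet and the graphical model can be illustrated with a toy 1D analogue: a joint's unary heatmap is gated by a neighbouring joint's heatmap convolved with a learned displacement prior. This is a hypothetical sketch of that idea, not the paper's (2D, log-domain, jointly trained) spatial model.

```python
import numpy as np

def conv_same_1d(signal, kernel):
    """'Same'-size 1D convolution with zero padding."""
    n, k = len(signal), len(kernel)
    pad = k // 2
    padded = np.concatenate([np.zeros(pad), signal, np.zeros(pad)])
    return np.array([np.dot(padded[i:i + k], kernel[::-1]) for i in range(n)])

def refine(unary_i, unary_j, prior_ji):
    """Belief for joint i = its ConvNet heatmap, gated by the message from
    neighbour joint j (j's heatmap convolved with a displacement prior),
    then renormalised. A 1D stand-in for the paper's MRF-style refinement."""
    msg = conv_same_1d(unary_j, prior_ji)
    belief = unary_i * msg
    return belief / belief.sum()
```

With two equally strong peaks in the unary heatmap, the neighbour's message disambiguates which peak is the true joint location, which is exactly the kind of structural constraint the hybrid architecture exploits.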
Pose-Sensitive Embedding by Nonlinear NCA Regression
Taylor, Graham W., Fergus, Rob, Williams, George, Spiro, Ian, Bregler, Christoph
This paper tackles the complex problem of visually matching people in similar pose but with different clothes, background, and other appearance changes. We achieve this with a novel method for learning a nonlinear embedding based on several extensions to the Neighborhood Component Analysis (NCA) framework. Our method is convolutional, enabling it to scale to realistically-sized images. By cheaply labeling the head and hands in large video databases through Amazon Mechanical Turk (a crowd-sourcing service), we can use the task of localizing the head and hands as a proxy for determining body pose. We apply our method to challenging real-world data and show that it can generalize beyond hand localization to infer a more general notion of body pose. We evaluate our method quantitatively against other embedding methods. We also demonstrate that real-world performance can be improved through the use of synthetic data.
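The NCA objective underlying the learned embedding can be written compactly: each point should, under a softmax over negative squared distances, select a neighbour with the same label (here, similar pose). A minimal sketch on already-embedded points, omitting the paper's convolutional encoder and its extensions:

```python
import numpy as np

def nca_loss(embeddings, labels):
    """Negative mean probability that each point picks a same-label
    neighbour under a softmax over negative squared distances -- the
    quantity NCA maximises, returned as a loss to minimise."""
    n = len(embeddings)
    total = 0.0
    for i in range(n):
        d2 = np.sum((embeddings - embeddings[i]) ** 2, axis=1)
        w = np.exp(-d2)
        w[i] = 0.0                      # a point never selects itself
        p = w / w.sum()
        total += p[labels == labels[i]].sum()
    return -total / n
```

An embedding that clusters same-pose examples drives this loss toward -1; in the paper the gradient of this objective is backpropagated through the convolutional network.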
Learning Motion Style Synthesis from Perceptual Observations
Torresani, Lorenzo, Hackney, Peggy, Bregler, Christoph
This paper presents an algorithm for synthesis of human motion in specified styles. We use a theory of movement observation (Laban Movement Analysis) to describe movement styles as points in a multidimensional perceptual space. We cast the task of learning to synthesize desired movement styles as a regression problem: sequences generated via space-time interpolation of motion capture data are used to learn a nonlinear mapping between animation parameters and movement styles in perceptual space. We demonstrate that the learned model can apply a variety of motion styles to prerecorded motion sequences and it can extrapolate styles not originally included in the training data.
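The regression formulation can be sketched as fitting a map from perceptual style coordinates (e.g. Laban axes) to animation/interpolation parameters. The polynomial basis and least-squares fit below are hypothetical simplifications of the paper's nonlinear mapping, chosen only to show the regression-plus-extrapolation idea.

```python
import numpy as np

def fit_style_map(styles, params, degree=2):
    """Fit a polynomial least-squares map from perceptual style
    coordinates to animation parameters (a stand-in for the paper's
    nonlinear regression)."""
    X = np.hstack([styles ** d for d in range(degree + 1)])
    W, *_ = np.linalg.lstsq(X, params, rcond=None)
    return W

def synthesize(W, style, degree=2):
    """Predict animation parameters for a (possibly unseen) style point."""
    x = np.hstack([style ** d for d in range(degree + 1)])
    return x @ W
```

Querying the fitted map at a style point outside the training range mirrors the abstract's claim that the model can extrapolate to styles not in the training data.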
Learning Non-Rigid 3D Shape from 2D Motion
Torresani, Lorenzo, Hertzmann, Aaron, Bregler, Christoph
This paper presents an algorithm for learning the time-varying shape of a nonrigid 3D object from uncalibrated 2D tracking data. We model shape motion as a rigid component (rotation and translation) combined with a nonrigid deformation. Reconstruction is ill-posed if arbitrary deformations are allowed. We constrain the problem by assuming that the object shape at each time instant is drawn from a Gaussian distribution. Based on this assumption, the algorithm simultaneously estimates 3D shape and motion for each time frame, learns the parameters of the Gaussian, and robustly fills-in missing data points. We then extend the algorithm to model temporal smoothness in object shape, thus allowing it to handle severe cases of missing data.
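The generative model assumed by the abstract decomposes each frame's shape into a mean shape plus a deformation that is linear in Gaussian latent coordinates, followed by a rigid transform and projection. A minimal sketch of the forward model only (the paper's contribution is the inverse problem, estimating all of these from 2D tracks):

```python
import numpy as np

def nonrigid_shape(mean_shape, basis, z):
    """Shape at one time instant: mean shape (P x 3) plus a deformation
    linear in the Gaussian latent coordinates z (K,), with basis (K x P x 3)."""
    return mean_shape + np.tensordot(z, basis, axes=1)

def project(shape3d, R, t):
    """Orthographic projection after rigid motion: R is a 2x3 truncated
    rotation, t a 2D translation."""
    return shape3d @ R.T + t
```

Because the latents z are modelled as Gaussian, missing 2D points can be filled in by inference under this prior, which is what makes reconstruction well-posed despite arbitrary per-frame deformation being ill-posed.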
Learning Appearance Based Models: Mixtures of Second Moment Experts
Bregler, Christoph, Malik, Jitendra
This paper describes a new technique for object recognition based on learning appearance models. The image is decomposed into local regions which are described by a new texture representation called "Generalized Second Moments" that are derived from the output of multiscale, multiorientation filter banks. Class-characteristic local texture features and their global composition are learned by a hierarchical mixture-of-experts architecture (Jordan & Jacobs). The technique is applied to a vehicle database consisting of 5 general car categories (Sedan, Van with backdoors, Van without backdoors, old Sedan, and Volkswagen Bug). This is a difficult problem with considerable in-class variation. The new technique has a 6.5% misclassification rate, compared to eigen-images, which give a 17.4% misclassification rate, and nearest neighbors, which give 15.7%.
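The mixture-of-experts combination rule referenced in the abstract (Jordan & Jacobs) can be sketched in a few lines: a softmax gating network weights the predictions of several linear experts. All weights below are hypothetical; the paper's architecture is hierarchical and trained end-to-end.

```python
import numpy as np

def softmax(v):
    """Numerically stable softmax."""
    e = np.exp(v - v.max())
    return e / e.sum()

def mixture_of_experts(x, gate_w, expert_ws):
    """One level of a Jordan & Jacobs mixture: a softmax gate over the
    input x weights each expert's linear prediction of x."""
    gates = softmax(gate_w @ x)                      # (K,) mixing weights
    preds = np.array([w @ x for w in expert_ws])     # (K, out_dim)
    return gates @ preds
```

In the paper's setting, x would be the Generalized Second Moment features of a local region, and the gate learns which expert specialises in which class-characteristic texture.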