Describing Visual Scenes using Transformed Dirichlet Processes
Antonio Torralba, Alan S. Willsky, Erik B. Sudderth, William T. Freeman
Neural Information Processing Systems
Motivated by the problem of learning to detect and recognize objects with minimal supervision, we develop a hierarchical probabilistic model for the spatial structure of visual scenes. In contrast with most existing models, our approach explicitly captures uncertainty in the number of object instances depicted in a given image. Our scene model is based on the transformed Dirichlet process (TDP), a novel extension of the hierarchical DP in which a set of stochastically transformed mixture components are shared between multiple groups of data. For visual scenes, mixture components describe the spatial structure of visual features in an object-centered coordinate frame, while transformations model the object positions in a particular image. Learning and inference in the TDP, which has many potential applications beyond computer vision, is based on an empirically effective Gibbs sampler. Applied to a dataset of partially labeled street scenes, we show that the TDP's inclusion of spatial structure improves detection performance, flexibly exploiting partially labeled training images.
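The abstract's core idea, shared object-centered mixture components reused across images via per-instance transformations, can be illustrated with a small generative sketch. The following Python snippet is a minimal toy illustration, not the authors' implementation: it assumes a truncated stick-breaking approximation to the DP, Gaussian parts, 2-D translations as the transformations, and hypothetical names throughout.

```python
# Toy sketch in the spirit of a transformed Dirichlet process (assumptions noted above).
import numpy as np

rng = np.random.default_rng(0)

def stick_breaking(alpha, K):
    """Truncated stick-breaking weights for a DP with concentration alpha."""
    betas = rng.beta(1.0, alpha, size=K)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining

# Global mixture: K shared "parts", each an object-centered Gaussian describing
# where visual features fall relative to an object's origin.
K, alpha = 10, 1.0
global_weights = stick_breaking(alpha, K)
part_means = rng.normal(0.0, 1.0, size=(K, 2))   # object-centered means (assumed Gaussian parts)
part_cov = 0.05 * np.eye(2)                       # shared covariance (assumption)

def generate_image(n_objects=3, n_features=50):
    """Sample one image: each object instance draws its own transformation
    (here, a 2-D translation); the shared parts are reused across instances."""
    transforms = rng.uniform(0.0, 10.0, size=(n_objects, 2))  # object positions in the image
    features = []
    for _ in range(n_features):
        obj = rng.integers(n_objects)              # which object instance
        part = rng.choice(K, p=global_weights)     # which shared mixture component
        # Feature position = translated sample from the object-centered part.
        pos = rng.multivariate_normal(part_means[part] + transforms[obj], part_cov)
        features.append(pos)
    return np.array(features), transforms

feats, transforms = generate_image()
print(feats.shape, transforms)
```

In the full model, the number of instances and the transformations are inferred rather than fixed; this sketch only shows how shared components plus per-image transformations yield image-specific feature layouts.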
Dec-31-2006