Object-Oriented Architecture
Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships
Malisiewicz, Tomasz, Efros, Alyosha
The use of context is critical for scene understanding in computer vision, where the recognition of an object is driven by both local appearance and the object's relationship to other elements of the scene (context). Most current approaches rely on modeling the relationships between object categories as a source of context. In this paper we seek to move beyond categories to provide a richer appearance-based model of context. We present an exemplar-based model of objects and their relationships, the Visual Memex, that encodes both local appearance and 2D spatial context between object instances. We evaluate our model on Torralba's proposed Context Challenge against a baseline category-based system. Our experiments suggest that moving beyond categories for context modeling appears to be quite beneficial, and may be the critical missing ingredient in scene understanding systems.
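The core idea of exemplar-based context can be sketched in a few lines: score a candidate detection by its similarity to stored exemplars, weighted by how well its 2D spatial offset to a known object agrees with the offset stored alongside each exemplar. The following is a toy illustration only; the features, distance functions, and parameters are assumptions, not the paper's actual model.

```python
import numpy as np

def score_candidate(cand_feat, cand_offset, exemplars, sigma_app=1.0, sigma_sp=1.0):
    """Score a candidate by its best exemplar match, combining appearance
    similarity with 2D spatial-context compatibility (toy sketch)."""
    best = 0.0
    for feat, offset in exemplars:
        # Gaussian appearance similarity to the stored exemplar
        app = np.exp(-np.sum((cand_feat - feat) ** 2) / (2 * sigma_app ** 2))
        # Gaussian compatibility of the candidate's spatial offset to a
        # co-occurring object with the exemplar's stored offset
        spa = np.exp(-np.sum((cand_offset - offset) ** 2) / (2 * sigma_sp ** 2))
        best = max(best, app * spa)
    return best

# Two hypothetical exemplars: (appearance feature, offset to a neighbor)
exemplars = [(np.array([1.0, 0.0]), np.array([0.0, 2.0])),
             (np.array([0.0, 1.0]), np.array([2.0, 0.0]))]

# A candidate that matches an exemplar in both appearance and layout
# outscores one that matches appearance but sits in an unusual layout.
good = score_candidate(np.array([1.0, 0.0]), np.array([0.0, 2.0]), exemplars)
bad = score_candidate(np.array([1.0, 0.0]), np.array([2.0, -2.0]), exemplars)
```

The point of the sketch is that no category label is ever assigned: context is expressed directly between instances, via stored exemplar pairs.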
Object Recognition by Scene Alignment
Russell, Bryan, Torralba, Antonio, Liu, Ce, Fergus, Rob, Freeman, William T.
Current object recognition systems can only recognize a limited number of object categories; scaling up to many categories is the next challenge. We seek to build a system to recognize and localize many different object categories in complex scenes. We achieve this through a simple approach: by matching the input image, in an appropriate representation, to images in a large training set of labeled images. Due to regularities in object identities across similar scenes, the retrieved matches provide hypotheses for object identities and locations. We build a probabilistic model to transfer the labels from the retrieval set to the input image. We demonstrate the effectiveness of this approach and study algorithm component contributions using held-out test sets from the LabelMe database.
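The retrieval-and-transfer idea can be illustrated with a minimal nearest-neighbor sketch: represent each scene as a global feature vector, retrieve the k most similar training scenes, and let their labels vote for label hypotheses in the query. This is a simplified stand-in, not the paper's probabilistic transfer model; the features and voting scheme here are assumptions.

```python
import numpy as np

def transfer_labels(query_feat, train_feats, train_labels, k=3):
    """Rank object-label hypotheses for a query scene by voting over the
    labels of its k nearest training scenes (toy sketch)."""
    dists = np.linalg.norm(train_feats - query_feat, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = {}
    for i in nearest:
        for label in train_labels[i]:
            votes[label] = votes.get(label, 0) + 1
    # labels shared by several similar scenes rank highest
    return sorted(votes, key=votes.get, reverse=True)

# Hypothetical global scene features and per-image label sets
train_feats = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
train_labels = [{"road", "car"}, {"road", "tree"}, {"bed", "lamp"}]
hyps = transfer_labels(np.array([0.05, 0.0]), train_feats, train_labels, k=2)
```

Because similar scenes tend to contain similar objects, labels from the retrieved street scenes dominate the hypotheses while bedroom labels never enter the vote.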
A Computational Model of Eye Movements during Object Class Detection
Zhang, Wei, Yang, Hyejin, Samaras, Dimitris, Zelinsky, Gregory J.
We present a computational model of human eye movements in an object class detection task. The model combines state-of-the-art computer vision object class detection methods (SIFT features trained using AdaBoost) with a biologically plausible model of human eye movement to produce a sequence of simulated fixations, culminating with the acquisition of a target. We validated the model by comparing its behavior to the behavior of human observers performing the identical object class detection task (looking for a teddy bear among visually complex nontarget objects). We found considerable agreement between the model and human data in multiple eye movement measures, including number of fixations, cumulative probability of fixating the target, and scanpath distance.
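One building block of such simulated-fixation models can be sketched directly: repeatedly fixate the maximum of a target-similarity map, then suppress a neighborhood around the fixated point (inhibition of return) before choosing the next fixation. The map and parameters below are toy assumptions, not the paper's trained SIFT/AdaBoost detector.

```python
import numpy as np

def simulate_fixations(sal_map, n_fix=3, inhibit_radius=1):
    """Generate a fixation sequence by winner-take-all selection with
    inhibition of return on a target-similarity map (toy sketch)."""
    sal = sal_map.astype(float).copy()
    fixations = []
    for _ in range(n_fix):
        # winner-take-all: fixate the current global maximum
        y, x = np.unravel_index(np.argmax(sal), sal.shape)
        fixations.append((y, x))
        # inhibition of return: suppress a window around the fixated point
        y0, y1 = max(0, y - inhibit_radius), y + inhibit_radius + 1
        x0, x1 = max(0, x - inhibit_radius), x + inhibit_radius + 1
        sal[y0:y1, x0:x1] = -np.inf
    return fixations

sal = np.array([[0.1, 0.9, 0.1],
                [0.1, 0.1, 0.1],
                [0.8, 0.1, 0.7]])
fix = simulate_fixations(sal, n_fix=2)
```

The resulting (row, column) sequence visits the most target-like locations in order without revisiting already-inspected regions, which is the behavior compared against human scanpaths.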
Discriminating Deformable Shape Classes
Ruiz-correa, Salvador, Shapiro, Linda G., Meila, Marina, Berson, Gabriel
We present and empirically test a novel approach for categorizing 3-D free form object shapes represented by range data. In contrast to traditional surface-signature based systems that use alignment to match specific objects, we adapted the newly introduced symbolic-signature representation to classify deformable shapes [10]. Our approach constructs an abstract description of shape classes using an ensemble of classifiers that learn object class parts and their corresponding geometrical relationships from a set of numeric and symbolic descriptors. We used our classification engine in a series of large scale discrimination experiments on two well-defined classes that share many common distinctive features. The experimental results suggest that our method outperforms traditional numeric signature-based methodologies.
Dynamic Vision-Based Intelligence
A synthesis of methods from cybernetics and AI yields a concept of intelligence for autonomous mobile systems that integrates closed-loop visual perception and goal-oriented action cycles using spatiotemporal models. In a layered architecture, systems dynamics methods with differential models prevail on the lower, data-intensive levels, but on higher levels, AI-type methods are used. Knowledge about the world is geared to classes of objects and subjects. Subjects are defined as objects with additional capabilities of sensing, data processing, decision making, and control application. Specialist processes for visual detection and efficient tracking of class members have been developed. On the upper levels, individual instantiations of these class members are analyzed jointly in the task context, yielding the situation for decision making. As an application, vertebrate-type vision for tasks in vehicle guidance in naturally perturbed environments was investigated with a distributed PC system. Experimental results with the test vehicle VaMoRs are discussed.
Unrestricted Recognition of 3D Objects for Robotics Using Multilevel Triplet Invariants
Granlund, Gosta H., Moe, Anders
A method for unrestricted recognition of three-dimensional objects was developed. By unrestricted, we imply that the recognition will be done independently of object position, scale, orientation, and pose against a structured background. It does not assume any preceding segmentation and allows a reasonable degree of occlusion. The method uses a hierarchy of triplet feature invariants, which are at each level defined by a learning procedure. In the feedback learning procedure, percepts are mapped on system states corresponding to manipulation parameters of the object. The method uses a learning architecture with channel information representation. This article discusses how objects can be represented. We propose a structure to deal with object and contextual properties in a transparent manner.
Contextual Modulation of Target Saliency
In real-world scenes, intrinsic object information is often degraded due to occlusion, low contrast, and poor resolution. In such situations, the object recognition problem based on intrinsic object representations is ill-posed. A more comprehensive representation of an object should include contextual information [11,13].