Vision
TRAFFIC: Recognizing Objects Using Hierarchical Reference Frame Transformations
Zemel, Richard S., Mozer, Michael C., Hinton, Geoffrey E.
We describe a model that can recognize two-dimensional shapes in an unsegmented image, independent of their orientation, position, and scale. The model, called TRAFFIC, efficiently represents the structural relation between an object and each of its component features by encoding the fixed viewpoint-invariant transformation from the feature's reference frame to the object's in the weights of a connectionist network. Using a hierarchy of such transformations, with increasing complexity of features at each successive layer, the network can recognize multiple objects in parallel. An implementation of TRAFFIC is described, along with experimental results demonstrating the network's ability to recognize constellations of stars in a viewpoint-invariant manner.
1 INTRODUCTION
A key goal of machine vision is to recognize familiar objects in an unsegmented image, independent of their orientation, position, and scale. Massively parallel models have long been used for lower-level vision tasks, such as primitive feature extraction and stereo depth. Models addressing "higher-level" vision have generally been restricted to pattern matching types of problems, in which much of the inherent complexity of the domain has been eliminated or ignored.
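The idea of a fixed, viewpoint-invariant transformation from a feature's reference frame to the object's can be illustrated with 2-D similarity transforms. The sketch below is not the TRAFFIC network itself: it represents each transform as a pair of complex numbers (a, b) acting as p -> a*p + b, and the object model, the feature names "tip" and "base", and all function names are hypothetical. It only shows that composing an observed feature pose with the stored feature-to-object relation yields the same object pose from every feature, whatever the viewpoint.

```python
import cmath

def similarity(scale, angle, tx, ty):
    """2-D similarity transform p -> a*p + b, stored as the complex pair (a, b)."""
    return (scale * cmath.exp(1j * angle), complex(tx, ty))

def compose(outer, inner):
    """Return the transform that applies `inner` first, then `outer`."""
    a1, b1 = outer
    a2, b2 = inner
    return (a1 * a2, a1 * b2 + b1)

def invert(t):
    """Inverse of the similarity transform p -> a*p + b."""
    a, b = t
    return (1 / a, -b / a)

# Hypothetical object model: pose of each feature's frame in object coordinates.
feature_in_object = {
    "tip": similarity(0.5, 0.3, 2.0, 0.0),
    "base": similarity(1.5, -1.0, -1.0, 1.0),
}

# The fixed relation: the object's frame expressed in each feature's frame.
# It does not depend on the viewpoint (in TRAFFIC this role is played by weights).
object_in_feature = {name: invert(t) for name, t in feature_in_object.items()}

def observe(view):
    """Feature poses in the image when the object is seen under `view`."""
    return {name: compose(view, t) for name, t in feature_in_object.items()}

def predict_object_pose(feature_pose_in_image, name):
    """Object pose in the image implied by one observed feature."""
    return compose(feature_pose_in_image, object_in_feature[name])

view = similarity(2.0, 1.1, 5.0, -3.0)   # arbitrary scale, rotation, translation
observed = observe(view)
predictions = {name: predict_object_pose(pose, name) for name, pose in observed.items()}
```

Every entry of `predictions` equals `view` up to rounding, so agreement among the predictions made by different features is evidence that the object is present at that pose.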
Real-Time Computer Vision and Robotics Using Analog VLSI Circuits
Koch, Christof, Bair, Wyeth, Harris, John G., Horiuchi, Timothy K., Hsu, Andrew, Luo, Jin
The long-term goal of our laboratory is the development of analog resistive network-based VLSI implementations of early and intermediate vision algorithms. We demonstrate an experimental circuit for smoothing and segmenting noisy and sparse depth data using the resistive fuse and a 1-D edge-detection circuit for computing zero-crossings using two resistive grids with different space constants. To demonstrate the robustness of our algorithms and of the fabricated analog CMOS VLSI chips, we are mounting these circuits onto small mobile vehicles operating in a real-time, laboratory environment.
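The 1-D edge-detection scheme in this abstract (zero-crossings of the difference between two resistive grids with different space constants) can be simulated numerically. The sketch below is a software approximation under stated assumptions, not the authors' circuit: each grid is modelled as the linear system v[i] + lam^2 * sum over neighbours of (v[i] - v[j]) = data[i], where `lam` stands in for the space constant, and the system is solved with the Thomas algorithm. The function names and the values lam = 1 and lam = 4 are illustrative choices.

```python
def resistive_grid_smooth(data, lam):
    """Steady-state node voltages of a 1-D resistive grid driven by `data`.

    Assumed model: v[i] + lam**2 * sum_j (v[i] - v[j]) = data[i], with j over
    the one or two neighbours of node i. Solved with the Thomas algorithm.
    """
    n = len(data)
    k = lam * lam
    # Tridiagonal system: off-diagonals are -k, diagonal is 1 + k * (number of neighbours).
    diag = [1.0 + k * ((i > 0) + (i < n - 1)) for i in range(n)]
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = -k / diag[0]
    d[0] = data[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] + k * c[i - 1]
        c[i] = -k / denom
        d[i] = (data[i] + k * d[i - 1]) / denom
    v = [0.0] * n
    v[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        v[i] = d[i] - c[i] * v[i + 1]
    return v

def zero_crossings(signal):
    """Indices i where the signal changes sign between samples i and i + 1."""
    return [i for i in range(len(signal) - 1) if signal[i] * signal[i + 1] < 0]

def detect_edges(data, lam_narrow=1.0, lam_wide=4.0):
    """Edges as zero-crossings of (narrow smoothing - wide smoothing)."""
    narrow = resistive_grid_smooth(data, lam_narrow)
    wide = resistive_grid_smooth(data, lam_wide)
    return zero_crossings([a - b for a, b in zip(narrow, wide)])

# A step from 0 to 1 between samples 19 and 20.
step = [0.0] * 20 + [1.0] * 20
edges = detect_edges(step)
```

For this step input the difference signal is negative to the left of the step and positive to the right, so the only sign change reported is between samples 19 and 20.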
Neural Analog Diffusion-Enhancement Layer and Spatio-Temporal Grouping in Early Vision
Waxman, Allen M., Seibert, Michael, Cunningham, Robert K., Wu, Jian
A new class of neural network aimed at early visual processing is described; we call it a Neural Analog Diffusion-Enhancement Layer or "NADEL." The network consists of two levels which are coupled through feedforward and shunted feedback connections. The lower level is a two-dimensional diffusion map which accepts visual features as input, and spreads activity over larger scales as a function of time. The upper layer is periodically fed the activity from the diffusion layer and locates local maxima in it (an extreme form of contrast enhancement) using a network of local comparators. These local maxima are fed back to the diffusion layer using an on-center/off-surround shunting anatomy. The maxima are also available as output of the network. The network dynamics serve to cluster features on multiple scales as a function of time, and can be used in a variety of early visual processing tasks such as extraction of corners and high curvature points along edge contours, line end detection, gap filling in contours, generation of fixation points, perceptual grouping on multiple scales, correspondence and path impletion in long-range apparent motion, and building 2-D shape representations that are invariant to location, orientation, scale, and small deformation on the visual field.
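The feedforward half of this scheme (diffuse the feature map, then pick out local maxima) can be sketched in one dimension. This is a simplified illustration, not the NADEL network: it uses a discrete heat-equation update with an assumed rate `alpha`, strict neighbour comparison as the local comparator, and it omits the shunted on-center/off-surround feedback entirely. All names and parameter values are hypothetical.

```python
def diffuse(activity, steps, alpha=0.25):
    """Spread activity with `steps` discrete diffusion updates (reflecting ends)."""
    v = list(activity)
    n = len(v)
    for _ in range(steps):
        nxt = []
        for i in range(n):
            left = v[i - 1] if i > 0 else v[i]
            right = v[i + 1] if i < n - 1 else v[i]
            nxt.append(v[i] + alpha * (left - 2.0 * v[i] + right))
        v = nxt
    return v

def local_maxima(activity):
    """Indices whose activity is strictly greater than every existing neighbour."""
    n = len(activity)
    maxima = []
    for i in range(n):
        left_ok = i == 0 or activity[i] > activity[i - 1]
        right_ok = i == n - 1 or activity[i] > activity[i + 1]
        if left_ok and right_ok:
            maxima.append(i)
    return maxima

# Three point features: two close together (4 and 6) and one far away (15).
features = [0.0] * 21
for position in (4, 6, 15):
    features[position] = 1.0

before = local_maxima(features)             # [4, 6, 15]
after = local_maxima(diffuse(features, 2))  # [5, 15]
```

After two diffusion steps the two nearby features merge into a single maximum between them while the distant feature keeps its own, which is the kind of scale-dependent clustering over time that the abstract describes.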