Country
Using Expectation to Guide Processing: A Study of Three Real-World Applications
In many real world tasks, only a small fraction of the available inputs are important at any particular time. This paper presents a method for ascertaining the relevance of inputs by exploiting temporal coherence and predictability. The method proposed in this paper dynamically allocates relevance to inputs by using expectations of their future values. As a model of the task is learned, the model is simultaneously extended to create task-specific predictions of the future values of inputs. Inputs which are either not relevant, and therefore not accounted for in the model, or those which contain noise, will not be predicted accurately. These inputs can be de-emphasized, and, in turn, a new, improved, model of the task created. The techniques presented in this paper have yielded significant improvements for the vision-based autonomous control of a land vehicle, vision-based hand tracking in cluttered scenes, and the detection of faults in the etching of semiconductor wafers.
Multiresolution Tangent Distance for Affine-invariant Classification
Vasconcelos, Nuno, Lippman, Andrew
The ability to rely on similarity metrics invariant to image transformations is an important issue for image classification tasks such as face or character recognition. We analyze an invariant metric that has performed well for the latter - the tangent distance - and study its limitations when applied to regular images, showing that the most significant among these (convergence to local minima) can be drastically reduced by computing the distance in a multiresolution setting. This leads to the multi resolution tangent distance, which exhibits significantly higher invariance to image transformations, and can be easily combined with robust estimation procedures.
Self-similarity Properties of Natural Images
Turiel, Antonio, Mato, Germรกn, Parga, Nรฉstor, Nadal, Jean-Pierre
Scale invariance is a fundamental property of ensembles of natural images [1]. Their non Gaussian properties [15, 16] are less well understood, but they indicate the existence of a rich statistical structure. In this work we present a detailed study of the marginal statistics of a variable related to the edges in the images. A numerical analysis shows that it exhibits extended self-similarity [3, 4, 5]. This is a scaling property stronger than self-similarity: all its moments can be expressed as a power of any given moment. More interesting, all the exponents can be predicted in terms of a multiplicative log-Poisson process. This is the very same model that was used very recently to predict the correct exponents of the structure functions of turbulent flows [6]. These results allow us to study the underlying multifractal singularities. In particular we find that the most singular structures are one-dimensional: the most singular manifold consists of sharp edges.
2D Observers for Human 3D Object Recognition?
Further, the greater the similarity between objects, the stronger is the dependence on object appearance, and the more important twodimensional (2D) image information becomes. These findings, however, do not rule out the use of 3D structural information in recognition, and the degree to which 3D information is used in visual memory is an important issue. Liu, Knill, & Kersten (1995) showed that any model that is restricted to rotations in the image plane of independent 2D templates could not account for human performance in discriminating novel object views. We now present results from models of generalized radial basis functions (GRBF), 2D nearest neighbor matching that allows 2D affine transformations, and a Bayesian statistical estimator that integrates over all possible 2D affine transformations. The performance of the human observers relative to each of the models is better for the novel views than for the familiar template views, suggesting that humans generalize better to novel views from template views. The Bayesian estimator yields the optimal performance with 2D affine transformations and independent 2D templates. Therefore, models of 2D affine matching operations with independent 2D templates are unlikely to account for human recognition performance.
Visual Navigation in a Robot Using Zig-Zag Behavior
We implement a model of obstacle avoidance in flying insects on a small, monocular robot. The result is a system that is capable of rapid navigation through a dense obstacle field. The key to the system is the use of zigzag behavior to articulate the body during movement. It is shown that this behavior compensates for a parallax blind spot surrounding the focus of expansion normally found in systems without parallax behavior.
Detection of First and Second Order Motion
Grunewald, Alexander, Neumann, Heiko
A model of motion detection is presented. The model contains three stages. The first stage is unoriented and is selective for contrast polarities. The next two stages work in parallel. A phase insensitive stage pools across different contrast polarities through a spatiotemporal filter and thus can detect first and second order motion.
Recovering Perspective Pose with a Dual Step EM Algorithm
Cross, Andrew D. J., Hancock, Edwin R.
This paper describes a new approach to extracting 3D perspective structure from 2D point-sets. The novel feature is to unify the tasks of estimating transformation geometry and identifying pointcorrespondence matches. Unification is realised by constructing a mixture model over the bipartite graph representing the correspondence match and by effecting optimisation using the EM algorithm. According to our EM framework the probabilities of structural correspondence gate contributions to the expected likelihood function used to estimate maximum likelihood perspective pose parameters. This provides a means of rejecting structural outliers.
A Non-Parametric Multi-Scale Statistical Model for Natural Images
Bonet, Jeremy S. De, Viola, Paul A.
The observed distribution of natural images is far from uniform. On the contrary, real images have complex and important structure that can be exploited for image processing, recognition and analysis. There have been many proposed approaches to the principled statistical modeling of images, but each has been limited in either the complexity of the models or the complexity of the images. We present a nonparametric multi-scale statistical model for images that can be used for recognition, image de-noising, and in a "generative mode" to synthesize high quality textures.