Vision
Vision-Based Navigation I: A navigation filter for fusing DTM/correspondence updates
Kupervasser, Oleg, Voronov, Vladimir
An algorithm for pose and motion estimation using corresponding features in images and a digital terrain map is proposed. Using a Digital Terrain (or Digital Elevation) Map (DTM/DEM) as a global reference enables recovering the absolute position and orientation of the camera. To this end, the DTM is used to formulate a constraint between corresponding features in two consecutive frames. Using the DTM data is shown to improve the robustness and accuracy of the inertial navigation algorithm. An extended Kalman filter was used to combine the results of the inertial navigation algorithm and the proposed vision-based navigation algorithm. The feasibility of these algorithms is established through numerical simulations.
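To make the fusion step concrete, here is a minimal extended Kalman filter sketch. It assumes a hypothetical 1D state (position, velocity) that is propagated with an IMU acceleration, plus a hypothetical nonlinear vision measurement (slant range to a ground feature seen from a known altitude). That measurement is only a stand-in for the paper's DTM/correspondence constraint, which is not reproduced here.

```python
import numpy as np

def ekf_predict(x, P, accel, dt, q):
    """Inertial propagation of state [position, velocity] driven by a measured acceleration."""
    F = np.array([[1.0, dt], [0.0, 1.0]])
    B = np.array([0.5 * dt**2, dt])
    x = F @ x + B * accel
    # Process noise for a white-acceleration model with spectral density q.
    Q = q * np.array([[dt**3 / 3, dt**2 / 2], [dt**2 / 2, dt]])
    P = F @ P @ F.T + Q
    return x, P

def ekf_update(x, P, z, h, h_jacobian, R):
    """EKF measurement update: linearize the nonlinear model h around the current estimate."""
    H = h_jacobian(x)
    innovation = z - h(x)
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ innovation
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P

# Hypothetical vision measurement: range from the camera to a ground feature at position 0.
def slant_range(x, altitude=100.0):
    return np.array([np.hypot(x[0], altitude)])

def slant_range_jacobian(x, altitude=100.0):
    r = np.hypot(x[0], altitude)
    return np.array([[x[0] / r, 0.0]])
```

Each vision update shrinks the position covariance that inertial propagation alone would otherwise let grow without bound.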
Nonparametric Edge Detection in Speckled Imagery
Girón, Edwin, Frery, Alejandro C., Cribari-Neto, Francisco
We address the issue of edge detection in Synthetic Aperture Radar imagery. In particular, we propose nonparametric methods for edge detection, and numerically compare them to an alternative method that has been recently proposed in the literature. Our results show that some of the proposed methods outperform the existing method and are computationally simpler. An application to real (not simulated) data is presented and discussed.
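The abstract does not say which nonparametric statistics are used, so the sketch below swaps in one common choice as an assumption: the Mann-Whitney rank statistic. It scans a 1D intensity profile, for example along a ray crossing a region boundary, and places the edge at the split that maximizes |z|. Rank statistics do not assume a particular speckle distribution, which is the appeal of nonparametric methods for SAR data. The names `mann_whitney_z` and `detect_edge` are hypothetical.

```python
import numpy as np

def mann_whitney_z(a, b):
    """Normal-approximation z score of the Mann-Whitney U statistic (ties counted as 1/2)."""
    a = np.asarray(a, float)
    b = np.asarray(b, float)
    n1, n2 = len(a), len(b)
    diff = a[:, None] - b[None, :]
    u = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    mean = n1 * n2 / 2.0
    sd = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return (u - mean) / sd

def detect_edge(profile, min_size=5):
    """Return the split index j that best separates profile[:j] from profile[j:]."""
    profile = np.asarray(profile, float)
    best_j, best_score = None, -np.inf
    # Keep at least min_size samples on each side of the candidate edge.
    for j in range(min_size, len(profile) - min_size + 1):
        score = abs(mann_whitney_z(profile[:j], profile[j:]))
        if score > best_score:
            best_j, best_score = j, score
    return best_j
```

A usage example on a simulated speckled profile could draw gamma-distributed (multilook) samples for two regions with different mean backscatter and call `detect_edge` on their concatenation.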
Deep Lambertian Networks
Tang, Yichuan, Salakhutdinov, Ruslan, Hinton, Geoffrey
Visual perception is a challenging problem in part due to illumination variations. A possible solution is to first estimate an illumination invariant representation before using it for recognition. The object albedo and surface normals are examples of such representations. In this paper, we introduce a multilayer generative model where the latent variables include the albedo, surface normals, and the light source. Combining Deep Belief Nets with the Lambertian reflectance assumption, our model can learn good priors over the albedo from 2D images. Illumination variations can be explained by changing only the lighting latent variable in our model. By transferring learned knowledge from similar objects, albedo and surface normals estimation from a single image is possible in our model. Experiments demonstrate that our model is able to generalize as well as improve over standard baselines in one-shot face recognition.
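The Lambertian assumption ties the three latent quantities named above to pixel intensities. Under that assumption, intensity equals albedo times the clipped dot product of the surface normal with the light vector. The sketch below shows only this forward image-formation step, not the Deep Belief Net prior. It illustrates why relighting needs only the light variable to change. The function name `lambertian_render` is hypothetical.

```python
import numpy as np

def lambertian_render(albedo, normals, light):
    """albedo: (H, W); normals: (H, W, 3) unit vectors; light: (3,) direction scaled by intensity."""
    # Attached shadows: surfaces facing away from the light receive zero irradiance.
    shading = np.clip(normals @ light, 0.0, None)
    return albedo * shading
```

Re-rendering the same `albedo` and `normals` with a different `light` vector reproduces an illumination change while leaving the object's intrinsic representation untouched.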
Leaf vein segmentation using Odd Gabor filters and morphological operations
Leaf veins form the basis of leaf characterization and classification, since different species have different leaf vein patterns. Leaf vein segmentation can help maintain a record of leaves according to their specific vein patterns. This provides an effective way to store and retrieve information about plant species in a database, as well as an effective means of characterizing plants by their leaf vein structure, which is unique to each species. We propose a new method for segmenting leaf veins that uses Odd Gabor filters followed by morphological operations to refine the output. Compared with existing techniques, the Odd Gabor filter is efficient, robust, and scalable, and it detects the fine, fiber-like veins present in leaves much more effectively.
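A minimal sketch of the pipeline described above might look like the following. It builds an odd (sine-phase) Gabor kernel, takes the maximum absolute response over a bank of orientations, thresholds it, and cleans the mask with morphological closing and opening. The parameter values (`sigma`, `wavelength`, `n_orient`, `thresh`, structuring-element sizes) and the function names are assumptions chosen for illustration, not the authors' settings.

```python
import numpy as np
from scipy import ndimage

def odd_gabor_kernel(sigma, theta, wavelength, gamma=0.5, size=None):
    """Sine-phase Gabor kernel: antisymmetric, so it sums to zero and ignores flat regions."""
    if size is None:
        size = int(np.ceil(3 * sigma))
    y, x = np.mgrid[-size:size + 1, -size:size + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + (gamma * yr) ** 2) / (2 * sigma**2))
    return envelope * np.sin(2 * np.pi * xr / wavelength)

def vein_map(gray, sigma=2.0, wavelength=8.0, n_orient=8, thresh=0.5):
    """Return (normalized orientation-max response, cleaned binary vein mask)."""
    gray = np.asarray(gray, float)
    thetas = np.arange(n_orient) * np.pi / n_orient
    responses = [np.abs(ndimage.convolve(gray, odd_gabor_kernel(sigma, t, wavelength)))
                 for t in thetas]
    energy = np.max(responses, axis=0)
    if energy.max() > 1e-12:  # avoid amplifying rounding noise on flat images
        energy = energy / energy.max()
    mask = energy > thresh
    # Closing bridges the two edge responses of a thin vein; opening removes isolated specks.
    mask = ndimage.binary_closing(mask, structure=np.ones((3, 3)))
    mask = ndimage.binary_opening(mask, structure=np.ones((2, 2)))
    return energy, mask
```

The odd kernel responds on either side of a thin dark vein rather than at its center, which is one reason a morphological step is useful after filtering.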
On multi-view feature learning
Sparse coding is a common approach to learning local features for object recognition. Recently, there has been an increasing interest in learning features from spatio-temporal, binocular, or other multi-observation data, where the goal is to encode the relationship between images rather than the content of a single image. We provide an analysis of multi-view feature learning, which shows that hidden variables encode transformations by detecting rotation angles in the eigenspaces shared among multiple image warps. Our analysis helps explain recent experimental results showing that transformation-specific features emerge when training complex cell models on videos. Our analysis also shows that transformation-invariant features can emerge as a by-product of learning representations of transformations.
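A toy case makes the eigenspace argument tangible. All cyclic shifts of a 1D signal share the Fourier basis as eigenvectors, and a shift by s acts in frequency eigenspace k as a rotation by angle -2πks/N. The sketch below, which assumes cyclic shifts as the family of warps, therefore detects the transformation from the phase difference of the two views' projections. The per-eigenspace magnitudes are unchanged by the warp, a simple instance of the invariant features mentioned above. All function names are hypothetical.

```python
import numpy as np

def rotation_angles(x, y):
    """Phase difference of the two views in each shared (Fourier) eigenspace."""
    X = np.fft.fft(x)
    Y = np.fft.fft(y)
    return np.angle(Y * np.conj(X))

def estimate_shift(x, y):
    """Recover the cyclic shift s (with |s| < N/2) from the rotation angle at frequency 1."""
    n = len(x)
    theta = rotation_angles(x, y)[1]
    return int(round(-theta * n / (2 * np.pi)))

def invariant_features(x):
    """Energy in each eigenspace: unaffected by any cyclic shift of x."""
    return np.abs(np.fft.fft(x))
```

Learned multi-view models discover such shared eigenspaces from data, whereas here they are fixed in advance for clarity.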
Manifold Relevance Determination
Damianou, Andreas, Ek, Carl, Titsias, Michalis, Lawrence, Neil
In this paper we present a fully Bayesian latent variable model that exploits conditional nonlinear (in)dependence structures to learn an efficient latent representation. The latent space is factorized to represent shared and private information from multiple views of the data. In contrast to previous approaches, we introduce a relaxation of the discrete segmentation and allow for a "softly" shared latent space. Further, Bayesian techniques allow us to automatically estimate the dimensionality of the latent spaces. The model is capable of capturing structure underlying extremely high-dimensional spaces. This is illustrated by modelling unprocessed images with tens of thousands of pixels. This also allows us to directly generate novel images from the trained model by sampling from the discovered latent spaces. We also demonstrate the model by predicting human pose in an ambiguous setting. Our Bayesian framework allows us to perform disambiguation in a principled manner by including latent space priors that incorporate the dynamic nature of the data.
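One way to read off the factorization after training is sketched below. It assumes, as we understand the model, that each view has its own automatic relevance determination (ARD) weight per latent dimension. A dimension whose weights are large in both views acts as shared, one that is active in only one view acts as private, and one that is inactive in both is effectively switched off, which is how the dimensionality is estimated. The weights themselves stay continuous ("soft"). The relative threshold below is a hypothetical post-hoc choice for reporting only.

```python
import numpy as np

def segment_latent_dims(w1, w2, rel_thresh=0.05):
    """Label latent dimensions from per-view ARD weights (normalized by each view's maximum)."""
    w1 = np.asarray(w1, float) / np.max(w1)
    w2 = np.asarray(w2, float) / np.max(w2)
    active1, active2 = w1 > rel_thresh, w2 > rel_thresh
    return {
        "shared": np.flatnonzero(active1 & active2).tolist(),
        "private_1": np.flatnonzero(active1 & ~active2).tolist(),
        "private_2": np.flatnonzero(~active1 & active2).tolist(),
        "unused": np.flatnonzero(~active1 & ~active2).tolist(),
    }
```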
Modeling Latent Variable Uncertainty for Loss-based Learning
Kumar, M. Pawan, Packer, Ben, Koller, Daphne
We consider the problem of parameter estimation using weakly supervised datasets, where a training sample consists of the input and a partially specified annotation, which we refer to as the output. The missing information in the annotation is modeled using latent variables. Previous methods overburden a single distribution with two separate tasks: (i) modeling the uncertainty in the latent variables during training; and (ii) making accurate predictions for the output and the latent variables during testing. We propose a novel framework that separates the demands of the two tasks using two distributions: (i) a conditional distribution to model the uncertainty of the latent variables for a given input-output pair; and (ii) a delta distribution to predict the output and the latent variables for a given input. During learning, we encourage agreement between the two distributions by minimizing a loss-based dissimilarity coefficient. Our approach generalizes latent SVM in two important ways: (i) it models the uncertainty over latent variables instead of relying on a pointwise estimate; and (ii) it allows the use of loss functions that depend on latent variables, which greatly increases its applicability. We demonstrate the efficacy of our approach on two challenging problems (object detection and action detection) using publicly available datasets.
Analyzing Posture and Affect in Task-Oriented Tutoring
Grafsgaard, Joseph F., Boyer, Kristy Elizabeth, Wiebe, Eric N., Lester, James C. (all North Carolina State University)
Intelligent tutoring systems research aims to produce systems that meet or exceed the effectiveness of one-on-one expert human tutoring. Theory and empirical study suggest that affective states of the learner must be addressed to achieve this goal. While many affective measures can be utilized, posture offers the advantages of non-intrusiveness and ease of interpretation. This paper presents an accurate posture estimation algorithm applied to a computer-mediated tutoring corpus of depth recordings. Analyses of posture and session-level student reports of engagement and cognitive load identified significant patterns. The results indicate that disengagement and frustration may coincide with closer postural positions and more movement, while focused attention and less frustration occur with more distant, stable postural positions. It is hoped that this work will lead to intelligent tutoring systems that recognize a greater breadth of affective expression through channels of posture and gesture.
Hybrid Linear Modeling via Local Best-fit Flats
Zhang, Teng, Szlam, Arthur, Wang, Yi, Lerman, Gilad
We present a simple and fast geometric method for modeling data by a union of affine subspaces. The method begins by forming a collection of local best-fit affine subspaces, i.e., subspaces approximating the data in local neighborhoods. The correct sizes of the local neighborhoods are determined automatically by the Jones' $\beta_2$ numbers (we prove that, under certain geometric conditions, our method finds the optimal local neighborhoods). The collection of subspaces is further processed by a greedy selection procedure or a spectral method to generate the final model. We discuss applications to tracking-based motion segmentation and to clustering of faces under different illumination conditions. We give extensive experimental evidence demonstrating the state-of-the-art accuracy and speed of the suggested algorithms on these problems, as well as on synthetic hybrid linear data and the MNIST handwritten digits data, and we demonstrate how to use our algorithms to quickly determine the number of affine subspaces.
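The local step can be sketched as follows. It fits a d-dimensional best-fit flat to a point's k nearest neighbors with an SVD and scores the fit with a $\beta_2$-style number: the root-mean-square distance of the neighbors to the flat, divided by the neighborhood radius. As a simplifying assumption, the neighborhood size is taken to be the candidate size with the smallest score. The paper's actual selection rule and its guarantees are not reproduced here, and the function names are hypothetical.

```python
import numpy as np

def best_fit_flat(points, d):
    """Least-squares d-flat: centroid plus the top-d right singular vectors."""
    center = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - center, full_matrices=False)
    return center, vt[:d]

def beta2(points, d, radius):
    """Scale-normalized RMS distance of the points to their best-fit d-flat."""
    center, basis = best_fit_flat(points, d)
    centered = points - center
    residual = centered - (centered @ basis.T) @ basis
    return np.sqrt(np.mean(np.sum(residual**2, axis=1))) / radius

def local_flat(data, idx, d, sizes):
    """Return (center, basis, k) for the neighborhood size k in `sizes` with smallest beta2."""
    dists = np.linalg.norm(data - data[idx], axis=1)
    order = np.argsort(dists)
    best = None
    for k in sizes:  # each k should exceed d + 1 so the fit is meaningful
        nbrs = data[order[:k]]
        b = beta2(nbrs, d, dists[order[k - 1]])
        if best is None or b < best[0]:
            best = (b, k)
    k = best[1]
    return best_fit_flat(data[order[:k]], d) + (k,)
```

Running `local_flat` at every point (or a random subset) yields the collection of candidate flats that a greedy or spectral step would then reduce to the final union of subspaces.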
Simultaneous Object Detection, Tracking, and Event Recognition
Barbu, Andrei, Michaux, Aaron, Narayanaswamy, Siddharth, Siskind, Jeffrey Mark
The common internal structure and algorithmic organization of object detection, detection-based tracking, and event recognition facilitates a general approach to integrating these three components. This supports multidirectional information flow between these components allowing object detection to influence tracking and event recognition and event recognition to influence tracking and object detection. The performance of the combination can exceed the performance of the components in isolation. This can be done with linear asymptotic complexity.
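Detection-based tracking of this kind is commonly posed as a Viterbi search. The sketch below selects one detection per frame, maximizing the summed detection scores minus a penalty on frame-to-frame displacement. Its cost is O(T·K²) for T frames and K detections per frame, i.e. linear in video length. The joint detection/tracking/event model described above goes further by combining such a lattice with event recognition; that part is not shown. The function name `track_viterbi` and the Euclidean `motion_weight` penalty are assumptions.

```python
import numpy as np

def track_viterbi(det_scores, positions, motion_weight=1.0):
    """det_scores: list of (K_t,) arrays; positions: list of (K_t, 2) arrays. Returns one index per frame."""
    score = np.asarray(det_scores[0], float)
    backpointers = []
    for t in range(1, len(det_scores)):
        prev = np.asarray(positions[t - 1], float)
        cur = np.asarray(positions[t], float)
        # dist[i, j]: displacement from detection j in frame t-1 to detection i in frame t.
        dist = np.linalg.norm(cur[:, None, :] - prev[None, :, :], axis=2)
        total = score[None, :] - motion_weight * dist
        backpointers.append(np.argmax(total, axis=1))
        score = np.max(total, axis=1) + np.asarray(det_scores[t], float)
    # Trace the best path backwards from the highest-scoring final detection.
    path = [int(np.argmax(score))]
    for bp in reversed(backpointers):
        path.append(int(bp[path[-1]]))
    return path[::-1]
```

Unlike a greedy per-frame choice, the dynamic program can prefer a slightly weaker detection when it keeps the track temporally coherent.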