Vision
Visual Saliency Map from Tensor Analysis
Li, Bing (Chinese Academy of Sciences) | Xiong, Weihua (Omnivision Corporation) | Hu, Weiming (Chinese Academy of Sciences)
Modeling the visual saliency map of an image provides important information for image semantic understanding in many applications. Most existing computational visual saliency models follow a bottom-up framework that generates an independent saliency map in each selected visual feature space and combines them in a proper way. Two big challenges that these methods must address explicitly are (1) which features should be extracted for all pixels of the input image and (2) how to dynamically determine the importance of the saliency map generated in each feature space. To address these problems, we present a novel saliency map computational model based on tensor decomposition and reconstruction. The tensor representation and analysis not only explicitly represent the image's color values but also capture two important relationships inherent to color images: one reflects the spatial correlations between pixels, and the other represents the interplay between color channels. Therefore, a saliency map generator based on the proposed model can adaptively find the most suitable features and their combination coefficients for each pixel. Experiments on a synthetic image set and a real image set show that our method is superior or comparable to other prevailing saliency map models.
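The abstract does not spell out the exact decomposition, so the sketch below is only a rough illustration of the idea: treat the H x W x 3 image as a 3-way tensor, reconstruct it from a truncated HOSVD-style decomposition, and read the per-pixel reconstruction error as a saliency score. The choice of HOSVD and the rank settings are assumptions for illustration, not the authors' model.

```python
# Rough sketch: saliency as per-pixel reconstruction error of a truncated,
# HOSVD-style decomposition of the H x W x 3 image tensor. This illustrates
# the general idea of tensor decomposition and reconstruction; it is NOT the
# authors' exact model, and the rank settings are placeholder assumptions.
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding of a 3-way tensor into a matrix."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def fold(matrix, mode, shape):
    """Inverse of unfold."""
    dims = [shape[mode]] + [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(matrix.reshape(dims), 0, mode)

def truncated_reconstruction(img, ranks=(30, 30, 2)):
    """Project each mode onto its leading singular vectors and reconstruct."""
    img = img.astype(float)
    core = img.copy()
    for mode, r in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(img, mode), full_matrices=False)
        u = u[:, :r]
        core = fold(u @ (u.T @ unfold(core, mode)), mode, core.shape)
    return core

def saliency_map(img):
    """Per-pixel reconstruction error, scaled to [0, 1]."""
    err = np.linalg.norm(img.astype(float) - truncated_reconstruction(img), axis=2)
    return (err - err.min()) / (err.max() - err.min() + 1e-9)
```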
Model Learning and Real-Time Tracking Using Multi-Resolution Surfel Maps
Stückler, Jörg (University of Bonn) | Behnke, Sven (University of Bonn)
For interaction with its environment, a robot is required to learn models of objects and to perceive these models in the live streams from its sensors. In this paper, we propose a novel approach to model learning and real-time tracking. We extract multi-resolution 3D shape and texture representations from RGB-D images at high frame rates. An efficient variant of the iterative closest points algorithm allows for registering maps in real time on a CPU. Our approach learns full-view models of objects in a probabilistic optimization framework in which we find the best alignment between multiple views. Finally, we track the pose of the camera with respect to the learned model by registering the current sensor view to the model. We evaluate our approach on RGB-D benchmarks and demonstrate its accuracy, efficiency, and robustness in model learning and tracking. We also report on the successful public demonstration of our approach in a mobile manipulation task.
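The registration step can be illustrated with a minimal point-to-point ICP iteration: nearest-neighbor association followed by an SVD-based rigid alignment. The paper's variant operates on multi-resolution surfel maps with shape and texture and is far more efficient; the numpy sketch below is only a conceptual baseline.

```python
# Minimal point-to-point ICP sketch (conceptual baseline only): the paper's
# registration works on multi-resolution surfel maps, not raw point clouds.
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t with R @ src + t ~= dst."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # avoid reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = dst_c - R @ src_c
    return R, t

def icp(src, dst, iters=30):
    """Iteratively associate nearest neighbors and re-estimate the pose."""
    R, t = np.eye(3), np.zeros(3)
    for _ in range(iters):
        moved = src @ R.T + t
        # Brute-force nearest neighbors; a k-d tree would be used in practice.
        d2 = ((moved[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
        matches = dst[d2.argmin(axis=1)]
        R, t = best_rigid_transform(src, matches)
    return R, t
```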
Automatic Targetless Extrinsic Calibration of a 3D Lidar and Camera by Maximizing Mutual Information
Pandey, Gaurav (University of Michigan) | McBride, James R. (Ford Motor Company) | Savarese, Silvio (University of Michigan) | Eustice, Ryan M. (University of Michigan)
This paper reports on a mutual information (MI) based algorithm for automatic extrinsic calibration of a 3D laser scanner and optical camera system. By using MI as the registration criterion, our method is able to work in situ without the need for any specific calibration targets, which makes it practical for in-field calibration. The calibration parameters are estimated by maximizing the mutual information between the sensor-measured surface intensities. We calculate the Cramér-Rao lower bound (CRLB) and show that the sample variance of the estimated parameters empirically approaches the CRLB for a sufficient number of views. Furthermore, we compare the calibration results to independent ground truth and observe that the mean error also empirically approaches zero as the number of views is increased. This indicates that the proposed algorithm, in the limiting case, calculates a minimum variance unbiased (MVUB) estimate of the calibration parameters. Experimental results are presented for data collected by a vehicle mounted with a 3D laser scanner and an omnidirectional camera system.
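The objective being maximized can be sketched directly: project the lidar points into the image under candidate extrinsics and score the joint histogram of lidar reflectivity and image grayscale with mutual information. The pinhole projection, the histogram size, and the helper names below are simplifying assumptions; the optimizer over (R, t) and the CRLB analysis are omitted.

```python
# Sketch of the MI objective for lidar-camera extrinsic calibration.
# Assumes a simple pinhole camera K and points given in the lidar frame;
# the search over extrinsics and the CRLB analysis are not shown.
import numpy as np

def mutual_information(a, b, bins=32):
    """MI between two equally long 1-D intensity samples via a joint histogram."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1, keepdims=True), pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def mi_score(R, t, K, pts, reflect, gray_img):
    """Project lidar points with extrinsics (R, t) and score the MI between
    lidar reflectivity and the grayscale values at the projected pixels."""
    cam = pts @ R.T + t
    in_front = cam[:, 2] > 0.1
    uv = cam[in_front] @ K.T
    uv = (uv[:, :2] / uv[:, 2:3]).astype(int)
    h, w = gray_img.shape
    ok = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    return mutual_information(reflect[in_front][ok],
                              gray_img[uv[ok, 1], uv[ok, 0]])
```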
Coupling Spatiotemporal Disease Modeling with Diagnosis
Mubangizi, Martin Gordon (Makerere University) | Ikae, Caterine (Makerere University) | Spiliopoulou, Athina (University of Edinburgh) | Quinn, John A. (Makerere University)
Modelling the density of an infectious disease in space and time is a task generally carried out separately from the diagnosis of that disease in individuals. These two inference problems are complementary, however: diagnosis of disease can be done more accurately if prior information from a spatial risk model is employed, and in turn a disease density model can benefit from the incorporation of rich symptomatic information rather than simple counts of presumed cases of infection. We propose a unifying framework for both of these tasks and illustrate it with the case of malaria. To do this, we first introduce a state space model of malaria spread, and then a computer vision based system for detecting Plasmodium in microscopic blood smear images, which can be run on location-aware mobile devices. We demonstrate the tractability of combining both elements and the improvement in accuracy this brings about.
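One way to picture the coupling is as a Bayesian update in which the spatio-temporal model supplies a location- and time-dependent prior probability of infection and the image-based detector supplies a likelihood for the observed smear. The sketch below is purely illustrative; the paper uses a state space model of malaria spread rather than a fixed prior, and all numbers here are placeholders.

```python
# Illustrative sketch of coupling a spatial risk prior with a diagnostic
# likelihood via Bayes' rule. The paper uses a state space model of malaria
# spread and a vision-based parasite detector; the values below are
# placeholder assumptions, not data from the paper.

def posterior_infection(prior_risk, p_obs_given_infected, p_obs_given_healthy):
    """P(infected | smear observation, location/time prior)."""
    num = p_obs_given_infected * prior_risk
    den = num + p_obs_given_healthy * (1.0 - prior_risk)
    return num / den

# The same weak detector response reads very differently under a high-risk
# prior (outbreak area) than under a low-risk one.
print(posterior_infection(0.30, 0.6, 0.2))   # ~0.56
print(posterior_infection(0.02, 0.6, 0.2))   # ~0.06
```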
Mirror Perspective-Taking with a Humanoid Robot
Hart, Justin Wildrick (Yale University) | Scassellati, Brian (Yale University)
The ability to use a mirror as an instrument for spatial reasoning enables an agent to make meaningful inferences about the positions of objects in space based on the appearance of their reflections in mirrors. The model presented in this paper enables a robot to infer the perspective from which objects reflected in a mirror appear to be observed, allowing the robot to use this perspective as a virtual camera. Prior work by our group presented an architecture through which a robot learns the spatial relationship between its body and visual sense, mimicking an early form of self-knowledge in which infants learn about their bodies and senses through their interactions with each other. In this work, this self-knowledge is utilized to determine the mirror's perspective. Witnessing the position of its end-effector in a mirror in several distinct poses, the robot determines a perspective that is consistent with these observations. The system is evaluated by measuring how well the robot's predictions of its end-effector's position in 3D, relative to the robot's egocentric coordinate system, and in 2D, as projected onto its cameras, match measurements of a marker tracked by its stereo vision system. Reconstructions of the 3D position of the end-effector, as computed from the perspective of the mirror, agree with the forward kinematic model to within a mean of 31.55 mm. When observed directly by the robot's cameras, reconstructions agree to within 5.12 mm. Predictions of the 2D position of the end-effector in the visual field agree with visual measurements to within a mean of 18.47 pixels when observed in the mirror, or 5.66 pixels when observed directly by the robot's cameras.
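The geometric core of treating the mirror as a virtual camera amounts to reflecting the real camera's pose across the mirror plane. A minimal sketch, assuming the mirror plane has already been estimated as a unit normal n and offset d (that estimation from end-effector observations being the paper's actual contribution):

```python
# Sketch: reflect a camera pose across a mirror plane {x : n.x + d = 0} to
# obtain the "virtual camera" from whose perspective mirrored objects appear.
# Estimating the plane from end-effector observations is assumed done already.
import numpy as np

def reflection_matrix(n, d):
    """4x4 homogeneous reflection across the plane n.x + d = 0 (|n| = 1)."""
    n = np.asarray(n, float) / np.linalg.norm(n)
    M = np.eye(4)
    M[:3, :3] = np.eye(3) - 2.0 * np.outer(n, n)
    M[:3, 3] = -2.0 * d * n
    return M

def virtual_camera_pose(cam_pose, n, d):
    """Camera-to-world pose of the virtual (mirrored) camera."""
    return reflection_matrix(n, d) @ cam_pose
```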
Sequence Labeling with Non-Negative Weighted Higher Order Features
Qian, Xian (University of Texas at Dallas) | Liu, Yang (University of Texas at Dallas)
In sequence labeling, using higher order features leads to high inference complexity. Many studies have been conducted to address this problem. In this paper, we propose a new exact decoding algorithm under the assumption that the weights of all higher order features are non-negative. In the worst case, the time complexity of our algorithm is quadratic in the number of higher order features. Compared with existing algorithms, our method is more efficient and easier to implement. We evaluate our method on two sequence labeling tasks: Optical Character Recognition and Chinese part-of-speech tagging. Our experimental results demonstrate that adding higher order features significantly improves the performance while requiring only 30% additional inference time.
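The exact decoding algorithm itself is the paper's contribution and is not reproduced here, but the notion of a higher order feature can be made concrete: besides the usual emission and transition scores, a sequence earns an extra non-negative weight whenever one of a set of longer label patterns occurs. A hedged scoring sketch:

```python
# Sketch of scoring a label sequence with first-order scores plus sparse,
# non-negatively weighted higher-order pattern features. The paper's exact
# decoding algorithm (worst-case quadratic in the number of such features)
# is not shown here.
import numpy as np

def sequence_score(labels, emission, transition, higher_order):
    """labels: list of ints; emission: (T, L) array; transition: (L, L) array;
    higher_order: dict mapping a label tuple to a non-negative weight."""
    score = emission[0, labels[0]]
    for t in range(1, len(labels)):
        score += transition[labels[t - 1], labels[t]] + emission[t, labels[t]]
    for pattern, w in higher_order.items():
        assert w >= 0, "the method assumes non-negative higher-order weights"
        k = len(pattern)
        for t in range(len(labels) - k + 1):
            if tuple(labels[t:t + k]) == pattern:
                score += w
    return score
```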
Relative Attributes for Enhanced Human-Machine Communication
Parikh, Devi (Toyota Technological Institute Chicago) | Kovashka, Adriana (University of Texas at Austin) | Parkash, Amar (IIIT-Delhi) | Grauman, Kristen (University of Texas at Austin)
We propose to model relative attributes that capture the relationships between images and objects in terms of human-nameable visual properties. For example, the models can capture that animal A is 'furrier' than animal B, or that image X is 'brighter' than image Y. Given training data stating how object/scene categories relate according to different attributes, we learn a ranking function per attribute. The learned ranking functions predict the relative strength of each property in novel images. We show how these relative attribute predictions enable a variety of novel applications, including zero-shot learning from relative comparisons, automatic image description, image search with interactive feedback, and active learning of discriminative classifiers. We overview results demonstrating these applications with images of faces and natural scenes. Overall, we find that relative attributes enhance the precision of communication between humans and computer vision algorithms, providing the richer language needed to fluidly "teach" a system about visual concepts.
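Learning a ranking function per attribute can be approximated with a simple pairwise objective: for every ordered training pair (i, j) in which image i shows the attribute more strongly than image j, push w.x_i above w.x_j. The sketch below uses a logistic pairwise loss fitted by gradient descent; the paper itself uses a large-margin, rankSVM-style formulation.

```python
# Sketch: learn a linear ranking function w for one relative attribute from
# ordered pairs (i, j) meaning "image i has more of the attribute than j".
# Logistic pairwise loss as a stand-in for the paper's large-margin ranker.
import numpy as np

def learn_attribute_ranker(X, ordered_pairs, lr=0.1, epochs=200, reg=1e-3):
    """X: (N, D) feature matrix; ordered_pairs: iterable of (i, j) indices."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        grad = reg * w
        for i, j in ordered_pairs:
            diff = X[i] - X[j]
            margin = w @ diff
            grad -= diff / (1.0 + np.exp(margin))   # gradient of log(1 + e^-m)
        w -= lr * grad / max(len(ordered_pairs), 1)
    return w

def relative_strength(w, x):
    """Higher value means stronger presence of the attribute."""
    return float(w @ x)
```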
Towards Action Representation within the Framework of Conceptual Spaces: Preliminary Results
Beyer, Oliver (CITEC Bielefeld University) | Griffiths, Sascha (CITEC, Bielefeld University) | Cimiano, Philipp (CITEC, Bielefeld University)
We propose an approach for the representation of actions based on the conceptual spaces framework developed by Gärdenfors (2004). Action categories are regarded as properties in the sense of Gärdenfors (2011) and are understood as convex regions in action space. Action categories are mainly described by a force signature that represents the forces that act upon a main trajector involved in the action. This force signature is approximated via a representation that specifies the time-indexed position of the trajector relative to several landmarks. We also present a computational approach to extract such representations from video data. We present results on the Motionese dataset consisting of videos of parents demonstrating actions on objects to their children. We evaluate the representations on a clustering and a classification task, showing that, while our representation seems reasonable, only a handful of actions can be discriminated reliably.
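The landmark-relative representation can be sketched directly: for each frame, stack the trajector's displacement to each landmark, giving a fixed-length vector per frame that approximates the force signature. The landmark set and the absence of any normalization are assumptions made only for illustration.

```python
# Sketch: approximate an action's "force signature" by the time-indexed
# position of the trajector relative to a fixed set of landmarks.
# Landmark choice and normalization are illustrative assumptions.
import numpy as np

def landmark_relative_trajectory(trajector_xy, landmarks_xy):
    """trajector_xy: (T, 2) positions over time; landmarks_xy: (K, 2).
    Returns a (T, 2K) matrix of per-frame displacements to each landmark."""
    rel = trajector_xy[:, None, :] - landmarks_xy[None, :, :]   # (T, K, 2)
    return rel.reshape(len(trajector_xy), -1)
```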
Crowd-Sourcing Design: Sketch Minimization using Crowds for Feedback
Engel, David (Massachusetts Institute of Technology) | Kottler, Verena (Max Planck Institute for Developmental Biology) | Malisi, Christoph (Max Planck Institute for Developmental Biology) | Roettig, Marc (University of Tuebingen) | Willing, Eva-Maria (Max Planck Institute for Plant-Breeding Research) | Schultheiss, Sebastian (Computonics.com)
Design tasks are notoriously difficult because success is defined by the perception of the target audience, whose feedback is usually not available during the design stages. Commonly, design is performed by professionals who have specific domain knowledge (i.e., an intuitive understanding of the implicit requirements of the task) and do not need feedback on viewers' perception during the process. In this paper, we present a novel design methodology for creating minimal sketches of objects that uses an iterative optimization scheme. We define minimality for a sketch as the minimal number of straight line segments required for correct recognition by 75% of naïve viewers. Crowd-sourcing techniques allow us to directly include the perception of the audience in the design process. By joining designers and crowds, we are able to create a human computation system that can efficiently optimize sketches without requiring high levels of domain knowledge (i.e., design skills) from any worker.
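The optimization loop can be pictured as greedy segment removal gated by crowd feedback: repeatedly drop the line segment whose removal hurts recognizability least, and stop once estimated recognition would fall below the 75% threshold. In the sketch below, crowd_recognition_rate is a hypothetical stand-in for posting a candidate sketch to a crowdsourcing platform and measuring the fraction of correct recognitions; the paper's actual scheme may differ.

```python
# Sketch of greedy sketch minimization driven by crowd feedback. The callable
# crowd_recognition_rate is a hypothetical placeholder for a crowdsourcing
# round that returns the fraction of viewers who recognize the sketch.

def minimize_sketch(segments, crowd_recognition_rate, threshold=0.75):
    """segments: list of line segments; returns a reduced list that the crowd
    still recognizes at or above the threshold."""
    current = list(segments)
    while len(current) > 1:
        best_candidate, best_rate = None, -1.0
        for k in range(len(current)):
            candidate = current[:k] + current[k + 1:]
            rate = crowd_recognition_rate(candidate)
            if rate > best_rate:
                best_candidate, best_rate = candidate, rate
        if best_rate < threshold:
            break                      # any further removal drops below 75%
        current = best_candidate
    return current
```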
Hypothesis Testing in Speckled Data with Stochastic Distances
Nascimento, Abraão D. C. | Cintra, Renato J. | Frery, Alejandro C.
Images obtained with coherent illumination, as is the case of sonar, ultrasound-B, laser, and Synthetic Aperture Radar (SAR), are affected by speckle noise, which reduces the ability to extract information from the data. Specialized techniques are required to deal with such imagery, which has been modeled by the G0 distribution, under which regions with different degrees of roughness and mean brightness can be characterized by two parameters; a third parameter, the number of looks, is related to the overall signal-to-noise ratio. Assessing distances between samples is an important step in image analysis; such distances provide grounds for assessing the separability of classes and, therefore, the performance of classification procedures. This work derives and compares eight stochastic distances and assesses the performance of hypothesis tests that employ them together with maximum likelihood estimation. We conclude that tests based on the triangular distance have the empirical size closest to the theoretical one, while those based on the arithmetic-geometric distances have the best power. Since the power of tests based on the triangular distance is close to optimum, we conclude that the safest choice is to use this distance for hypothesis testing, even when compared with classical distances such as the Kullback-Leibler and Bhattacharyya distances.
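As a small illustration, two of the distances mentioned (triangular and Bhattacharyya) can be written for discrete, histogram-style approximations of the densities. The paper derives the distances analytically for fitted G0 models and turns them into test statistics with known asymptotic distributions; that machinery is not attempted here.

```python
# Sketch: triangular and Bhattacharyya distances between two discrete
# (histogram) density estimates. The paper works with fitted G0 densities and
# converts scaled distances into asymptotic test statistics, which this
# simplified sketch does not reproduce.
import numpy as np

def _normalize(h):
    h = np.asarray(h, float)
    return h / h.sum()

def triangular_distance(p, q):
    """Sum of (p - q)^2 / (p + q) over the support."""
    p, q = _normalize(p), _normalize(q)
    s = p + q
    nz = s > 0
    return float(((p[nz] - q[nz]) ** 2 / s[nz]).sum())

def bhattacharyya_distance(p, q):
    """-log of the Bhattacharyya coefficient sum(sqrt(p * q))."""
    p, q = _normalize(p), _normalize(q)
    return float(-np.log(np.sqrt(p * q).sum()))
```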