Apparent motion


BuFF: Burst Feature Finder for Light-Constrained 3D Reconstruction

Ravendran, Ahalya, Bryson, Mitch, Dansereau, Donald G.

arXiv.org Artificial Intelligence

Robots operating at night with conventional vision cameras face significant challenges in reconstruction because their images are noise-limited. Previous work has demonstrated that burst-imaging techniques can partially overcome this issue. In this paper, we develop a novel feature detector that operates directly on image bursts and enhances vision-based reconstruction under extremely low-light conditions. Our approach finds keypoints with well-defined scale and apparent motion within each burst by jointly searching in a multi-scale and multi-motion space. Because we describe these features at a stage where the images have a higher signal-to-noise ratio, the detected features are more accurate than state-of-the-art features detected on conventional noisy images or on burst-merged images, and they exhibit high precision, recall, and matching performance. We show improved feature performance and camera pose estimates, and demonstrate improved structure-from-motion performance using our feature detector in challenging light-constrained scenes. Our feature finder is a significant step towards robots operating in low-light scenarios, with applications including night-time operations.
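To make the joint multi-scale, multi-motion search concrete, here is a minimal sketch under simplifying assumptions that are not taken from the paper: apparent motion is constant and integer-valued across the burst, frames are aligned by shifting and averaging, blob strength is a scale-normalised Laplacian-of-Gaussian response, and only the single strongest (motion, scale, position) triple is returned rather than a set of local maxima. All function names and the synthetic burst are hypothetical; this is not BuFF's detector or descriptor.

```python
import numpy as np


def gaussian_blur(img, sigma):
    """Separable Gaussian blur with reflect padding, using only NumPy."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="reflect")
    # 'valid' convolution trims the padding: rows first, then columns.
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")


def laplacian(img):
    """5-point discrete Laplacian (periodic boundaries via np.roll)."""
    return (np.roll(img, 1, axis=0) + np.roll(img, -1, axis=0)
            + np.roll(img, 1, axis=1) + np.roll(img, -1, axis=1) - 4 * img)


def detect_burst_keypoint(burst, motions, sigmas):
    """Return ((dx, dy), sigma, (row, col), score) for the strongest blob.

    burst: (N, H, W) array; motions: candidate per-frame shifts in pixels.
    """
    best = None
    for dx, dy in motions:
        # Undo the hypothesised motion and average the frames. Noise is
        # averaged down either way, but the blob only stays sharp when the
        # hypothesis matches the true apparent motion.
        aligned = [np.roll(f, (-k * dy, -k * dx), axis=(0, 1))
                   for k, f in enumerate(burst)]
        merged = np.mean(aligned, axis=0)
        for s in sigmas:
            # Scale-normalised negative LoG: positive peaks on bright blobs.
            resp = -(s**2) * laplacian(gaussian_blur(merged, s))
            r, c = np.unravel_index(np.argmax(resp), resp.shape)
            if best is None or resp[r, c] > best[3]:
                best = ((dx, dy), s, (int(r), int(c)), float(resp[r, c]))
    return best


def make_burst(n=5, size=32, motion=(1, 0), noise=0.05, seed=0):
    """Hypothetical synthetic burst: one Gaussian blob moving `motion` px/frame."""
    rng = np.random.default_rng(seed)
    yy, xx = np.mgrid[0:size, 0:size]
    frames = []
    for k in range(n):
        cx, cy = 10 + k * motion[0], 16 + k * motion[1]
        blob = np.exp(-((xx - cx)**2 + (yy - cy)**2) / (2 * 1.5**2))
        frames.append(blob + rng.normal(0.0, noise, blob.shape))
    return np.stack(frames)


motions = [(dx, dy) for dx in range(-2, 3) for dy in range(-2, 3)]
print(detect_burst_keypoint(make_burst(), motions, (1.0, 1.5, 2.0)))
```

Under the correct motion hypothesis the blob stays compact after averaging, so its response exceeds that of every mis-aligned hypothesis, whose merged blob is smeared along the motion error. On this synthetic burst the search should recover a motion of (1, 0) px/frame near row 16, column 10.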


Superevents: Towards Native Semantic Segmentation for Event-based Cameras

Low, Weng Fei, Sonthalia, Ankit, Gao, Zhi, van Schaik, André, Ramesh, Bharath

arXiv.org Artificial Intelligence

Most successful computer vision models transform low-level features, such as Gabor filter responses, into richer representations of intermediate or mid-level complexity for downstream visual tasks. These mid-level representations have not been explored for event cameras, although they are especially relevant given the visually sparse and often disjoint spatial information in the event stream. Numerous visual tasks, ranging from semantic segmentation to visual tracking and depth estimation, stand to benefit from locally consistent intermediate representations, which we term superevents. In essence, superevents are perceptually consistent local units that delineate parts of an object in a scene. Inspired by recent deep learning architectures, we present a novel method that employs lifetime augmentation to obtain an event stream representation, which is fed to a fully convolutional network to extract superevents. Our qualitative and quantitative experimental results on several sequences of a benchmark dataset highlight the significant potential for event-based downstream applications.
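The abstract does not spell out lifetime augmentation. One common definition of an event's lifetime, assumed here, is the time its edge needs to move one pixel, i.e. the inverse of the local optical-flow speed; a frame rendered at time t then contains only the events still "alive" at t. The sketch below assumes per-event speeds are already known and omits the fully convolutional network entirely; the function name and data layout are hypothetical.

```python
import numpy as np


def lifetime_frame(events, speeds, t_query, shape):
    """Render the events that are alive at t_query into a signed frame.

    events: (N, 4) array of (x, y, t, polarity) with polarity in {-1, +1}.
    speeds: (N,) optical-flow speed per event in pixels/second (assumed given).
    Assumption: an event lives for 1/speed seconds, roughly the time its edge
    needs to move one pixel, so edges moving at different speeds leave trails
    of similar width instead of smearing or fading.
    """
    x, y, t, p = events.T
    lifetime = 1.0 / np.maximum(speeds, 1e-9)  # guard against zero speed
    alive = (t <= t_query) & (t_query < t + lifetime)
    frame = np.zeros(shape, dtype=np.int32)
    # np.add.at accumulates correctly even if several events share a pixel.
    np.add.at(frame, (y[alive].astype(int), x[alive].astype(int)),
              p[alive].astype(np.int32))
    return frame


events = np.array([[1, 1, 0.00, +1],   # fast edge: lives 0.01 s
                   [2, 2, 0.00, -1],   # slow edge: lives 0.1 s
                   [3, 0, 0.20, +1]])  # has not fired yet at t = 0.05 s
speeds = np.array([100.0, 10.0, 10.0])
print(lifetime_frame(events, speeds, 0.05, (4, 4)))
```

At t = 0.05 s only the slow edge's event survives, since the fast edge's event expired after 0.01 s.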


Optimality and limitations of audio-visual integration for cognitive systems

Boyce, W. Paul, Lindsay, Tony, Zgonnikov, Arkady, Rano, Ignacio, Wong-Lin, KongFatt

arXiv.org Artificial Intelligence

Multimodal integration is an important process in perceptual decision-making. In humans, this process has often been shown to be statistically optimal, or near optimal: sensory information is combined in a fashion that minimises the average error in the perceptual representation of stimuli. However, this optimisation sometimes comes at a cost, manifesting as illusory percepts. We review audio-visual facilitations and illusions that are products of multisensory integration, and the computational models that account for these phenomena. In particular, the same optimal computational model can lead to illusory percepts, and we suggest that more studies are needed to detect and mitigate these illusions, which can appear as artefacts in artificial cognitive systems. We provide cautionary considerations for designing artificial cognitive systems with a view to avoiding such artefacts. Finally, we suggest avenues of research towards solutions to potential pitfalls in system design. We conclude that a detailed understanding of multisensory integration and of the mechanisms behind audio-visual illusions can benefit the design of artificial cognitive systems.
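The statistically optimal integration described above is commonly modelled, and is assumed here, as maximum-likelihood fusion of independent Gaussian cues: each estimate is weighted by its inverse variance, ŝ = w_a s_a + w_v s_v with w_a = σ_v² / (σ_a² + σ_v²), and the fused variance is σ_a²σ_v² / (σ_a² + σ_v²). The sketch below, with a hypothetical function name and hypothetical numbers, shows how the same rule that minimises error also yields a ventriloquist-like illusion when the cues conflict.

```python
def combine_cues(mu_a, var_a, mu_v, var_v):
    """Maximum-likelihood fusion of independent Gaussian auditory and visual cues.

    Each cue is weighted by its reliability (inverse variance); the fused
    variance is never larger than the smaller of the two input variances.
    """
    r_a, r_v = 1.0 / var_a, 1.0 / var_v
    w_a = r_a / (r_a + r_v)
    mu = w_a * mu_a + (1.0 - w_a) * mu_v
    var = 1.0 / (r_a + r_v)
    return mu, var


# Hypothetical numbers: a sound localised at 10 degrees (variance 16 deg^2)
# paired with a flash at 0 degrees (variance 1 deg^2).
mu, var = combine_cues(10.0, 16.0, 0.0, 1.0)
print(f"perceived location {mu:.2f} deg, variance {var:.2f} deg^2")
```

With these numbers the fused estimate lies about 0.59 degrees from the flash, so the sound is perceived almost at the flash's location, while the fused variance (about 0.94 deg²) is lower than either cue's alone.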


Neural Analog Diffusion-Enhancement Layer and Spatio-Temporal Grouping in Early Vision

Waxman, Allen M., Seibert, Michael, Cunningham, Robert K., Wu, Jian

Neural Information Processing Systems

A new class of neural network aimed at early visual processing is described; we call it a Neural Analog Diffusion-Enhancement Layer or "NADEL." The network consists of two levels which are coupled through feedforward and shunted feedback connections. The lower level is a two-dimensional diffusion map which accepts visual features as input and spreads activity over larger scales as a function of time. The upper layer is periodically fed the activity from the diffusion layer and locates local maxima in it (an extreme form of contrast enhancement) using a network of local comparators. These local maxima are fed back to the diffusion layer using an on-center/off-surround shunting anatomy. The maxima are also available as output of the network. The network dynamics serve to cluster features on multiple scales as a function of time, and can be used in a variety of early visual processing tasks such as: extraction of corners and high-curvature points along edge contours, line-end detection, gap filling in contours, generation of fixation points, perceptual grouping on multiple scales, correspondence and path impletion in long-range apparent motion, and building 2-D shape representations that are invariant to location, orientation, scale, and small deformation in the visual field.
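The two-level dynamics described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' network: the diffusion layer is an explicit finite-difference heat equation with zero-flux borders, the "local comparators" are a strict 8-neighbour maximum test, the readout period and diffusion rate are hypothetical parameters, and the on-center/off-surround shunting feedback is omitted. All function names are hypothetical.

```python
import numpy as np


def diffuse(u, rate=0.1):
    """One explicit step of the heat equation (stable for rate <= 0.25)."""
    p = np.pad(u, 1, mode="edge")  # zero-flux boundary
    lap = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4 * u
    return u + rate * lap


def local_maxima(u, thresh=1e-12):
    """Pixels strictly greater than all 8 neighbours (the local comparators)."""
    h, w = u.shape
    p = np.pad(u, 1, mode="constant", constant_values=-np.inf)
    is_max = u > thresh
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy or dx:
                is_max &= u > p[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
    return [(int(r), int(c)) for r, c in zip(*np.nonzero(is_max))]


def run_nadel(features, steps, period, rate=0.1):
    """Diffuse the feature map and read out its local maxima every `period` steps.

    Simplification: the shunting on-center/off-surround feedback from the
    maxima into the diffusion layer is omitted.
    """
    u = features.astype(float)
    readouts = {}
    for step in range(1, steps + 1):
        u = diffuse(u, rate)
        if step % period == 0:
            readouts[step] = local_maxima(u)
    return readouts


# Two point features 6 px apart on a 32x32 map.
features = np.zeros((32, 32))
features[16, 10] = features[16, 16] = 1.0
out = run_nadel(features, steps=200, period=10)
print(out[10], out[200])
```

With two point features 6 px apart, the readout after 10 steps reports both features, while after 200 steps the spread activity has merged and a single maximum sits at their midpoint, (16, 13): the same input is grouped at a coarser scale as time passes.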

