Peripheral Vision Transformer

Neural Information Processing Systems

Human vision possesses a special type of visual processing system called peripheral vision. By partitioning the entire visual field into multiple contour regions based on distance from the center of gaze, peripheral vision provides the ability to perceive different visual features in different regions. In this work, we take a biologically inspired approach and explore modeling peripheral vision in deep neural networks for visual recognition. We propose incorporating peripheral position encoding into the multi-head self-attention layers so that the network learns, from the training data, to partition the visual field into diverse peripheral regions. We evaluate the proposed network, dubbed PerViT, on ImageNet-1K and systematically investigate the inner workings of the model for machine perception, showing that the network learns to perceive visual data similarly to the way human vision does. The performance improvements in image classification over the baselines across different model sizes demonstrate the efficacy of the proposed method.
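The abstract's core mechanism, a position encoding that biases attention according to a query-key distance, can be sketched minimally. The ring centers and widths below are illustrative placeholders (PerViT learns its peripheral regions from data), and the real model adds such a bias inside multi-head self-attention rather than computing it in isolation:

```python
import math

def peripheral_bias(grid, centers=(0.0, 2.0, 4.0), widths=(1.0, 1.0, 1.0)):
    """One additive attention-bias matrix per head: each head favours
    query-key pairs whose Euclidean distance falls near one 'ring'
    around the query. Ring centers/widths are fixed here for
    illustration; in PerViT the regions are learned."""
    n = len(grid)
    return [[[-((math.dist(grid[q], grid[k]) - c) ** 2) / (2 * w * w)
              for k in range(n)] for q in range(n)]
            for c, w in zip(centers, widths)]

grid = [(x, y) for y in range(3) for x in range(3)]  # 3x3 token grid
biases = peripheral_bias(grid)
# Head 0 (ring at distance 0) peaks on the diagonal, i.e. each query's
# own position -- a "foveal" head; later heads favour farther rings.
```

Adding each head's matrix to the pre-softmax attention logits then makes that head attend preferentially to its own peripheral region.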




Event-based vision for egomotion estimation using precise event timing

Greatorex, Hugh, Mastella, Michele, Cotteret, Madison, Richter, Ole, Chicca, Elisabetta

arXiv.org Artificial Intelligence

Egomotion estimation is crucial for applications such as autonomous navigation and robotics, where accurate and real-time motion tracking is required. However, traditional methods relying on inertial sensors are highly sensitive to external conditions and suffer from drift, leading to large inaccuracies over long distances. Vision-based methods, particularly those utilising event-based vision sensors, provide an efficient alternative by capturing data only when changes are perceived in the scene. In this work, we propose a fully event-based pipeline for egomotion estimation that processes the event stream directly within the event-based domain. This method eliminates the need for frame-based intermediaries, allowing for low-latency and energy-efficient motion estimation. We construct a shallow spiking neural network that uses a synaptic gating mechanism to convert precise event timing into bursts of spikes. These spikes encode local optical flow velocities, and the network provides an event-based readout of egomotion. We evaluate the network's performance on a dedicated chip, demonstrating strong potential for low-latency, low-power motion estimation. Additionally, simulations of larger networks show that the system achieves state-of-the-art accuracy in egomotion estimation tasks with event-based cameras, making it a promising solution for real-time, power-constrained robotics applications. The estimation of egomotion plays an important role in applications such as autonomous navigation, robotics, and Augmented Reality (AR).
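The precise-timing idea behind the pipeline can be illustrated with a toy one-dimensional example: a brightness edge sweeping across a row of event pixels triggers each pixel in turn, and velocity is pixel spacing over inter-event timing. This sketch deliberately omits the paper's spiking network and synaptic gating circuit; function names and the averaging readout are our own simplifications:

```python
def local_flow(events, pixel_pitch=1.0):
    """Toy time-of-flight flow estimate from an event stream.
    `events` is a list of (x, t) tuples from pixels along one row,
    ordered by firing time; each consecutive pair yields a local
    velocity estimate in pixels per unit time."""
    velocities = []
    for (x0, t0), (x1, t1) in zip(events, events[1:]):
        if t1 != t0:
            velocities.append(pixel_pitch * (x1 - x0) / (t1 - t0))
    return velocities

def egomotion_readout(velocities):
    """Average local flow as a crude 1-D egomotion estimate."""
    return sum(velocities) / len(velocities)

# An edge crossing pixels 0..4 at 2 pixels per second
events = [(x, x * 0.5) for x in range(5)]
v = egomotion_readout(local_flow(events))  # -> 2.0
```

In the paper this computation happens in spiking hardware, with event timing gated into spike bursts rather than divided explicitly.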



RLPeri: Accelerating Visual Perimetry Test with Reinforcement Learning and Convolutional Feature Extraction

Verma, Tanvi, Dinh, Linh Le, Tan, Nicholas, Xu, Xinxing, Cheng, Chingyu, Liu, Yong

arXiv.org Artificial Intelligence

Visual perimetry is an important eye examination that helps detect vision problems caused by ocular or neurological conditions. During the test, a patient's gaze is fixed at a specific location while light stimuli of varying intensities are presented in central and peripheral vision. Based on the patient's responses to the stimuli, the visual field map and sensitivity are determined. However, maintaining high levels of concentration throughout the test can be challenging for patients, leading to increased examination times and decreased accuracy. In this work, we present RLPeri, a reinforcement learning-based approach to optimizing visual perimetry testing. By determining the optimal sequence of locations and initial stimulus values, we aim to reduce the examination time without compromising accuracy. Additionally, we incorporate reward shaping techniques to further improve testing performance. To monitor the patient's responses over time during testing, we represent the test's state as a pair of 3D matrices. We apply two different convolutional kernels to extract spatial features across locations as well as features across different stimulus values for each location. Through experiments, we demonstrate that our approach reduces examination time by 10-20% while maintaining accuracy, compared to state-of-the-art methods. With the presented approach, we aim to make visual perimetry testing more efficient and patient-friendly, while still providing accurate results.
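The state representation and the per-location feature extraction described above can be sketched as follows. The paper does not give exact shapes or kernel values, so the flattened location axis, the response-count encoding, and the fixed smoothing kernel are all our assumptions; RLPeri learns its convolutional kernels:

```python
def make_state(n_loc, n_levels):
    """Pair of response-count tensors indexed by location x stimulus
    level (the paper uses a pair of 3-D matrices; locations are
    flattened to one axis here for brevity)."""
    zeros = lambda: [[0] * n_levels for _ in range(n_loc)]
    return {"seen": zeros(), "not_seen": zeros()}

def update(state, loc, level, seen):
    """Record a patient response to one stimulus presentation."""
    state["seen" if seen else "not_seen"][loc][level] += 1

def conv1d_over_levels(row, kernel=(0.25, 0.5, 0.25)):
    """Feature extraction across stimulus values for one location:
    a small same-padded 1-D convolution over the response counts
    (illustrative fixed kernel)."""
    k = len(kernel) // 2
    padded = [0] * k + list(row) + [0] * k
    return [sum(kernel[i] * padded[j + i] for i in range(len(kernel)))
            for j in range(len(row))]

state = make_state(n_loc=4, n_levels=5)
update(state, loc=2, level=3, seen=True)
features = conv1d_over_levels(state["seen"][2])  # [0.0, 0.0, 0.25, 0.5, 0.25]
```

A second, spatial kernel would run across the location axis in the same manner; both feature maps then feed the RL policy that picks the next test location and stimulus value.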


MURPHY: A Robot that Learns by Doing

Neural Information Processing Systems

MURPHY consists of a camera looking at a robot arm, with a connectionist network architecture situated in between. By moving its arm through a small, representative sample of the 1 billion possible joint configurations, MURPHY learns the relationships, backwards and forwards, between the positions of its joints and the state of its visual field. MURPHY can use its internal model in the forward direction to "envision" sequences of actions for planning purposes, such as in grabbing a visually presented object, or in the reverse direction to "imitate", with its arm, autonomous activity in its visual field. Furthermore, by taking explicit advantage of continuity in the mappings between visual space and joint space, MURPHY is able to learn non-linear mappings with only a single layer of modifiable weights.
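The last claim, learning a non-linear mapping with a single layer of modifiable weights by exploiting continuity, can be illustrated with coarse coding: smooth, overlapping input units make a non-linear target linearly learnable. The Gaussian tuning curves, the delta-rule trainer, and the one-joint toy arm below are our assumptions, not MURPHY's exact representation:

```python
import math

def coarse_code(theta, centers, width=0.3):
    """Coarse-coded joint-angle input: a row of units with Gaussian
    tuning curves over the angle (illustrative; MURPHY's value-coded
    units exploit continuity in a similar spirit)."""
    return [math.exp(-((theta - c) / width) ** 2) for c in centers]

def predict(w, theta, centers):
    a = coarse_code(theta, centers)
    return sum(wi * ai for wi, ai in zip(w, a))

def train_forward_model(samples, centers, lr=0.2, epochs=500):
    """A single layer of modifiable weights, trained with the delta
    rule to map joint angle -> hand x-position."""
    w = [0.0] * len(centers)
    for _ in range(epochs):
        for theta, x in samples:
            a = coarse_code(theta, centers)
            err = x - sum(wi * ai for wi, ai in zip(w, a))
            w = [wi + lr * err * ai for wi, ai in zip(w, a)]
    return w

centers = [i * 0.2 for i in range(9)]          # tuning centers over [0, 1.6] rad
samples = [(t, math.cos(t)) for t in centers]  # toy arm: hand x = cos(angle)
w = train_forward_model(samples, centers)
```

After training, `predict` runs the model in the forward direction (joints to visual position); MURPHY's "envisioning" chains such predictions without moving the arm.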


Learning to See Where and What: Training a Net to Make Saccades and Recognize Handwritten Characters

Neural Information Processing Systems

The approach, called Saccade, integrates ballistic and corrective saccades (eye movements) with character recognition. A single backpropagation net is trained to make a classification decision on a character centered in its input window, as well as to estimate the distance of the current and next character from the center of the input window. The net learns to accurately estimate these distances regardless of variations in character width, spacing between characters, writing style, and other factors. During testing, the system uses the net-extracted classification and distance information, along with a set of jumping rules, to jump from character to character. The ability to read rests on multiple foundational skills.
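The jumping rules can be sketched as a small control loop around the net's three outputs. The threshold, the `stub_net` stand-in for the trained backpropagation net, and the evenly spaced characters are all hypothetical; the paper's actual rules handle variable widths and spacing:

```python
def read_line(line, net, max_jumps=50):
    """Toy jumping rules: `net(line, pos)` reports, for a window
    centred at `pos`, the character class, the offset of the current
    character from centre, and the distance to the next character.
    Make a corrective saccade if badly centred, otherwise classify
    and make a ballistic jump to the next character."""
    pos, out = 0, []
    for _ in range(max_jumps):
        label, offset, next_dist = net(line, pos)
        if abs(offset) > 1:          # corrective saccade: re-centre
            pos += offset
            continue
        if label is None:            # past the end of the line
            break
        out.append(label)
        pos += next_dist             # ballistic saccade
    return "".join(out)

# A stub 'net' for characters evenly spaced 4 pixels apart
def stub_net(line, pos):
    idx = round(pos / 4)
    if idx >= len(line):
        return None, 0, 0
    return line[idx], idx * 4 - pos, 4

# read_line("word", stub_net) -> "word"
```

In the real system both distance estimates come from the same net that performs the classification, so one forward pass drives both recognition and the next eye movement.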


Perceiving Complex Visual Scenes: An Oscillator Neural Network Model that Integrates Selective Attention, Perceptual Organisation, and Invariant Recognition

Neural Information Processing Systems

Which processes underlie our ability to quickly recognize familiar objects within a complex visual input scene? In this paper an implemented neural network model is described that attempts to specify how selective visual attention, perceptual organisation, and invariance transformations might work together in order to segment, select, and recognize objects out of complex input scenes containing multiple, possibly overlapping objects. Retinotopically organized feature maps serve as input for two main processing routes: the 'where'-pathway dealing with location information and the 'what'-pathway computing the shape and attributes of objects. A location-based attention mechanism operates on an early stage of visual processing, selecting a contiguous region of the visual field for preferential processing. Additionally, location-based attention plays an important role in invariant object recognition, controlling appropriate normalization processes within the what-pathway.
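The where/what interaction described above can be reduced to a minimal sketch: attention picks a contiguous window on a retinotopic map, and extracting that window into a canonical frame is the position normalization the what-pathway receives. This ignores the model's oscillator dynamics and perceptual grouping entirely:

```python
def attend_and_normalise(feature_map, window):
    """Location-based attention selects a contiguous region of a
    retinotopic map (the 'where' decision) and shifts it to a
    canonical frame -- the normalization step that, in the model,
    attention controls within the 'what'-pathway."""
    (r0, r1), (c0, c1) = window
    return [row[c0:c1] for row in feature_map[r0:r1]]

# A 5x5 retinotopic map whose 'features' are just their coordinates
fmap = [[(r, c) for c in range(5)] for r in range(5)]
patch = attend_and_normalise(fmap, window=((1, 3), (2, 4)))
```

Whatever recognizer follows then always sees the selected object at the same position, which is one way invariance and attention can cooperate.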