
Object Recognition


GeoDE: a Geographically Diverse Evaluation Dataset for Object Recognition

Neural Information Processing Systems

Current dataset collection methods typically scrape large amounts of data from the web. While this technique is extremely scalable, data collected in this way tends to reinforce stereotypical biases, can contain personally identifiable information, and typically originates from Europe and North America. In this work, we rethink the dataset collection paradigm and introduce GeoDE, a geographically diverse dataset with 61,940 images from 40 classes and 6 world regions, and no personally identifiable information, collected by soliciting images from people across the world. We analyse GeoDE to understand differences in images collected in this manner compared to web-scraping. Despite the smaller size of this dataset, we demonstrate its use as both an evaluation and training dataset, allowing us to highlight shortcomings in current models, as well as demonstrate improved performance even when training on this small dataset.
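The core use of an evaluation set like GeoDE is to break model accuracy down by world region and surface geographic performance gaps. A minimal sketch of that per-region breakdown (the class and region names here are illustrative, not taken from the dataset):

```python
from collections import defaultdict

def per_region_accuracy(predictions, labels, regions):
    """Compute classification accuracy broken down by world region."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for pred, label, region in zip(predictions, labels, regions):
        total[region] += 1
        if pred == label:
            correct[region] += 1
    return {r: correct[r] / total[r] for r in total}

# Toy example: a model that does worse on images from one region.
preds   = ["stove", "stove", "car", "stove", "car", "car"]
labels  = ["stove", "car",   "car", "stove", "car", "stove"]
regions = ["Europe", "Africa", "Europe", "Africa", "Europe", "Africa"]

print(per_region_accuracy(preds, labels, regions))
# → {'Europe': 1.0, 'Africa': 0.3333333333333333}
```

A gap like the one in the toy output is exactly the kind of shortcoming the authors use GeoDE to highlight in web-scraped models.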


Deep Predictive Coding Network with Local Recurrent Processing for Object Recognition

Neural Information Processing Systems

Inspired by predictive coding - a theory in neuroscience, we develop a bi-directional and dynamic neural network with local recurrent processing, namely predictive coding network (PCN). Unlike feedforward-only convolutional neural networks, PCN includes both feedback connections, which carry top-down predictions, and feedforward connections, which carry bottom-up errors of prediction. Feedback and feedforward connections enable adjacent layers to interact locally and recurrently to refine representations towards minimization of layer-wise prediction errors. When unfolded over time, the recurrent processing gives rise to an increasingly deeper hierarchy of non-linear transformation, allowing a shallow network to dynamically extend itself into an arbitrarily deep network.
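The layer-wise error-minimization loop described above can be reduced to a numerical sketch with a single linear layer pair (the actual PCN is convolutional and multi-layer; the feedback matrix, update rate, and toy dimensions here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer predictive coding loop (linear, no convolution):
# the higher layer predicts the lower representation; the prediction
# error is fed forward to refine the higher representation over time.
W_fb = rng.standard_normal((8, 16)) * 0.1   # feedback: higher -> lower prediction
x = rng.standard_normal(8)                   # lower-layer representation (fixed input)
r = np.zeros(16)                             # higher-layer representation
lr = 0.1                                     # update rate for recurrent refinement

errors = []
for t in range(20):                          # unfolded recurrent cycles
    pred = W_fb @ r                          # top-down prediction of x
    err = x - pred                           # bottom-up prediction error
    r = r + lr * (W_fb.T @ err)              # refine r to reduce the error
    errors.append(float(err @ err))

# Layer-wise prediction error shrinks across cycles.
print(errors[0], errors[-1])
```

Each pass through the loop is one "cycle" of recurrent processing; unfolding more cycles is what lets a shallow network behave like a deeper one.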


AI Assisted AR Assembly: Object Recognition and Computer Vision for Augmented Reality Assisted Assembly

Kyaw, Alexander Htet, Ma, Haotian, Zivkovic, Sasa, Sabin, Jenny

arXiv.org Artificial Intelligence

We present an AI-assisted Augmented Reality assembly workflow that uses deep learning-based object recognition to identify different assembly components and display step-by-step instructions. For each assembly step, the system displays a bounding box around the corresponding components in the physical space and indicates where each component should be placed. By connecting assembly instructions with the real-time locations of the relevant components, the system eliminates the need to manually search, sort, or label components before assembly. To demonstrate the feasibility of using object recognition for AR-assisted assembly, we highlight a case study involving the assembly of LEGO sculptures.
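The matching step described above, selecting which detected components to highlight for the current instruction, can be sketched as follows (the class names, box format, and step structure are illustrative assumptions, not the paper's actual data model):

```python
# Hypothetical sketch: match per-frame detections to the current assembly
# step and return the bounding boxes to highlight in the AR overlay.
# Boxes are (x, y, w, h) in image coordinates.

assembly_steps = [
    {"step": 1, "needs": ["2x4_brick_red"]},
    {"step": 2, "needs": ["2x2_brick_blue", "2x4_brick_red"]},
]

def boxes_to_highlight(detections, step):
    """Select detections whose class is required by the current step."""
    needed = set(step["needs"])
    return [d for d in detections if d["label"] in needed]

frame_detections = [
    {"label": "2x4_brick_red", "box": (120, 80, 40, 20), "score": 0.93},
    {"label": "2x2_brick_blue", "box": (300, 150, 20, 20), "score": 0.88},
    {"label": "baseplate", "box": (0, 0, 640, 480), "score": 0.99},
]

print(boxes_to_highlight(frame_detections, assembly_steps[0]))
```

Because the lookup runs per frame against live detections, the highlighted boxes track the components as they move, which is what removes the need for pre-sorting.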


Reviews: Deep Predictive Coding Network with Local Recurrent Processing for Object Recognition

Neural Information Processing Systems

This paper presents the predictive coding network (PCN), a convolutional architecture with local recurrent and feedback connections. Higher layers provide top-down predictions while lower layers return prediction errors, which are refined over time by the local recurrence. The idea is not new; other work (such as that of Lotter et al.) has applied it to tasks such as video prediction and object recognition, though it had yet to be shown to scale to larger tasks such as ImageNet. The authors compare the performance of PCN, with varying numbers of cycles of recurrent processing, to standard CNN architectures on multiple image datasets. In general, PCN achieves slightly lower error than standard architectures with a comparable number of parameters.


Why The Brain Separates Face Recognition From Object Recognition

Neural Information Processing Systems

Many studies have uncovered evidence that visual cortex contains specialized regions involved in processing faces but not other object classes. Recent electrophysiology studies of cells in several of these specialized regions revealed that at least some of these regions are organized in a hierarchical manner with viewpoint-specific cells projecting to downstream viewpoint-invariant identity-specific cells [1]. A separate computational line of reasoning leads to the claim that some transformations of visual inputs that preserve viewed object identity are class-specific. In particular, the 2D images evoked by a face undergoing a 3D rotation are not produced by the same image transformation (2D) that would produce the images evoked by an object of another class undergoing the same 3D rotation. However, within the class of faces, knowledge of the image transformation evoked by 3D rotation can be reliably transferred from previously viewed faces to help identify a novel face at a new viewpoint. We show, through computational simulations, that an architecture which applies this method of gaining invariance to class-specific transformations is effective when restricted to faces and fails spectacularly when applied to other object classes. We argue here that in order to accomplish viewpoint-invariant face identification from a single example view, visual cortex must separate the circuitry involved in discounting 3D rotations of faces from the generic circuitry involved in processing other objects. The resulting model of the ventral stream of visual cortex is consistent with the recent physiology results showing the hierarchical organization of the face processing network.


Compositionality, MDL Priors, and Object Recognition

Neural Information Processing Systems

Images are ambiguous at each of many levels of a contextual hierarchy. Nevertheless, the high-level interpretation of most scenes is unambiguous, as evidenced by the superior performance of humans. This observation argues for global vision models, such as deformable templates. Unfortunately, such models are computationally intractable for unconstrained problems. We propose a compositional model in which primitives are recursively composed, subject to syntactic restrictions, to form tree-structured objects and object groupings.


A Framework For Refining Text Classification and Object Recognition from Academic Articles

Li, Jinghong, Ota, Koichi, Gu, Wen, Hasegawa, Shinobu

arXiv.org Artificial Intelligence

With the widespread use of the internet, it has become increasingly crucial to extract specific information efficiently from vast amounts of academic articles. Data mining techniques are generally employed to solve this issue. However, data mining for academic articles is challenging since it requires automatically extracting specific patterns from documents with complex, unstructured layouts. Current data mining methods for academic articles employ rule-based (RB) or machine learning (ML) approaches. However, rule-based methods incur a high coding cost for articles with complex typesetting, while machine learning methods alone require costly annotation work for the complex content types within a paper. Furthermore, relying solely on machine learning can cause patterns that rule-based methods recognize easily to be extracted incorrectly. To overcome these issues, we analyze the standard layout and typesetting used in a specified publication and emphasize implementing methods tailored to the specific characteristics of academic articles. We have developed a novel Text Block Refinement Framework (TBRF), a hybrid of machine learning and rule-based schemes. We used the well-known ACL proceedings articles as experimental data for the validation experiment. The experiment shows that our approach achieved over 95% classification accuracy and 90% detection accuracy for tables and figures.
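The hybrid idea, letting cheap layout rules take precedence and falling back to a learned model only for ambiguous blocks, can be sketched as follows (the rules, labels, and stand-in model here are illustrative, not the paper's actual TBRF pipeline):

```python
import re

def rule_based_label(text):
    """Return a label when a cheap layout rule fires, else None."""
    if re.match(r"^(Table|Figure)\s+\d+", text):
        return "caption"
    if re.match(r"^\[\d+\]", text):
        return "reference"
    return None

def classify_block(text, ml_model):
    """Rules take precedence; the ML model handles the ambiguous rest."""
    label = rule_based_label(text)
    return label if label is not None else ml_model(text)

# Stand-in for a trained text classifier.
dummy_model = lambda text: "body"

print(classify_block("Table 3: Detection accuracy", dummy_model))   # caption
print(classify_block("We evaluate on ACL proceedings.", dummy_model))  # body
```

Routing the easy patterns through rules both avoids annotating them for training and prevents the failure mode the abstract mentions, where a learned model mis-extracts patterns that rules handle trivially.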


CNN-based Methods for Object Recognition with High-Resolution Tactile Sensors

Gandarias, Juan M., García-Cerezo, Alfonso J., Gómez-de-Gabriel, Jesús M.

arXiv.org Artificial Intelligence

Novel high-resolution pressure-sensor arrays allow pressure readings to be treated as standard images, so computer vision algorithms and methods such as Convolutional Neural Networks (CNNs) can be used to identify objects in contact. In this paper, a high-resolution tactile sensor has been attached to a robotic end-effector to identify contacted objects. Two CNN-based approaches have been employed to classify the pressure images: a transfer learning approach using a CNN pre-trained on an RGB-image dataset, and a custom-made CNN (TactNet) trained from scratch with tactile information. The transfer learning approach can be carried out by retraining the classification layers of the network or by replacing these layers with an SVM. Overall, 11 configurations based on these methods have been tested: 8 transfer-learning-based and 3 TactNet-based. Moreover, a study of the performance of the methods and a comparative discussion with the current state of the art in tactile object recognition are presented.
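The "retrain only the classification layers" variant boils down to keeping a pre-trained feature extractor frozen and fitting a new head on tactile data. A dependency-free numpy sketch of that idea, with a fixed random projection standing in for the pre-trained CNN backbone and fully synthetic "pressure images" (none of the shapes or data reflect the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(1)

def frozen_features(x, W):
    """Stand-in for pre-trained conv features: fixed projection + ReLU."""
    return np.maximum(0.0, x @ W)

# Synthetic 8x8 pressure images flattened to 64-dim vectors, two classes.
X = rng.standard_normal((200, 64))
true_w = rng.standard_normal(64)
y = (X @ true_w > 0).astype(float)

W_frozen = rng.standard_normal((64, 64)) * 0.2   # backbone: never updated
F = frozen_features(X, W_frozen)
F = F - F.mean(axis=0)                           # center (helps plain GD)

# Train only the head: logistic regression by gradient descent.
w, b = np.zeros(F.shape[1]), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(F @ w + b)))
    w -= 1.0 * (F.T @ (p - y)) / len(y)
    b -= 1.0 * float(np.mean(p - y))

acc = float(np.mean(((F @ w + b) > 0) == (y > 0.5)))
print(f"training accuracy of the retrained head: {acc:.2f}")
```

Swapping the logistic head for an SVM, as in the paper's alternative configuration, changes only the final classifier; the frozen feature computation stays identical.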


What is Computer Vision and its Benefits - Rishabh Software

#artificialintelligence

Image Acquisition: The first step in computer vision is to acquire an image or video feed. This can be done using a camera or other imaging device. Pre-Processing: Once the image is acquired, it needs to be pre-processed to make it easier for the computer to analyze. This may involve noise reduction, image enhancement, or color correction. Feature Extraction: In this step, the computer analyzes the image to identify and extract specific features relevant to the task.
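The three steps above can be walked through end to end on a synthetic grayscale frame (a numpy array standing in for a camera capture; the blur kernel and gradient-based features are simple illustrative choices, not the only options):

```python
import numpy as np

rng = np.random.default_rng(0)

# 1. Image acquisition: a synthetic 32x32 frame with a bright square,
#    plus additive sensor noise.
image = np.zeros((32, 32))
image[8:24, 8:24] = 1.0
image += rng.normal(0.0, 0.1, image.shape)

# 2. Pre-processing: a 3x3 mean filter for noise reduction.
def box_blur(img):
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros_like(img)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out += padded[1 + dy : 1 + dy + img.shape[0],
                          1 + dx : 1 + dx + img.shape[1]]
    return out / 9.0

smoothed = box_blur(image)

# 3. Feature extraction: image gradients as crude edge features.
gy, gx = np.gradient(smoothed)
edge_strength = np.hypot(gx, gy)

print("mean edge strength:", float(edge_strength.mean()))
```

The edge map responds strongly along the square's border and weakly in flat regions, which is the kind of task-relevant feature a downstream recognizer would consume.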