GeoDE: a Geographically Diverse Evaluation Dataset for Object Recognition
Current dataset collection methods typically scrape large amounts of data from the web. While this technique is extremely scalable, data collected in this way tends to reinforce stereotypical biases, can contain personally identifiable information, and typically originates from Europe and North America. In this work, we rethink the dataset collection paradigm and introduce GeoDE, a geographically diverse dataset with 61,940 images from 40 classes and 6 world regions, and no personally identifiable information, collected by soliciting images from people across the world. We analyse GeoDE to understand differences in images collected in this manner compared to web-scraping. Despite the smaller size of this dataset, we demonstrate its use as both an evaluation and training dataset, allowing us to highlight shortcomings in current models, as well as demonstrate improved performance even when training on this small dataset.
- North America (0.28)
- Europe (0.28)
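A dataset balanced across world regions supports exactly this kind of disaggregated evaluation: reporting accuracy per region rather than in aggregate. Below is a minimal sketch of such a breakdown; the class and region labels are invented for illustration and are not tied to GeoDE's actual files or API.

```python
from collections import defaultdict

def per_region_accuracy(predictions, labels, regions):
    """Accuracy broken down by world region, to expose geographic gaps."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for pred, label, region in zip(predictions, labels, regions):
        total[region] += 1
        if pred == label:
            correct[region] += 1
    return {r: correct[r] / total[r] for r in total}

# Toy example with two regions and hypothetical class labels.
preds   = ["stove", "stove", "boat", "stove"]
labels  = ["stove", "boat",  "boat", "stove"]
regions = ["Europe", "Europe", "Africa", "Africa"]
print(per_region_accuracy(preds, labels, regions))
# {'Europe': 0.5, 'Africa': 1.0}
```

A large gap between the per-region numbers is the kind of model shortcoming an evaluation set like this is designed to surface.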
Deep Predictive Coding Network with Local Recurrent Processing for Object Recognition
Inspired by predictive coding, a theory from neuroscience, we develop a bi-directional and dynamic neural network with local recurrent processing, namely the predictive coding network (PCN). Unlike feedforward-only convolutional neural networks, PCN includes both feedback connections, which carry top-down predictions, and feedforward connections, which carry bottom-up prediction errors. Feedback and feedforward connections enable adjacent layers to interact locally and recurrently to refine representations towards minimization of layer-wise prediction errors. When unfolded over time, the recurrent processing gives rise to an increasingly deep hierarchy of non-linear transformations, allowing a shallow network to dynamically extend itself into an arbitrarily deep one.
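The local recurrent refinement can be illustrated with a toy two-layer loop. This is a NumPy sketch of the general predictive-coding idea, not the paper's architecture: feedback weights carry the top-down prediction, the feedforward pass carries the prediction error back up, and each recurrent cycle nudges the higher-layer representation to reduce the layer-wise error.

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=8)          # lower-layer activity (e.g. an input feature map)
W = rng.normal(size=(8, 4))     # feedback (top-down prediction) weights
r = np.zeros(4)                 # higher-layer representation, refined over time
lr = 0.02

for t in range(200):            # unfolding over time deepens the computation
    prediction = W @ r          # feedback: top-down prediction of lower layer
    error = x - prediction      # feedforward: bottom-up prediction error
    r += lr * (W.T @ error)     # refine the representation to reduce the error

print(float(np.linalg.norm(x - W @ r)))  # residual error shrinks over the cycles
```

Each cycle is one gradient step on the squared layer-wise prediction error, which is why unrolling the recurrence behaves like an increasingly deep feedforward network.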
AI Assisted AR Assembly: Object Recognition and Computer Vision for Augmented Reality Assisted Assembly
Kyaw, Alexander Htet, Ma, Haotian, Zivkovic, Sasa, Sabin, Jenny
We present an AI-assisted Augmented Reality assembly workflow that uses deep learning-based object recognition to identify different assembly components and display step-by-step instructions. For each assembly step, the system draws a bounding box around the corresponding components in the physical space and indicates where each component should be placed. By connecting assembly instructions with the real-time locations of the relevant components, the system eliminates the need to manually search, sort, or label components before each assembly. To demonstrate the feasibility of using object recognition for AR-assisted assembly, we present a case study involving the assembly of LEGO sculptures.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.18)
- North America > United States > New York > Tompkins County > Ithaca (0.05)
- North America > United States > New York > New York County > New York City (0.05)
- (2 more...)
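The core lookup in such a workflow, connecting the detector's per-frame output to the components the current step requires, can be sketched in a few lines. All component names and box coordinates below are hypothetical, not the system's actual data.

```python
def boxes_for_step(detections, step_components):
    """detections: list of (label, bbox) from the object detector for one frame;
    step_components: labels of the parts needed for the current assembly step.
    Returns the bounding boxes the AR overlay should highlight."""
    needed = set(step_components)
    return [bbox for label, bbox in detections if label in needed]

# Hypothetical detector output for one camera frame.
frame_detections = [
    ("brick_2x4_red",  (120, 40, 180, 90)),
    ("brick_1x2_blue", (300, 210, 340, 250)),
    ("plate_4x4_grey", (50, 300, 140, 380)),
]
step_3 = ["brick_2x4_red", "plate_4x4_grey"]
print(boxes_for_step(frame_detections, step_3))
# [(120, 40, 180, 90), (50, 300, 140, 380)]
```

Because the lookup runs against live detections, the user never has to pre-sort or label parts: the overlay finds them wherever they currently sit.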
Reviews: Deep Predictive Coding Network with Local Recurrent Processing for Object Recognition
This paper presents the predictive coding network (PCN), a convolutional architecture with local recurrent and feedback connections. Higher layers provide top-down predictions while lower layers return the prediction errors, which are refined over time by the local recurrence. The idea itself is not new: other work (such as that of Lotter et al.) has used it for tasks such as video prediction and object recognition, though it had yet to be shown to scale to larger tasks such as ImageNet. The authors compare the performance of PCN, with varying numbers of cycles of recurrent processing, to standard CNN architectures on multiple image datasets. In general, PCN has slightly lower error than standard architectures with a comparable number of parameters.
Why The Brain Separates Face Recognition From Object Recognition
Many studies have uncovered evidence that visual cortex contains specialized regions involved in processing faces but not other object classes. Recent electrophysiology studies of cells in several of these specialized regions revealed that at least some of these regions are organized in a hierarchical manner, with viewpoint-specific cells projecting to downstream viewpoint-invariant, identity-specific cells [1]. A separate computational line of reasoning leads to the claim that some transformations of visual inputs that preserve viewed object identity are class-specific. In particular, the 2D images evoked by a face undergoing a 3D rotation are not produced by the same 2D image transformation that would produce the images evoked by an object of another class undergoing the same 3D rotation. However, within the class of faces, knowledge of the image transformation evoked by 3D rotation can be reliably transferred from previously viewed faces to help identify a novel face at a new viewpoint. We show, through computational simulations, that an architecture which applies this method of gaining invariance to class-specific transformations is effective when restricted to faces and fails spectacularly when applied to other object classes. We argue here that in order to accomplish viewpoint-invariant face identification from a single example view, visual cortex must separate the circuitry involved in discounting 3D rotations of faces from the generic circuitry involved in processing other objects. The resulting model of the ventral stream of visual cortex is consistent with the recent physiology results showing the hierarchical organization of the face processing network.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)
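The flavor of those simulations can be captured in a toy linear version. The matrices below are invented stand-ins for the image maps induced by 3D rotation, not the paper's model: a transformation learned from example faces transfers almost perfectly to a novel face, but fails on an object whose class transforms differently under the same 3D rotation.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 20

# Two object classes whose 3D rotation induces *different* image maps.
R_face = rng.normal(size=(d, d)) / np.sqrt(d)   # how face images transform
R_obj  = rng.normal(size=(d, d)) / np.sqrt(d)   # how non-face images transform

# Learn the transformation from (frontal, rotated) pairs of training faces.
faces = rng.normal(size=(d, 40))
T, *_ = np.linalg.lstsq(faces.T, (R_face @ faces).T, rcond=None)
T = T.T                                          # transform learned from faces only

new_face, new_obj = rng.normal(size=d), rng.normal(size=d)
face_err = np.linalg.norm(T @ new_face - R_face @ new_face)   # transfers: ~0
obj_err  = np.linalg.norm(T @ new_obj  - R_obj  @ new_obj)    # fails: large
print(f"within-class error {face_err:.3f}, cross-class error {obj_err:.3f}")
```

The asymmetry is the point of the argument: a circuit tuned to discount face rotations is useless, even harmful, for other classes, motivating separate face-specific machinery.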
Compositionality, MDL Priors, and Object Recognition
Images are ambiguous at each of many levels of a contextual hierarchy. Nevertheless, the high-level interpretation of most scenes is unambiguous, as evidenced by the superior performance of humans. This observation argues for global vision models, such as deformable templates. Unfortunately, such models are computationally intractable for unconstrained problems. We propose a compositional model in which primitives are recursively composed, subject to syntactic restrictions, to form tree-structured objects and object groupings.
- Information Technology > Artificial Intelligence > Vision (0.85)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.47)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)
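A toy calculation shows how an MDL prior favors compositional interpretations: each node in a parse tree pays a code cost, so a reading that reuses larger composites gets a shorter description and hence a higher prior probability. The primitive inventory and costs below are invented for illustration.

```python
import math

PRIMITIVE_COST = math.log2(8)   # bits to choose one of 8 (invented) primitives
COMPOSE_COST   = 1.0            # one bit to signal "compose two parts"

def description_length(tree):
    """tree: a primitive name (str), or a pair of subtrees composed into one part."""
    if isinstance(tree, str):
        return PRIMITIVE_COST
    left, right = tree
    return COMPOSE_COST + description_length(left) + description_length(right)

# Two interpretations of the same region of an image:
flat    = ("edge", ("edge", ("edge", "edge")))   # four unrelated strokes
grouped = ("square", "square")                   # two reused composites
print(description_length(flat), description_length(grouped))
# 15.0 7.0
```

Under an MDL prior, an interpretation's probability scales as 2^(-description length), so the grouped reading dominates; this is how the prior tames the ambiguity at each level of the hierarchy.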
A Framework For Refining Text Classification and Object Recognition from Academic Articles
Li, Jinghong, Ota, Koichi, Gu, Wen, Hasegawa, Shinobu
With the widespread use of the internet, it has become increasingly crucial to extract specific information from vast numbers of academic articles efficiently. Data mining techniques are generally employed to solve this issue. However, data mining for academic articles is challenging because it requires automatically extracting specific patterns from documents with complex, unstructured layouts. Current data mining methods for academic articles employ rule-based (RB) or machine learning (ML) approaches. Rule-based methods incur a high coding cost for articles with complex typesetting. Purely machine learning methods, on the other hand, require costly annotation work for the complex content types within a paper, and can mistakenly extract patterns that rule-based methods would recognize easily. To overcome these issues, we emphasize applying specific methods to specific characteristics of academic articles, based on an analysis of the standard layout and typesetting used in a given publication. We have developed a novel Text Block Refinement Framework (TBRF), a hybrid of machine learning and rule-based schemes. We used the well-known ACL proceedings articles as experimental data for the validation experiment. The experiment shows that our approach achieved over 95% classification accuracy and 90% detection accuracy for tables and figures.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.48)
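The hybrid scheme's control flow can be sketched in a few lines. The rules, labels, and fallback model below are hypothetical, not TBRF's actual implementation: high-precision rules claim the patterns they recognize reliably, and everything else falls through to a learned classifier, so neither approach has to cover all layouts.

```python
import re

# Cheap, high-precision layout rules (invented examples).
RULES = [
    (re.compile(r"^(Figure|Fig\.)\s*\d+"), "figure_caption"),
    (re.compile(r"^Table\s*\d+"),          "table_caption"),
    (re.compile(r"^\[\d+\]"),              "reference"),
]

def classify_block(text, ml_model):
    """Rule-first, ML-fallback classification of one text block."""
    for pattern, label in RULES:
        if pattern.match(text):
            return label              # a rule fired: skip the ML model entirely
    return ml_model(text)             # fall through to the learned classifier

# Stand-in for a trained model (e.g. an SVM over layout features).
dummy_model = lambda text: "body_text"

print(classify_block("Table 2: Detection accuracy", dummy_model))  # table_caption
print(classify_block("However, data mining ...", dummy_model))     # body_text
```

Running the rules first also prevents the failure mode noted above, where an ML model mislabels patterns that a trivial rule handles perfectly.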
CNN-based Methods for Object Recognition with High-Resolution Tactile Sensors
Gandarias, Juan M., García-Cerezo, Alfonso J., Gómez-de-Gabriel, Jesús M.
Novel high-resolution pressure-sensor arrays allow pressure readings to be treated as standard images, so computer vision methods such as Convolutional Neural Networks (CNNs) can be used to identify contact objects. In this paper, a high-resolution tactile sensor is attached to a robotic end-effector to identify contacted objects. Two CNN-based approaches are employed to classify pressure images: a transfer learning approach using a CNN pre-trained on an RGB-image dataset, and a custom-made CNN (TactNet) trained from scratch on tactile information. The transfer learning approach can be carried out either by retraining the classification layers of the network or by replacing those layers with an SVM. Overall, 11 configurations based on these methods are tested: 8 transfer learning-based and 3 TactNet-based. Moreover, a study of the methods' performance and a comparative discussion with the current state of the art in tactile object recognition are presented.
- North America > United States > Massachusetts > Suffolk County > South Boston (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- Europe > Spain > Andalusia > Málaga Province > Málaga (0.04)
- Africa > Central African Republic > Ombella-M'Poko > Bimbo (0.04)
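The feature-extraction variant of transfer learning can be sketched as follows. This is a dependency-free stand-in: a fixed random projection plays the role of the frozen CNN backbone, and a nearest-centroid classifier stands in for the SVM head, so nothing here reflects the paper's actual networks or data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed random projection standing in for a frozen, pre-trained CNN
# backbone: it is never retrained, only used to embed pressure images.
PROJECTION = rng.normal(size=(16, 64))

def cnn_features(pressure_image):
    """Stand-in feature extractor: flatten an 8x8 pressure map, project to 16-D."""
    return PROJECTION @ pressure_image.ravel()

# Tiny synthetic "tactile" dataset: 8x8 pressure maps for two object
# classes that differ in overall contact pressure.
class_a = [rng.normal(loc=0.0, size=(8, 8)) for _ in range(10)]  # "sponge"
class_b = [rng.normal(loc=1.0, size=(8, 8)) for _ in range(10)]  # "bottle"

# Fit the replacement classifier head on the frozen features (the paper
# uses an SVM; nearest-centroid keeps this sketch dependency-free).
centroids = {
    "sponge": np.mean([cnn_features(x) for x in class_a], axis=0),
    "bottle": np.mean([cnn_features(x) for x in class_b], axis=0),
}

def classify(pressure_image):
    f = cnn_features(pressure_image)
    return min(centroids, key=lambda c: np.linalg.norm(f - centroids[c]))

print(classify(np.full((8, 8), 1.0)))  # a high-pressure probe: almost surely "bottle"
```

The design point is that only the small head is fit on tactile data; the expensive backbone, trained on a much larger visual dataset, is reused as-is.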
What is Computer Vision and its Benefits - Rishabh Software
- Image Acquisition: The first step in computer vision is to acquire an image or video feed. This can be done using a camera or other imaging device.
- Pre-Processing: Once the image is acquired, it needs to be pre-processed to make it easier for the computer to analyze. This may involve noise reduction, image enhancement, or color correction.
- Feature Extraction: In this step, the computer analyzes the image to identify and extract specific features relevant to the task.
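The three steps can be sketched end-to-end on a synthetic image. This uses NumPy only; a real pipeline would use a camera for acquisition and a library such as OpenCV for each later stage.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1. Image acquisition: stand-in for a camera frame, a bright square
#    on a dark background with added sensor noise.
image = np.zeros((32, 32))
image[8:24, 8:24] = 1.0
image += rng.normal(scale=0.1, size=image.shape)

# 2. Pre-processing: noise reduction with a simple 3x3 box blur.
padded = np.pad(image, 1, mode="edge")
blurred = sum(
    padded[dy:dy + 32, dx:dx + 32] for dy in range(3) for dx in range(3)
) / 9.0

# 3. Feature extraction: gradient magnitude (edge strength) from finite
#    differences; responses concentrate along the square's border.
gy, gx = np.gradient(blurred)
edges = np.hypot(gx, gy)
print("peak edge response:", float(edges.max()))
```

Each stage consumes the previous stage's output, which is why the order matters: extracting edges before denoising would amplify the sensor noise instead of the object boundary.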