Image Matching
Improving Image Recognition by Retrieving from Web-Scale Image-Text Data
Iscen, Ahmet, Fathi, Alireza, Schmid, Cordelia
Retrieval augmented models are becoming increasingly popular for computer vision tasks after their recent success in NLP problems. The goal is to enhance the recognition capabilities of the model by retrieving similar examples for the visual input from an external memory set. In this work, we introduce an attention-based memory module, which learns the importance of each retrieved example from the memory. Compared to existing approaches, our method removes the influence of the irrelevant retrieved examples, and retains those that are beneficial to the input query. We also thoroughly study various ways of constructing the memory dataset. Our experiments show the benefit of using a massive-scale memory dataset of 1B image-text pairs, and demonstrate the performance of different memory representations. We evaluate our method in three different classification tasks, namely long-tailed recognition, learning with noisy labels, and fine-grained classification, and show that it achieves state-of-the-art accuracies in ImageNet-LT, Places-LT and Webvision datasets.
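The idea of weighting retrieved examples by their relevance to the query can be illustrated with a minimal sketch. This is not the authors' module, which is learned end to end; it assumes a fixed dot-product similarity followed by a softmax. The names `attend_over_memory` and `temperature`, and the toy vectors, are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend_over_memory(query, memory_keys, memory_values, temperature=1.0):
    """Weight each retrieved example by its similarity to the query.

    Returns the attention weights and the weighted sum of the values.
    Assumption: plain dot-product scores, no learned projections.
    """
    scores = [dot(query, key) / temperature for key in memory_keys]
    weights = softmax(scores)
    dim = len(memory_values[0])
    fused = [
        sum(w * value[i] for w, value in zip(weights, memory_values))
        for i in range(dim)
    ]
    return weights, fused


# Toy example: two retrieved items resemble the query, one does not.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]]
values = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
weights, fused = attend_over_memory(query, keys, values, temperature=0.1)
```

With a low temperature, the dissimilar third item receives a weight close to zero, so it contributes almost nothing to the fused representation.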
A Hybrid Deep Feature-Based Deformable Image Registration Method for Pathology Images
Zhang, Chulong, Jiang, Yuming, Li, Na, Zhang, Zhicheng, Islam, Md Tauhidul, Dai, Jingjing, Liu, Lin, He, Wenfeng, Qin, Wenjian, Xiong, Jing, Xie, Yaoqin, Liang, Xiaokun
Pathologists need to combine information from differently stained pathology slices for accurate diagnosis. Deformable image registration is a necessary technique for fusing multi-modal pathology slices. This paper proposes a hybrid deep feature-based deformable image registration framework for stained pathology samples. We first extract dense feature points via detector-based and detector-free deep learning feature networks and perform point matching. Then, to further reduce false matches, we propose an outlier detection method that combines the isolation forest statistical model with a local affine correction model. Finally, an interpolation method generates the deformable vector field for pathology image registration from the resulting matched points. We evaluate our method on the dataset of the Automatic Non-rigid Histology Image Registration (ANHIR) challenge, which was co-organized with the IEEE ISBI 2019 conference. Our technique outperforms the traditional approaches by 17%, with the Average-Average registration target error (rTRE) reaching 0.0034. The proposed method achieved state-of-the-art performance and ranked 1st on the test dataset. The proposed hybrid deep feature-based registration method can potentially become a reliable method for pathology image registration.
Learning Distance Metrics with Triplet Loss: Advantages and Challenges - AITechTrend
Triplet loss is a loss function that is widely used in machine learning for tasks such as image recognition, facial recognition, and information retrieval. It is designed to learn a distance metric between objects: the goal is to embed objects in a metric space such that similar objects are close together, while dissimilar objects are far apart. In this article, we will introduce triplet loss, discuss how it works, and explore some of its applications.
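As a minimal sketch of the idea, the standard formulation compares an anchor with a similar (positive) and a dissimilar (negative) example and penalizes the triplet unless the negative is farther away than the positive by at least a margin. The Euclidean distance, the `margin=1.0` default, and the toy points are assumptions made for illustration, not details from the article.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """max(0, d(anchor, positive) - d(anchor, negative) + margin)."""
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


anchor = [0.0, 0.0]
positive = [0.0, 1.0]        # distance 1 from the anchor
easy_negative = [3.0, 4.0]   # distance 5 from the anchor
hard_negative = [1.0, 0.0]   # distance 1 from the anchor

easy = triplet_loss(anchor, positive, easy_negative)
hard = triplet_loss(anchor, positive, hard_negative)
```

The easy negative already lies beyond the margin, so its loss is zero. The hard negative is as close to the anchor as the positive, so its loss equals the margin.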
What Affects Learned Equivariance in Deep Image Recognition Models?
Bruintjes, Robert-Jan, Motyka, Tomasz, van Gemert, Jan
Equivariance w.r.t. geometric transformations in neural networks improves data efficiency, parameter efficiency and robustness to out-of-domain perspective shifts. When equivariance is not designed into a neural network, the network can still learn equivariant functions from the data. We quantify this learned equivariance by proposing an improved measure for equivariance. We find evidence for a correlation between learned translation equivariance and validation accuracy on ImageNet. We therefore investigate what can increase the learned equivariance in neural networks, and find that data augmentation, reduced model capacity and inductive bias in the form of convolutions induce higher learned equivariance in neural networks.
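To make the notion of translation equivariance concrete, the sketch below checks whether a function f commutes with a shift, i.e. whether f(shift(x)) equals shift(f(x)). This is a generic illustration, not the improved measure proposed in the paper. It assumes 1D signals with circular shifts, and all function names are hypothetical.

```python
def shift(signal, k):
    """Circularly shift a 1D list by k positions to the right."""
    k %= len(signal)
    return signal[-k:] + signal[:-k] if k else list(signal)


def circular_conv(signal, kernel):
    """Convolution-style filter with wrap-around borders."""
    n = len(signal)
    return [
        sum(kernel[j] * signal[(i + j) % n] for j in range(len(kernel)))
        for i in range(n)
    ]


def position_weighted(signal):
    """A position-dependent map, which is not translation equivariant."""
    return [i * v for i, v in enumerate(signal)]


def equivariance_error(f, signal, k):
    """Largest absolute difference between f(shift(x)) and shift(f(x))."""
    shifted_then_mapped = f(shift(signal, k))
    mapped_then_shifted = shift(f(signal), k)
    return max(abs(a - b) for a, b in zip(shifted_then_mapped, mapped_then_shifted))


signal = [1.0, 2.0, 3.0, 4.0]
kernel = [0.5, 1.0, 0.5]

conv_error = equivariance_error(lambda s: circular_conv(s, kernel), signal, 1)
weighted_error = equivariance_error(position_weighted, signal, 1)
```

The circular convolution commutes with the shift, so its error is zero, while the position-weighted map gives a non-zero error.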
Coarse-to-Fine Image Search Using Neural Networks
The efficiency of image search can be greatly improved by using a coarse-to-fine search strategy with a multi-resolution image representation. However, if the resolution is so low that the objects have few distinguishing features, search becomes difficult. We show that the performance of search at such low resolutions can be improved by using context information, i.e., objects visible at low-resolution which are not the objects of interest but are associated with them. The networks can be given explicit context information as inputs, or they can learn to detect the context objects, in which case the user does not have to be aware of their existence. We also use Integrated Feature Pyramids, which represent high-frequency information at low resolutions.
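The coarse-to-fine strategy itself can be sketched on a 1D signal: locate a template cheaply at low resolution, then refine the match only in a small window at full resolution. This sketch does not use neural networks, context information, or Integrated Feature Pyramids. It assumes average-pooling for downsampling and a sum-of-squared-differences match score, and all names are hypothetical.

```python
def downsample(signal, factor):
    """Average non-overlapping windows of `factor` samples."""
    return [
        sum(signal[i:i + factor]) / factor
        for i in range(0, len(signal) - factor + 1, factor)
    ]


def ssd(signal, template, offset):
    """Sum of squared differences between the template and a signal window."""
    return sum((signal[offset + i] - t) ** 2 for i, t in enumerate(template))


def best_offset(signal, template, candidates):
    """Candidate offset with the lowest SSD."""
    return min(candidates, key=lambda o: ssd(signal, template, o))


def coarse_to_fine_search(signal, template, factor=2):
    # Coarse pass: search every offset, but on the downsampled data.
    coarse_signal = downsample(signal, factor)
    coarse_template = downsample(template, factor)
    coarse_range = range(len(coarse_signal) - len(coarse_template) + 1)
    coarse_hit = best_offset(coarse_signal, coarse_template, coarse_range)

    # Fine pass: search only a small window around the coarse hit.
    center = coarse_hit * factor
    lo = max(0, center - factor)
    hi = min(len(signal) - len(template), center + factor)
    return best_offset(signal, template, range(lo, hi + 1))


template = [1.0, 5.0, 5.0, 1.0]
signal = [0.0] * 16
signal[9:13] = template  # the target starts at offset 9

found = coarse_to_fine_search(signal, template)
exhaustive = best_offset(signal, template, range(len(signal) - len(template) + 1))
```

In this toy case the fine pass evaluates 5 full-resolution offsets instead of all 13, and it returns the same offset as the exhaustive search.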
Image Recognition in Context: Application to Microscopic Urinalysis
There are a number of pattern recognition problem domains where the classification of an object should be based on more than simply the appearance of the object itself. In remote sensing image classification, where each pixel is part of ground cover, a pixel is more likely to be a glacier if it is in a mountainous area than if it is surrounded by pixels of residential areas. In text analysis, one can expect to find certain letters occurring regularly in particular arrangement with other letters (qu, ee, est, tion, etc.). The information conveyed by the accompanying entities is referred to as contextual information.
Filtering Abstract Senses From Image Search Results
We propose an unsupervised method that, given a word, automatically selects non-abstract senses of that word from an online ontology and generates images depicting the corresponding entities. When faced with the task of learning a visual model based only on the name of an object, a common approach is to find images on the web that are associated with the object name, and then train a visual classifier from the search result. As words are generally polysemous, this approach can lead to relatively noisy models if many examples due to outlier senses are added to the model. We argue that images associated with an abstract word sense should be excluded when training a visual classifier to learn a model of a physical object. While image clustering can group together visually coherent sets of returned images, it can be difficult to distinguish whether an image cluster relates to a desired object or to an abstract sense of the word.
WiMi Hologram Cloud Develops A CNN Algorithm-Based Image Recognition System
WiMi Hologram Cloud Inc. (NASDAQ: WIMI) ("WiMi" or the "Company"), a leading global Hologram Augmented Reality ("AR") Technology provider, today announced that it has developed a CNN (convolutional neural network) algorithm-based image recognition system. CNN is a highly efficient recognition algorithm based on an artificial neural network. WiMi applies the CNN algorithm to image recognition technology, showing apparent advantages compared to the traditional machine learning algorithm. CNN realizes the construction of features by the computer itself, thus breaking through the bottleneck of the original way of classification. This has brought image recognition to a new level.
Building an Image Recognition Model using TensorFlow and Keras Libraries in Python - Code Armada, LLC
Image recognition models are extremely useful in a wide range of applications, from autonomous vehicles and medical diagnosis to social media analysis and e-commerce. By teaching a computer to identify and classify images based on certain features, such as color, shape, and texture, we can automate tasks that would be difficult or impossible for humans to do at scale. For example, an image recognition model can be used to detect objects in images, recognize faces and emotions, identify text in images, and even diagnose medical conditions based on medical images. In e-commerce, image recognition models can be used to recommend products based on visual similarity, allowing for more personalized and relevant product recommendations. Pretty cool, right? Let’s give it a try…
Step 1. Install the required libraries: First, you need to install the TensorFlow and Keras libraries in Python. You can install them using the pip command in the terminal.
    pip install tensorflow
    pip install keras
Step 2. Import the required libraries: Once the libraries are installed, you need to import them in your Python script.
    import tensorflow as tf
    from tensorflow import keras
Step 3. Load the dataset: Next, […]
BIFRNet: A Brain-Inspired Feature Restoration DNN for Partially Occluded Image Recognition
Zhang, Jiahong, Cao, Lihong, Lai, Qiuxia, Li, Binyao, Qin, Yunxiao
The partially occluded image recognition (POIR) problem has been a challenge for artificial intelligence for a long time. A common strategy to handle the POIR problem is using the non-occluded features for classification. Unfortunately, this strategy loses effectiveness when the image is severely occluded, since the visible parts can only provide limited information. Several studies in neuroscience reveal that feature restoration, which fills in the occluded information and is called amodal completion, is essential for human brains to recognize partially occluded images. However, feature restoration is commonly ignored by CNNs, which may be the reason why CNNs are ineffective for the POIR problem. Inspired by this, we propose a novel brain-inspired feature restoration network (BIFRNet) to solve the POIR problem. It mimics a ventral visual pathway to extract image features and a dorsal visual pathway to distinguish occluded and visible image regions. In addition, it uses a knowledge module to store object prior knowledge and a completion module to restore occluded features based on visible features and prior knowledge. Thorough experiments on synthetic and real-world occluded image datasets show that BIFRNet outperforms the existing methods in solving the POIR problem. Especially for severely occluded images, BIFRNet surpasses other methods by a large margin and is close to the human brain performance. Furthermore, the brain-inspired design makes BIFRNet more interpretable.