 Image Matching


Recurrent Registration Neural Networks for Deformable Image Registration

arXiv.org Machine Learning

Parametric spatial transformation models have been successfully applied to image registration tasks. In such models, the transformation of interest is parameterized by a fixed set of basis functions, for example B-splines. Because the transformation of interest is not known in advance, each basis function is placed at a fixed position on a regular grid across the image domain. As a consequence, not all basis functions will necessarily contribute to the final transformation, which results in a non-compact representation of the transformation. We reformulate the pairwise registration problem as a recursive sequence of successive alignments. For each element in the sequence, a local deformation defined by its position, shape, and weight is computed by our recurrent registration neural network. The sum of all local deformations yields the final spatial alignment of both images. Formulating the registration problem in this way allows the network to detect non-aligned regions in the images and to learn how to locally refine the registration properly. In contrast to current non-sequence-based registration methods, our approach iteratively applies local spatial deformations to the images until the desired registration accuracy is achieved. We trained our network on 2D magnetic resonance images of the lung and compared our method to a standard parametric B-spline registration. The experiments show that our method performs on par in accuracy but yields a more compact representation of the transformation. Furthermore, we achieve a speedup by a factor of around 15 compared to the B-spline registration.
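
The idea of building a transformation as a sum of local deformations, each defined by a position, a shape, and a weight, can be illustrated with a small sketch. The abstract does not say which kernel the authors use, so the isotropic Gaussian below, the `LocalDeformation` class, and the `displacement` function are all assumptions made for illustration, not the paper's implementation.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LocalDeformation:
    center: Tuple[float, float]  # position of the local deformation
    sigma: float                 # shape: isotropic Gaussian width (assumption)
    weight: Tuple[float, float]  # displacement vector applied at the center


def displacement(point: Tuple[float, float],
                 deformations: List[LocalDeformation]) -> Tuple[float, float]:
    """Sum the contributions of all local deformations at one point."""
    dx = dy = 0.0
    for d in deformations:
        r2 = (point[0] - d.center[0]) ** 2 + (point[1] - d.center[1]) ** 2
        k = math.exp(-r2 / (2.0 * d.sigma ** 2))  # falls off away from center
        dx += d.weight[0] * k
        dy += d.weight[1] * k
    return dx, dy


# Two hypothetical steps of the sequence; in the paper a network predicts them.
steps = [
    LocalDeformation(center=(10.0, 10.0), sigma=3.0, weight=(2.0, 0.0)),
    LocalDeformation(center=(20.0, 10.0), sigma=5.0, weight=(0.0, -1.0)),
]
u = displacement((10.0, 10.0), steps)
```

Only the deformations that are actually needed appear in the list, which is what makes the representation compact compared with a full regular grid of basis functions.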


Using Google Vision AI's Reverse Image Search To Richly Catalog Television News

#artificialintelligence

Deep learning has revolutionized the machine understanding of imagery. Yet today's image recognition models are still limited by the availability of large annotated training datasets upon which to build their libraries of recognized objects and activities. To address this, Google's Vision AI API expands its native catalog of around 10,000 visually recognized objects and activities with the ability to perform the equivalent of a reverse Google Images search across the open Web and tally up the top topics used to caption the given image everywhere it has previously appeared, lending unprecedentedly rich context and understanding, even yielding unique labels for breaking news events. What might this process yield for a week of television news? Google's Vision AI API represents a unique hybrid between traditional deep learning-based image labeling based on a library of previously trained models and the ability to leverage the open Web to annotate images based on the most common topics visually similar images are captioned with. Using its Web Entities feature, the Vision AI API performs what amounts to a reverse Google Images search over the open Web, identifying images across the entire Web that look most similar to the given image.


Building a Chat Bot With Image Recognition and OCR

#artificialintelligence

In part 1 of this series, we gave our bot the ability to detect sentiment from text and respond accordingly. But that's about all it can do, which is admittedly quite boring. Of course, in a real chat, we often send a multitude of media: text, images, videos, gifs, and anything else. So in this next step of our journey, let's give our bot vision. The goal of this tutorial is to allow our bot to receive images, reply to them, and eventually give us a crude description of the main object in said image.


r/deeplearning - Is there any framework recommended to start with text to image recognition?

#artificialintelligence

Hey! I'm new to this community and also I'm working on a project that might have a good scenario to implement text recognition in order to output images for them. Please, if anyone knows where I could start or which tools I could try it would be great!


Quantitative Error Prediction of Medical Image Registration using Regression Forests

arXiv.org Machine Learning

Predicting registration error can be useful for evaluation of registration procedures, which is important for the adoption of registration techniques in the clinic. In addition, quantitative error prediction can be helpful in improving the registration quality. The task of predicting registration error is demanding due to the lack of a ground truth in medical images. This paper proposes a new automatic method to predict the registration error in a quantitative manner, and is applied to chest CT scans. A random regression forest is utilized to predict the registration error locally. The forest is built with features related to the transformation model and features related to the dissimilarity after registration. The forest is trained and tested using manually annotated corresponding points between pairs of chest CT scans in two experiments: SPREAD (trained and tested on SPREAD) and inter-database (including three databases SPREAD, DIR-Lab-4DCT and DIR-Lab-COPDgene). The results show that the mean absolute errors of regression are 1.07 $\pm$ 1.86 and 1.76 $\pm$ 2.59 mm for the SPREAD and inter-database experiment, respectively. The overall accuracy of classification in three classes (correct, poor and wrong registration) is 90.7% and 75.4%, for SPREAD and inter-database respectively. The good performance of the proposed method enables important applications such as automatic quality control in large-scale image analysis.


Retail Automation Image Recognition for Retail

#artificialintelligence

Vue.ai's tags have given us the exact level of granularity we need on our ecommerce platform. Our merchandisers and buyers have a lot of manual work while labelling products. They're also fixing data inconsistencies we receive from our external vendors, an expensive and time-consuming process. With an automated catalog management tool like VueTag, we efficiently tag products, provide better product discovery for our shoppers, and speed up our go-to-market strategy.


Exploring Representativeness and Informativeness for Active Learning

arXiv.org Machine Learning

How can we find a general way to choose the most suitable samples for training a classifier, even with very limited prior information? Active learning, which can be regarded as an iterative optimization procedure, plays a key role in constructing a refined training set to improve the classification performance in a variety of applications, such as text analysis, image recognition, and social network modeling. Although combining representativeness and informativeness of samples has been proven promising for active sampling, state-of-the-art methods perform well under certain data structures. Can we then find a way to fuse the two active sampling criteria without any assumption on data? This paper proposes a general active learning framework that effectively fuses the two criteria. Inspired by a two-sample discrepancy problem, triple measures are elaborately designed to guarantee that the query samples not only possess the representativeness of the unlabeled data but also reveal the diversity of the labeled data. Any appropriate similarity measure can be employed to construct the triple measures. Meanwhile, an uncertain measure is leveraged to generate the informativeness criterion, which can be carried out in different ways. Rooted in this framework, a practical active learning algorithm is proposed, which exploits a radial basis function together with the estimated probabilities to construct the triple measures and a modified Best-versus-Second-Best strategy to construct the uncertain measure, respectively. Experimental results on benchmark datasets demonstrate that our algorithm consistently achieves superior performance over the state-of-the-art active learning algorithms.
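
For readers unfamiliar with Best-versus-Second-Best (BvSB), the plain version of the strategy can be sketched as follows. The abstract mentions a modified variant without describing the modification, so this shows only the standard margin rule; the function names and the example probabilities are hypothetical.

```python
from typing import List


def bvsb_margin(probs: List[float]) -> float:
    """Gap between the two highest class probabilities of one sample."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second


def most_uncertain(prob_rows: List[List[float]]) -> int:
    """Index of the unlabeled sample with the smallest BvSB margin."""
    return min(range(len(prob_rows)), key=lambda i: bvsb_margin(prob_rows[i]))


# Hypothetical class-probability estimates for three unlabeled samples.
pool = [
    [0.90, 0.05, 0.05],  # confident prediction, large margin
    [0.40, 0.35, 0.25],  # top two classes nearly tied, small margin
    [0.60, 0.30, 0.10],
]
query_index = most_uncertain(pool)
```

A small margin means the classifier can barely separate its two best guesses, so labeling that sample is expected to be informative. This covers only the informativeness side; the representativeness measures described in the abstract are not shown.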


Zero-shot Image Recognition Using Relational Matching, Adaptation and Calibration

arXiv.org Artificial Intelligence

Zero-shot learning (ZSL) for image classification focuses on recognizing novel categories that have no labeled data available for training. The learning is generally carried out with the help of mid-level semantic descriptors associated with each class. This semantic-descriptor space is generally shared by both seen and unseen categories. However, ZSL suffers from hubness, domain discrepancy, and bias towards seen classes. To tackle these problems, we propose a three-step approach to zero-shot learning. Firstly, a mapping is learned from the semantic-descriptor space to the image-feature space. This mapping learns to minimize both one-to-one and pairwise distances between semantic embeddings and the image features of the corresponding classes. Secondly, we propose test-time domain adaptation to adapt the semantic embedding of the unseen classes to the test data. This is achieved by finding correspondences between the semantic descriptors and the image features. Thirdly, we propose scaled calibration on the classification scores of the seen classes. This is necessary because the ZSL model is biased towards seen classes, as the unseen classes are not used in training. Finally, to validate the proposed three-step approach, we performed experiments on four benchmark datasets where the proposed method outperformed previous results. We also studied and analyzed the performance of each component of our proposed ZSL framework.
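
The third step, calibrating the scores of seen classes, can be illustrated with a minimal sketch. It assumes the calibration is a multiplicative factor `gamma` applied to non-negative scores of seen classes before taking the argmax; the abstract does not give the exact form, and the function name, labels, and scores below are hypothetical.

```python
from typing import Dict, Set


def calibrated_predict(scores: Dict[str, float],
                       seen_classes: Set[str],
                       gamma: float) -> str:
    """Scale seen-class scores by gamma, then return the best label.

    Assumes non-negative scores, so gamma < 1 lowers seen-class scores.
    """
    adjusted = {
        label: score * gamma if label in seen_classes else score
        for label, score in scores.items()
    }
    return max(adjusted, key=adjusted.get)


# Hypothetical scores for one test image: "horse" was seen in training,
# "zebra" was not.
scores = {"horse": 0.55, "zebra": 0.45}
seen = {"horse"}

uncalibrated = calibrated_predict(scores, seen, gamma=1.0)
calibrated = calibrated_predict(scores, seen, gamma=0.7)
```

With `gamma=1.0` the seen class wins; scaling its score down lets the unseen class win when the two scores are close, which is the effect the calibration step is meant to have.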


The best image-recognition AIs are fooled by slightly rotated images

New Scientist

TELLING a yellow taxi and a pair of binoculars apart is so easy most people could do it standing on their head. Not so for an artificial intelligence: flip the cab upside down and it sees binoculars. This is just one of dozens of examples that show AI is a lot worse at identifying objects by sight than many people realise.