Image Matching
Enhancing Medical Image Registration via Appearance Adjustment Networks
Meng, Mingyuan, Bi, Lei, Fulham, Michael, Feng, David Dagan, Kim, Jinman
Deformable image registration is fundamental for many medical image analyses. A key obstacle for accurate image registration is the variations in image appearance. Recently, deep learning-based registration methods (DLRs), using deep neural networks, have computational efficiency that is several orders of magnitude greater than traditional optimization-based registration methods (ORs). A major drawback, however, of DLRs is a disregard for the target-pair-specific optimization that is inherent in ORs and instead they rely on a globally optimized network that is trained with a set of training samples to achieve faster registration. Thus, DLRs inherently have degraded ability to adapt to appearance variations and perform poorly, compared to ORs, when image pairs (fixed/moving images) have large differences in appearance. Hence, we propose an Appearance Adjustment Network (AAN) where we leverage anatomy edges, through an anatomy-constrained loss function, to generate an anatomy-preserving appearance transformation. We designed the AAN so that it can be readily inserted into a wide range of DLRs, to reduce the appearance differences between the fixed and moving images. Our AAN and DLR's network can be trained cooperatively in an unsupervised and end-to-end manner. We evaluated our AAN with two widely used DLRs - Voxelmorph (VM) and FAst IMage registration (FAIM) - on three public 3D brain magnetic resonance (MR) image datasets - IBSR18, Mindboggle101, and LPBA40. The results show that DLRs, using the AAN, improved performance and achieved higher results than state-of-the-art ORs.
Touchless Palmprint Recognition based on 3D Gabor Template and Block Feature Refinement
Li, Zhaoqun, Liang, Xu, Fan, Dandan, Li, Jinxing, Jia, Wei, Zhang, David
With the growing demand for hand hygiene and convenience of use, palmprint recognition with touchless manner made a great development recently, providing an effective solution for person identification. Despite many efforts that have been devoted to this area, it is still uncertain about the discriminative ability of the contactless palmprint, especially for large-scale datasets. To tackle the problem, in this paper, we build a large-scale touchless palmprint dataset containing 2334 palms from 1167 individuals. To our best knowledge, it is the largest contactless palmprint image benchmark ever collected with regard to the number of individuals and palms. Besides, we propose a novel deep learning framework for touchless palmprint recognition named 3DCPN (3D Convolution Palmprint recognition Network) which leverages 3D convolution to dynamically integrate multiple Gabor features. In 3DCPN, a novel variant of Gabor filter is embedded into the first layer for enhancement of curve feature extraction. With a well-designed ensemble scheme,low-level 3D features are then convolved to extract high-level features. Finally on the top, we set a region-based loss function to strengthen the discriminative ability of both global and local descriptors. To demonstrate the superiority of our method, extensive experiments are conducted on our dataset and other popular databases TongJi and IITD, where the results show the proposed 3DCPN achieves state-of-the-art or comparable performances.
An Enriched Automated PV Registry: Combining Image Recognition and 3D Building Data
Rausch, Benjamin, Mayer, Kevin, Arlt, Marie-Louise, Gust, Gunther, Staudt, Philipp, Weinhardt, Christof, Neumann, Dirk, Rajagopal, Ram
While photovoltaic (PV) systems are installed at an unprecedented rate, reliable information on an installation level remains scarce. As a result, automatically created PV registries are a timely contribution to optimize grid planning and operations. This paper demonstrates how aerial imagery and three-dimensional building data can be combined to create an address-level PV registry, specifying area, tilt, and orientation angles. We demonstrate the benefits of this approach for PV capacity estimation. In addition, this work presents, for the first time, a comparison between automated and officially-created PV registries. Our results indicate that our enriched automated registry proves to be useful to validate, update, and complement official registries.
On image recognition software, AI, and patents - Innovation Origins
I find them incredibly irritating. Those images you have to click on to prove that you are not a robot. If you are just one click away from a nice weekend away, you first have to figure out where you can see the traffic lights on 16 tiny fuzzy squares. Google makes grateful use of these puzzling attempts. For one thing, the company uses artificial intelligence to train its image recognition software.
A basic design pattern for image recognition
Prior to 2017, most renditions of neural network models were coded in a batch scripting style. As AI researchers and experienced software engineers became increasingly involved in research and design, we started to see a shift in the coding of models that reflected software engineering principles for reuse and design patterns. A design pattern implies that there is a "best practice" for constructing and coding a model that can be reapplied across a wide range of cases, such as image classification, object detection and tracking, facial recognition, image segmentation, super resolution and style transfer. The introduction of design patterns also helped advance convolutional neural networks (as well as other network architectures) by aiding other researchers in understanding and reproducing a model's architecture. A procedural style for reuse was one of the earliest versions of using design patterns for neural network models.
This Looks Like That, Because ... Explaining Prototypes for Interpretable Image Recognition
Nauta, Meike, Jutte, Annemarie, Provoost, Jesper, Seifert, Christin
Image recognition with prototypes is considered an interpretable alternative for black box deep learning models. Classification depends on the extent to which a test image "looks like" a prototype. However, perceptual similarity for humans can be different from the similarity learnt by the model. A user is unaware of the underlying classification strategy and does not know which image characteristics (e.g., color or shape) is the dominant characteristic for the decision. We address this ambiguity and argue that prototypes should be explained. Only visualizing prototypes can be insufficient for understanding what a prototype exactly represents, and why a prototype and an image are considered similar. We improve interpretability by automatically enhancing prototypes with extra information about visual characteristics considered important by the model. Specifically, our method quantifies the influence of color hue, shape, texture, contrast and saturation in a prototype. We apply our method to the existing Prototypical Part Network (ProtoPNet) and show that our explanations clarify the meaning of a prototype which might have been interpreted incorrectly otherwise. We also reveal that visually similar prototypes can have the same explanations, indicating redundancy. Because of the generality of our approach, it can improve the interpretability of any similarity-based method for prototypical image recognition.
ICDAR 2021 Competition: Detecting Tables Using Image Recognition
Table recognition is a well-studied problem in document analysis, and many academic and commercial approaches have been developed to recognize tables in several document formats, including plain text, scanned page images, and born-digital, object-based formats such as PDF. There are several works that can convert tables in text-based PDF format into structured representations. However, there is limited work on image-based table content recognition. The proposed challenge aims at assessing the ability of state-of-the-art methods to recognize scientific tables in LaTeX format. Our shared task has two subtasks.
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Dosovitskiy, Alexey, Beyer, Lucas, Kolesnikov, Alexander, Weissenborn, Dirk, Zhai, Xiaohua, Unterthiner, Thomas, Dehghani, Mostafa, Minderer, Matthias, Heigold, Georg, Gelly, Sylvain, Uszkoreit, Jakob, Houlsby, Neil
While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision, attention is either applied in conjunction with convolutional networks, or used to replace certain components of convolutional networks while keeping their overall structure in place. We show that this reliance on CNNs is not necessary and a pure transformer applied directly to sequences of image patches can perform very well on image classification tasks. When pre-trained on large amounts of data and transferred to multiple mid-sized or small image recognition benchmarks (ImageNet, CIFAR-100, VTAB, etc.), Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train. Self-attention-based architectures, in particular Transformers (Vaswani et al., 2017), have become the model of choice in natural language processing (NLP). The dominant approach is to pre-train on a large text corpus and then fine-tune on a smaller task-specific dataset (Devlin et al., 2019). Thanks to Transformers' computational efficiency and scalability, it has become possible to train models of unprecedented size, with over 100B parameters. With the models and datasets growing, there is still no sign of saturating performance. In computer vision, however, convolutional architectures remain dominant (LeCun et al., 1989; Krizhevsky et al., 2012; He et al., 2016). Inspired by NLP successes, multiple works try combining CNN-like architectures with self-attention (Wang et al., 2018; Carion et al., 2020), some replacing the convolutions entirely (Ramachandran et al., 2019; Wang et al., 2020a). The latter models, while theoretically efficient, have not yet been scaled effectively on modern hardware accelerators due to the use of specialized attention patterns. Therefore, in large-scale image recognition, classic ResNetlike architectures are still state of the art (Mahajan et al., 2018; Xie et al., 2020; Kolesnikov et al., 2020). Inspired by the Transformer scaling successes in NLP, we experiment with applying a standard Transformer directly to images, with the fewest possible modifications. To do so, we split an image into patches and provide the sequence of linear embeddings of these patches as an input to a Transformer.
AI that scans a construction site can spot when things are falling behind
The system uses a GoPro camera mounted on top of a hard hat. When managers tour a site once or twice a week, the camera on their head captures video footage of the whole project and uploads it to image recognition software, which compares the status of many thousands of objects on site--such as electrical sockets and bathroom fittings--with a digital replica of the building. The AI also uses the video feed to track where the camera is in the building to within a few centimeters so that it can identify the exact location of the objects in each frame. The system can track the status of around 150,000 objects several times a week, says Danon. For each object the AI can tell which of three or four states it is in, from not yet begun to fully installed.
A "Hello World" Into Image Recognition with MNIST
To begin, we'll load the library Keras and other necessary inputs: Next, we'll load the MNIST dataset and split it into X train, X test, Y train, and Y test data: Next, we can outline some important variables for image loading and training. The data needs to be converted to a 32-bit float and standardized. Now that the data is ready, we can define the model architecture. After the model architecture has been defined, it must be compiled. Compiling a model outlines the loss function, optimizer, and metrics.