Image Matching


MoonMetaSync: Lunar Image Registration Analysis

Ashutosh Kumar, Sarthak Kaushal, Shiv Vignesh Murthy

arXiv.org Artificial Intelligence

This paper compares the scale-invariant SIFT and scale-variant ORB feature detection methods, alongside our novel feature detector, IntFeat, specifically applied to lunar imagery. We evaluate these methods on low-resolution (128x128) and high-resolution (1024x1024) lunar image patches, providing insights into their performance across scales in challenging extraterrestrial environments. IntFeat combines high-level features from SIFT and low-level features from ORB into a single vector space for robust lunar image registration. We introduce SyncVision, a Python package that compares lunar images using various registration methods, including SIFT, ORB, and IntFeat. Our analysis includes upscaling low-resolution lunar images using bilinear and bicubic interpolation, offering a unique perspective on registration effectiveness across scales and feature detectors in lunar landscapes. This research contributes to computer vision and planetary science by comparing feature detection methods for lunar imagery and introducing a versatile tool for lunar image registration and evaluation, with implications for multi-resolution image analysis in space exploration applications.
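The upscaling step mentioned above can be illustrated with a minimal, dependency-free sketch of bilinear interpolation. The function name and list-of-lists image format are my own for illustration; the paper's SyncVision package presumably wraps library implementations such as OpenCV's.

```python
def upscale_bilinear(img, new_h, new_w):
    """Upscale a 2D grayscale image (list of lists) with bilinear interpolation."""
    h, w = len(img), len(img[0])
    out = [[0.0] * new_w for _ in range(new_h)]
    for i in range(new_h):
        # Map the output row back into the source grid.
        y = i * (h - 1) / (new_h - 1) if new_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        fy = y - y0
        for j in range(new_w):
            x = j * (w - 1) / (new_w - 1) if new_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            fx = x - x0
            # Blend the four surrounding pixels by their fractional offsets.
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bot = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            out[i][j] = top * (1 - fy) + bot * fy
    return out
```

Bicubic interpolation follows the same backward-mapping scheme but blends a 4x4 neighborhood with cubic weights, which better preserves smooth gradients in the upscaled lunar patches.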


Recurrent Registration Neural Networks for Deformable Image Registration

Neural Information Processing Systems

Parametric spatial transformation models have been successfully applied to image registration tasks. In such models, the transformation of interest is parameterized by a fixed set of basis functions, such as B-splines. Because the transformation of interest is not known in advance, each basis function is placed at a fixed position on a regular grid spanning the image domain. As a consequence, not all basis functions will necessarily contribute to the final transformation, which results in a non-compact representation of the transformation. Our recurrent registration neural network instead computes, for each element in a sequence, a local deformation defined by its position, shape, and weight.
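The fixed-grid parameterization described above can be sketched in a 1D toy example: basis functions sit at every node of a regular grid, so nodes with zero weight still occupy parameters, which is exactly the non-compact representation the abstract points out. The `tent` basis (a linear B-spline) and function names here are illustrative, not the paper's code.

```python
def tent(t):
    """Linear B-spline (tent) basis: 1 at its node, falling to 0 one node away."""
    return max(0.0, 1.0 - abs(t))

def displacement(x, nodes, weights, spacing):
    """1D parametric displacement: a weighted sum of basis functions on a fixed grid."""
    return sum(w * tent((x - n) / spacing) for n, w in zip(nodes, weights))

# Basis functions occupy every grid node whether or not they are needed:
nodes = [0, 1, 2, 3, 4]
weights = [0.0, 0.0, 0.5, 0.0, 0.0]  # only one node actually deforms anything
```

Here five parameters are stored but four contribute nothing; a sequential model that emits only the deformations it needs avoids this waste.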


Not All Images are Worth 16x16 Words: Dynamic Transformers for Efficient Image Recognition

Neural Information Processing Systems

Vision Transformers (ViT) have achieved remarkable success in large-scale image recognition. They split every 2D image into a fixed number of patches, each of which is treated as a token. Generally, representing an image with more tokens would lead to higher prediction accuracy, while it also results in drastically increased computational cost. To achieve a decent trade-off between accuracy and speed, the number of tokens is empirically set to 16x16 or 14x14. In this paper, we argue that every image has its own characteristics, and ideally the token number should be conditioned on each individual input.
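The accuracy/speed trade-off follows directly from how a ViT tokenizes an image: one token per non-overlapping patch, plus an optional [CLS] token, with self-attention cost growing quadratically in the token count. A small illustrative helper (the function name is my own; a standard 224x224 input is assumed):

```python
def vit_token_count(image_size, patch_size, with_cls=True):
    """Number of tokens a ViT produces: one per non-overlapping patch (+ [CLS])."""
    assert image_size % patch_size == 0, "image must split evenly into patches"
    per_side = image_size // patch_size
    return per_side * per_side + (1 if with_cls else 0)

# For a 224x224 image: 16-pixel patches give a 14x14 token grid,
# 14-pixel patches give a 16x16 grid -- the two settings the paper mentions.
```

Since attention cost scales with the square of the token count, conditioning the token number on each input (as the paper proposes) lets easy images use far fewer tokens than hard ones.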


This Looks Like That: Deep Learning for Interpretable Image Recognition

Neural Information Processing Systems

When we are faced with challenging image classification tasks, we often explain our reasoning by dissecting the image and pointing out prototypical aspects of one class or another. The mounting evidence for each of the classes helps us make our final decision. In this work, we introduce a deep network architecture -- prototypical part network (ProtoPNet) -- that reasons in a similar way: the network dissects the image by finding prototypical parts, and combines evidence from the prototypes to make a final classification. The model thus reasons in a way that is qualitatively similar to the way ornithologists, physicians, and others would explain how to solve challenging image classification tasks. The network uses only image-level labels for training, without any annotations for parts of images.
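The evidence-combination idea can be sketched as follows: each prototype belongs to a class, contributes its best similarity over all image patches, and class scores accumulate that evidence. This is a simplified toy version (negative squared distance as similarity, no learned weighting), not ProtoPNet's actual architecture.

```python
def protopnet_scores(patch_feats, prototypes, proto_labels, n_classes):
    """Toy ProtoPNet-style scoring.

    patch_feats: feature vectors of an image's patches.
    prototypes: learned prototype vectors; proto_labels[i] is prototype i's class.
    Each prototype contributes its best (max) similarity over all patches.
    """
    def sim(a, b):
        # Negative squared distance: higher means more similar.
        return -sum((x - y) ** 2 for x, y in zip(a, b))

    scores = [0.0] * n_classes
    for proto, cls in zip(prototypes, proto_labels):
        scores[cls] += max(sim(p, proto) for p in patch_feats)
    return scores
```

Because each score is a sum of "this patch looks like that prototype" terms, the decision can be explained by pointing at the patch that maximized each prototype's similarity.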


Arbicon-Net: Arbitrary Continuous Geometric Transformation Networks for Image Registration

Neural Information Processing Systems

This paper concerns the under-determined problem of estimating the geometric transformation between image pairs. Recent methods introduce deep neural networks to predict the controlling parameters of hand-crafted geometric transformation models. However, such low-dimensional parametric models are incapable of estimating highly complex geometric transforms and have limited flexibility to model the actual geometric deformation between image pairs. To address this issue, we present an end-to-end trainable deep neural network, named Arbitrary Continuous Geometric Transformation Networks (Arbicon-Net), that directly predicts the dense displacement field for pairwise image alignment. Arbicon-Net generalizes from training data to predict the desired arbitrary continuous geometric transformation for unseen image pairs in a data-driven manner.
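Once a dense displacement field is predicted, alignment amounts to backward warping: each output pixel samples the source image at its displaced location. A minimal nearest-neighbor sketch follows (real pipelines interpolate bilinearly; the function name and per-pixel `(dy, dx)` field format are my own):

```python
def warp_nearest(img, field):
    """Backward warp: output[i][j] samples img at (i + dy, j + dx), where
    field[i][j] = (dy, dx), rounded to the nearest pixel and clamped to bounds."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            dy, dx = field[i][j]
            y = min(max(int(round(i + dy)), 0), h - 1)
            x = min(max(int(round(j + dx)), 0), w - 1)
            out[i][j] = img[y][x]
    return out
```

Because every pixel gets its own displacement, the field can represent deformations far beyond what a handful of global parameters can express, which is the motivation for predicting it directly.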


Reviews: Bilevel Distance Metric Learning for Robust Image Recognition

Neural Information Processing Systems

Summary: The authors propose a bilevel method for metric learning. The lower level is responsible for extracting discriminative features from the data via a sparse coding scheme with graph regularization, which effectively captures the data's underlying geometric structure; the upper level is a classic metric learning approach that utilizes the learned sparse coefficients. These two components are integrated into a joint optimization problem, and an efficient optimization algorithm is developed accordingly. New data can then be classified based on the learned dictionary and the corresponding metric. In the experiments, the authors demonstrate the model's ability to extract more discriminative features from high-dimensional data while being more robust to noise.


Reviews: A Simple Cache Model for Image Recognition

Neural Information Processing Systems

This paper presents a cache model to be used in image recognition tasks. The authors argue that class-specific information can be retrieved from earlier layers of the network to improve the accuracy of an already trained model, without having to re-train or fine-tune it. This is achieved by extracting and caching the activations of some layers, along with the class labels, at training time. At test time, a similarity measure is used to calculate how close the input is to the information stored in memory. Experiments show improved performance on CIFAR-10/100 and ImageNet.
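The test-time lookup can be sketched as a similarity query over cached (activation, label) pairs. This is a deliberate simplification: the paper aggregates similarity over many cached items rather than returning a single nearest neighbor, and the names below are my own.

```python
import math

def cosine(a, b):
    """Cosine similarity between two activation vectors."""
    num = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return num / (na * nb)

def cache_predict(query, cache):
    """Return the label of the cached activation most similar to the query.
    `cache` is a list of (activation_vector, label) pairs stored at training time."""
    return max(cache, key=lambda item: cosine(query, item[0]))[1]
```

The appeal is that the cache is filled in a single forward pass over the training set, so the base network's weights never change.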


Microsoft's Photos app is getting a quick image search feature

PCWorld

Microsoft just announced that the latest update for the Photos app in Windows will introduce a new image search feature. As of right now, the update is rolling out to Windows 11 users in the Insider program across all Insider channels. After that's done, it will roll out to Windows 10 users in the Beta and Release Preview channels. And then, of course, it'll be publicly available at some point in the future. Here's how the new image search feature will work: When you open an image in Photos, you'll see a button for the Visual Search with Bing feature at the bottom of the app window.


Look One and More: Distilling Hybrid Order Relational Knowledge for Cross-Resolution Image Recognition

Shiming Ge, Kangkai Zhang, Haolin Liu, Yingying Hua, Shengwei Zhao, Xin Jin, Hao Wen

arXiv.org Artificial Intelligence

Despite the great success of recent deep models on many image recognition tasks, directly applying them to recognize low-resolution images may suffer from low accuracy due to the loss of informative details during resolution degradation. However, these images are still recognizable to subjects who are familiar with the corresponding high-resolution ones. Inspired by this, we propose a teacher-student learning approach that facilitates low-resolution image recognition via hybrid-order relational knowledge distillation. The approach comprises three streams: the teacher stream is pretrained to recognize high-resolution images with high accuracy; the student stream learns to identify low-resolution images by mimicking the teacher's behaviors; and an extra assistant stream is introduced as a bridge to help transfer knowledge from the teacher to the student. To extract sufficient knowledge and reduce the loss in accuracy, the learning of the student is supervised with multiple losses, which preserve the similarities in various order relational structures. In this way, the capability of recovering missing details of familiar low-resolution images can be effectively enhanced, leading to better knowledge transfer. Extensive experiments on metric learning, low-resolution image classification, and low-resolution face recognition tasks show the effectiveness of our approach, while using reduced models.
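Teacher-student mimicry is typically trained with a loss of this shape: a hard-label term plus a soft term pulling the student's output distribution toward the teacher's. The sketch below is the standard distillation loss for illustration only, not the paper's hybrid-order relational losses.

```python
import math

def softmax(logits, t=1.0):
    """Softmax with temperature t; higher t softens the distribution."""
    exps = [math.exp(z / t) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distill_loss(student_logits, teacher_logits, label, alpha=0.5, t=2.0):
    """alpha-weighted sum of hard-label cross-entropy and KL(teacher || student)
    computed on temperature-softened distributions."""
    hard = -math.log(softmax(student_logits)[label])
    ps = softmax(student_logits, t)
    pt = softmax(teacher_logits, t)
    soft = sum(q * (math.log(q) - math.log(p)) for q, p in zip(pt, ps))
    return alpha * hard + (1 - alpha) * soft
```

When the student's logits match the teacher's, the soft term vanishes, so the gradient comes entirely from the ground-truth labels; early in training the teacher term dominates and transfers the "familiarity" the abstract describes.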


Deformable Image Registration with Multi-scale Feature Fusion from Shared Encoder, Auxiliary and Pyramid Decoders

Hongchao Zhou, Shunbo Hu

arXiv.org Artificial Intelligence

In this work, we propose a novel deformable convolutional pyramid network for unsupervised image registration. Specifically, the proposed network enhances the traditional pyramid network by adding an additional shared auxiliary decoder for image pairs. This decoder provides multi-scale high-level feature information from unblended image pairs for the registration task. During the registration process, we also design a multi-scale feature fusion block to extract the most beneficial features for the registration task from both global and local contexts. Validation results indicate that this method can capture complex deformations while achieving higher registration accuracy and maintaining smooth and plausible deformations.
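Multi-scale fusion can be sketched minimally: pool a fine feature map to obtain global context, then upsample the coarse map and combine it with the local one. This toy, dependency-free version uses a fixed average; the paper's fusion block is learned.

```python
def avg_pool2(feat):
    """Coarsen a 2D feature map by 2x average pooling (global context)."""
    h, w = len(feat) // 2, len(feat[0]) // 2
    return [[(feat[2 * i][2 * j] + feat[2 * i][2 * j + 1]
              + feat[2 * i + 1][2 * j] + feat[2 * i + 1][2 * j + 1]) / 4
             for j in range(w)] for i in range(h)]

def fuse(fine, coarse):
    """Nearest-neighbor upsample the coarse map and average it with the fine map."""
    return [[(fine[i][j] + coarse[i // 2][j // 2]) / 2
             for j in range(len(fine[0]))]
            for i in range(len(fine))]
```

Each fused value thus mixes a pixel's local detail with the average of its neighborhood, which is the basic mechanism by which global and local contexts inform the predicted deformation.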