AITopics | Image Matching

Collaborating Authors

Image Matching

News Overviews Instructional Materials AI-Alerts Classics

RetinaRegNet: A Versatile Approach for Retinal Image Registration

Sivaraman, Vishal Balaji, Imran, Muhammad, Wei, Qingyue, Muralidharan, Preethika, Tamplin, Michelle R., Grumbach, Isabella M ., Kardon, Randy H., Wang, Jui-Kai, Zhou, Yuyin, Shao, Wei

arXiv.org Artificial IntelligenceMay-20-2024

We introduce the RetinaRegNet model, which can achieve state-of-the-art performance across various retinal image registration tasks. RetinaRegNet does not require training on any retinal images. It begins by establishing point correspondences between two retinal images using image features derived from diffusion models. This process involves the selection of feature points from the moving image using the SIFT algorithm alongside random point sampling. For each selected feature point, a 2D correlation map is computed by assessing the similarity between the feature vector at that point and the feature vectors of all pixels in the fixed image. The pixel with the highest similarity score in the correlation map corresponds to the feature point in the moving image. To remove outliers in the estimated point correspondences, we first applied an inverse consistency constraint, followed by a transformation-based outlier detector. This method proved to outperform the widely used random sample consensus (RANSAC) outlier detector by a significant margin. To handle large deformations, we utilized a two-stage image registration framework. A homography transformation was used in the first stage and a more accurate third-order polynomial transformation was used in the second stage. The model's effectiveness was demonstrated across three retinal image datasets: color fundus images, fluorescein angiography images, and laser speckle flowgraphy images. RetinaRegNet outperformed current state-of-the-art methods in all three datasets. It was especially effective for registering image pairs with large displacement and scaling deformations. This innovation holds promise for various applications in retinal image analysis. Our code is publicly available at https://github.com/mirthAI/RetinaRegNet .

artificial intelligence, machine learning, pattern recognition, (19 more...)

arXiv.org Artificial Intelligence

2404.16017

Country:

North America > United States > Florida > Alachua County > Gainesville (0.14)
North America > United States > California > Santa Cruz County > Santa Cruz (0.14)
North America > United States > Iowa (0.05)
(5 more...)

Genre:

Research Report > Promising Solution (0.48)
Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Ophthalmology/Optometry (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

Self-Supervised Learning for Interventional Image Analytics: Towards Robust Device Trackers

Islam, Saahil, Murthy, Venkatesh N., Neumann, Dominik, Das, Badhan Kumar, Sharma, Puneet, Maier, Andreas, Comaniciu, Dorin, Ghesu, Florin C.

arXiv.org Artificial IntelligenceMay-2-2024

An accurate detection and tracking of devices such as guiding catheters in live X-ray image acquisitions is an essential prerequisite for endovascular cardiac interventions. This information is leveraged for procedural guidance, e.g., directing stent placements. To ensure procedural safety and efficacy, there is a need for high robustness no failures during tracking. To achieve that, one needs to efficiently tackle challenges, such as: device obscuration by contrast agent or other external devices or wires, changes in field-of-view or acquisition angle, as well as the continuous movement due to cardiac and respiratory motion. To overcome the aforementioned challenges, we propose a novel approach to learn spatio-temporal features from a very large data cohort of over 16 million interventional X-ray frames using self-supervision for image sequence data. Our approach is based on a masked image modeling technique that leverages frame interpolation based reconstruction to learn fine inter-frame temporal correspondences. The features encoded in the resulting model are fine-tuned downstream. Our approach achieves state-of-the-art performance and in particular robustness compared to ultra optimized reference solutions (that use multi-stage feature fusion, multi-task and flow regularization). The experiments show that our method achieves 66.31% reduction in maximum tracking error against reference solutions (23.20% when flow regularization is used); achieving a success score of 97.95% at a 3x faster inference speed of 42 frames-per-second (on GPU). The results encourage the use of our approach in various other tasks within interventional image analytics that require effective understanding of spatio-temporal semantics.

proceedings, sequence, tracking, (11 more...)

arXiv.org Artificial Intelligence

2405.01156

Country:

North America > United States > New Jersey > Mercer County > Princeton (0.04)
South America > Peru > Lima Department > Lima Province > Lima (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(2 more...)

Genre:

Research Report > Promising Solution (0.48)
Overview > Innovation (0.34)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.64)

Add feedback

Integration of Self-Supervised BYOL in Semi-Supervised Medical Image Recognition

Feng, Hao, Jia, Yuanzhe, Xu, Ruijia, Prasad, Mukesh, Anaissi, Ali, Braytee, Ali

arXiv.org Artificial IntelligenceApr-16-2024

Image recognition techniques heavily rely on abundant labeled data, particularly in medical contexts. Addressing the challenges associated with obtaining labeled data has led to the prominence of self-supervised learning and semi-supervised learning, especially in scenarios with limited annotated data. In this paper, we proposed an innovative approach by integrating self-supervised learning into semi-supervised models to enhance medical image recognition. Our methodology commences with pre-training on unlabeled data utilizing the BYOL method. Subsequently, we merge pseudo-labeled and labeled datasets to construct a neural network classifier, refining it through iterative fine-tuning. Experimental results on three different datasets demonstrate that our approach optimally leverages unlabeled data, outperforming existing methods in terms of accuracy for medical image recognition.

classification, learning, semi-supervised learning, (14 more...)

arXiv.org Artificial Intelligence

2404.10405

Country:

Oceania > Australia (0.05)
Asia > China (0.05)
South America > Peru > Lima Department > Lima Province > Lima (0.04)
(2 more...)

Genre: Research Report > Promising Solution (0.34)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.99)

Add feedback

Adaptive Patching for High-resolution Image Segmentation with Transformers

Zhang, Enzhi, Lyngaas, Isaac, Chen, Peng, Wang, Xiao, Igarashi, Jun, Huo, Yuankai, Wahib, Mohamed, Munetomo, Masaharu

arXiv.org Artificial IntelligenceApr-15-2024

Attention-based models are proliferating in the space of image analytics, including segmentation. The standard method of feeding images to transformer encoders is to divide the images into patches and then feed the patches to the model as a linear sequence of tokens. For high-resolution images, e.g. microscopic pathology images, the quadratic compute and memory cost prohibits the use of an attention-based model, if we are to use smaller patch sizes that are favorable in segmentation. The solution is to either use custom complex multi-resolution models or approximate attention schemes. We take inspiration from Adapative Mesh Refinement (AMR) methods in HPC by adaptively patching the images, as a pre-processing step, based on the image details to reduce the number of patches being fed to the model, by orders of magnitude. This method has a negligible overhead, and works seamlessly with any attention-based model, i.e. it is a pre-processing step that can be adopted by any attention-based model without friction. We demonstrate superior segmentation quality over SoTA segmentation models for real-world pathology datasets while gaining a geomean speedup of $6.9\times$ for resolutions up to $64K^2$, on up to $2,048$ GPUs.

patch size, resolution, transformer, (14 more...)

arXiv.org Artificial Intelligence

2404.09707

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > New York > New York County > New York City (0.05)
North America > United States > Tennessee (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Diagnostic Medicine (0.87)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (0.34)

Add feedback

A Survey of Neural Network Robustness Assessment in Image Recognition

Wang, Jie, Ai, Jun, Lu, Minyan, Su, Haoran, Yu, Dan, Zhang, Yutao, Zhu, Junda, Liu, Jingyu

arXiv.org Artificial IntelligenceApr-15-2024

In recent years, there has been significant attention given to the robustness assessment of neural networks. Robustness plays a critical role in ensuring reliable operation of artificial intelligence (AI) systems in complex and uncertain environments. Deep learning's robustness problem is particularly significant, highlighted by the discovery of adversarial attacks on image classification models. Researchers have dedicated efforts to evaluate robustness in diverse perturbation conditions for image recognition tasks. Robustness assessment encompasses two main techniques: robustness verification/ certification for deliberate adversarial attacks and robustness testing for random data corruptions. In this survey, we present a detailed examination of both adversarial robustness (AR) and corruption robustness (CR) in neural network assessment. Analyzing current research papers and standards, we provide an extensive overview of robustness assessment in image recognition. Three essential aspects are analyzed: concepts, metrics, and assessment methods. We investigate the perturbation metrics and range representations used to measure the degree of perturbations on images, as well as the robustness metrics specifically for the robustness conditions of classification models. The strengths and limitations of the existing methods are also discussed, and some potential directions for future research are provided.

neural network, perturbation, robustness, (15 more...)

arXiv.org Artificial Intelligence

2404.08285

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > Canada > Quebec > Montreal (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
(6 more...)

Genre: Overview (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (0.81)

Add feedback

Increasing SLAM Pose Accuracy by Ground-to-Satellite Image Registration

Zhang, Yanhao, Shi, Yujiao, Wang, Shan, Vora, Ankit, Perincherry, Akhil, Chen, Yongbo, Li, Hongdong

arXiv.org Artificial IntelligenceApr-14-2024

Vision-based localization for autonomous driving has been of great interest among researchers. When a pre-built 3D map is not available, the techniques of visual simultaneous localization and mapping (SLAM) are typically adopted. Due to error accumulation, visual SLAM (vSLAM) usually suffers from long-term drift. This paper proposes a framework to increase the localization accuracy by fusing the vSLAM with a deep-learning-based ground-to-satellite (G2S) image registration method. In this framework, a coarse (spatial correlation bound check) to fine (visual odometry consistency check) method is designed to select the valid G2S prediction. The selected prediction is then fused with the SLAM measurement by solving a scaled pose graph problem. To further increase the localization accuracy, we provide an iterative trajectory fusion pipeline. The proposed framework is evaluated on two well-known autonomous driving datasets, and the results demonstrate the accuracy and robustness in terms of vehicle localization.

localization, prediction, trajectory, (17 more...)

arXiv.org Artificial Intelligence

2404.09169

Country:

Oceania > Australia > Australian Capital Territory > Canberra (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.70)

Industry: Automobiles & Trucks (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (0.60)

Add feedback

Unsupervised Learning of Spoken Language with Visual Context

Neural Information Processing SystemsMar-12-2024, 13:45:16 GMT

Humans learn to speak before they can read or write, so why can't computers do the same? In this paper, we present a deep neural network model capable of rudimentary spoken language acquisition using untranscribed audio training data, whose only supervision comes in the form of contextually relevant visual images. We describe the collection of our data comprised of over 120,000 spoken audio captions for the Places image dataset and evaluate our model on an image search and annotation task. We also provide some visualizations which suggest that our model is learning to recognize meaningful words within the caption spectrograms.

caption, proceedings, spectrogram, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > Canada > Quebec > Montreal (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (0.35)

Add feedback

Classes Are Not Equal: An Empirical Study on Image Recognition Fairness

Cui, Jiequan, Zhu, Beier, Wen, Xin, Qi, Xiaojuan, Yu, Bei, Zhang, Hanwang

arXiv.org Artificial IntelligenceMar-12-2024

In this paper, we present an empirical study on image recognition fairness, i.e., extreme class accuracy disparity on balanced data like ImageNet. We experimentally demonstrate that classes are not equal and the fairness issue is prevalent for image classification models across various datasets, network architectures, and model capacities. Moreover, several intriguing properties of fairness are identified. First, the unfairness lies in problematic representation rather than classifier bias. Second, with the proposed concept of Model Prediction Bias, we investigate the origins of problematic representation during optimization. Our findings reveal that models tend to exhibit greater prediction biases for classes that are more challenging to recognize. It means that more other classes will be confused with harder classes. Then the False Positives (FPs) will dominate the learning in optimization, thus leading to their poor accuracy. Further, we conclude that data augmentation and representation learning algorithms improve overall performance by promoting fairness to some degree in image classification. The Code is available at https://github.com/dvlab-research/Parametric-Contrastive-Learning.

accuracy, imagenet, unfairness, (11 more...)

arXiv.org Artificial Intelligence

2402.18133

Country:

Asia > China > Hong Kong (0.04)
North America > United States > Virginia (0.04)

Genre: Research Report > New Finding (0.88)

Industry: Transportation (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

General Purpose Image Encoder DINOv2 for Medical Image Registration

Song, Xinrui, Xu, Xuanang, Yan, Pingkun

arXiv.org Artificial IntelligenceFeb-23-2024

Existing medical image registration algorithms rely on either dataset specific training or local texture-based features to align images. The former cannot be reliably implemented without large modality-specific training datasets, while the latter lacks global semantics thus could be easily trapped at local minima. In this paper, we present a training-free deformable image registration method, DINO-Reg, leveraging a general purpose image encoder DINOv2 for image feature extraction. The DINOv2 encoder was trained using the ImageNet data containing natural images. We used the pretrained DINOv2 without any finetuning. Our method feeds the DINOv2 encoded features into a discrete optimizer to find the optimal deformable registration field. We conducted a series of experiments to understand the behavior and role of such a general purpose image encoder in the application of image registration. Combined with handcrafted features, our method won the first place in the recent OncoReg Challenge. To our knowledge, this is the first application of general vision foundation models in medical image registration.

general purpose image encoder dinov2, image registration, registration, (13 more...)

arXiv.org Artificial Intelligence

2402.15687

Country:

South America > Peru > Lima Department > Lima Province > Lima (0.04)
North America > United States > New York > Rensselaer County > Troy (0.04)
Europe > France > Grand Est > Bas-Rhin > Strasbourg (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (1.00)

Add feedback

Measuring machine learning harms from stereotypes: requires understanding who is being harmed by which errors in what ways

Wang, Angelina, Bai, Xuechunzi, Barocas, Solon, Blodgett, Su Lin

arXiv.org Artificial IntelligenceFeb-6-2024

As machine learning applications proliferate, we need an understanding of their potential for harm. However, current fairness metrics are rarely grounded in human psychological experiences of harm. Drawing on the social psychology of stereotypes, we use a case study of gender stereotypes in image search to examine how people react to machine learning errors. First, we use survey studies to show that not all machine learning errors reflect stereotypes nor are equally harmful. Then, in experimental studies we randomly expose participants to stereotype-reinforcing, -violating, and -neutral machine learning errors. We find stereotype-reinforcing errors induce more experientially (i.e., subjectively) harmful experiences, while having minimal changes to cognitive beliefs, attitudes, or behaviors. This experiential harm impacts women more than men. However, certain stereotype-violating errors are more experientially harmful for men, potentially due to perceived threats to masculinity. We conclude that harm cannot be the sole guide in fairness mitigation, and propose a nuanced perspective depending on who is experiencing what harm and why.

participant, stereotype, stereotype-reinforcing error, (16 more...)

arXiv.org Artificial Intelligence

2402.0442

Country:

North America > United States > Massachusetts > Middlesex County > Reading (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > Alaska (0.04)
(3 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Law (0.67)
Government (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (0.35)

Add feedback