Image Matching
In Pursuit of Causal Label Correlations for Multi-label Image Recognition
Multi-label image recognition aims to predict all objects present in an input image. A common belief is that modeling the correlations between objects benefits multi-label recognition. However, this belief has recently been challenged, as label correlations may mislead the classifier at test time due to possible contextual bias in training. Accordingly, a few recent works not only discarded label correlation modeling but also advocated removing contextual information for multi-label image recognition. This work explicitly explores label correlations for multi-label image recognition through a principled causal intervention approach.
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (1.00)
Compact and Efficient Neural Networks for Image Recognition Based on Learned 2D Separable Transform
Vashkevich, Maxim, Krivalcevich, Egor
The paper presents a learned two-dimensional separable transform (LST) that can be considered a new type of computational layer for constructing neural network (NN) architectures for image recognition tasks. The LST is based on the idea of sharing the weights of one fully-connected (FC) layer to process all rows of an image. A second shared FC layer is then used to process all columns of the image representation obtained from the first layer. Using LST layers in an NN architecture significantly reduces the number of model parameters compared to models that use stacked FC layers. We show that an NN classifier based on a single LST layer followed by an FC layer achieves 98.02% accuracy on the MNIST dataset while having only 9.5k parameters. We also implemented an LST-based classifier for handwritten digit recognition on the FPGA platform to demonstrate the efficiency of the suggested approach for designing compact, high-performance implementations of NN models. Git repository with supplementary materials: https://github.com/Mak-Sim/LST-2d
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
- Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (0.62)
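The row-then-column weight sharing described in the abstract can be sketched in a few lines of NumPy. This is an illustrative reading of an LST layer, not the authors' implementation; the 28x28 input and the 16-dimensional output sizes are assumptions:

```python
import numpy as np

def lst_layer(x, w_rows, w_cols):
    """Learned 2D separable transform (illustrative sketch).

    x:      (H, W) input image
    w_rows: (W, D1) weights of one FC layer shared across all rows
    w_cols: (H, D2) weights of a second FC layer shared across all columns
    Returns a (D2, D1) feature map.
    """
    # One shared FC layer processes every row of the image.
    rows_out = x @ w_rows            # (H, D1)
    # A second shared FC layer processes every column of the result.
    cols_out = rows_out.T @ w_cols   # (D1, D2)
    return cols_out.T                # (D2, D1)

# For a 28x28 MNIST image: W*D1 + H*D2 = 2 * 28 * 16 = 896 parameters,
# versus H*W*D = 28*28*16 = 12544 for a single dense layer of width 16.
rng = np.random.default_rng(0)
x = rng.standard_normal((28, 28))
y = lst_layer(x, rng.standard_normal((28, 16)), rng.standard_normal((28, 16)))
```

The parameter-count comparison in the comment is what drives the compactness claim: the two shared FC layers scale with H + W rather than H * W.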
Google AI Mode rolls out to more testers with new image search feature
Google is bringing AI Mode to more people in the US. The company announced on Monday that it would make the new search tool, first launched at the start of last month, available to millions more Labs users across the country. For the uninitiated, AI Mode is a new dedicated tab within Search. It allows you to ask more complicated questions of Google, with a custom version of Gemini 2.0 doing the legwork to deliver a nuanced AI-generated response. Labs, meanwhile, is a beta program you can enroll your Google account in to gain access to new Search features before the company rolls them out to the public.
- Information Technology > Information Management > Search (0.64)
- Information Technology > Sensing and Signal Processing > Image Processing (0.40)
- Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (0.40)
IMPACT: A Generic Semantic Loss for Multimodal Medical Image Registration
Boussot, Valentin, Hémon, Cédric, Nunes, Jean-Claude, Downling, Jason, Rouzé, Simon, Lafond, Caroline, Barateau, Anaïs, Dillenseger, Jean-Louis
Image registration is fundamental in medical imaging, enabling precise alignment of anatomical structures for diagnosis, treatment planning, image-guided interventions, and longitudinal monitoring. This work introduces IMPACT (Image Metric with Pretrained model-Agnostic Comparison for Transmodality registration), a novel similarity metric designed for robust multimodal image registration. Rather than relying on raw intensities, handcrafted descriptors, or task-specific training, IMPACT defines a semantic similarity measure based on the comparison of deep features extracted from large-scale pretrained segmentation models. By leveraging representations from models such as TotalSegmentator, Segment Anything (SAM), and other foundation networks, IMPACT provides a task-agnostic, training-free solution that generalizes across imaging modalities. These features, originally trained for segmentation, offer strong spatial correspondence and semantic alignment capabilities, making them naturally suited for registration. The method integrates seamlessly into both algorithmic (Elastix) and learning-based (VoxelMorph) frameworks, leveraging the strengths of each. IMPACT was evaluated on five challenging 3D registration tasks involving thoracic CT/CBCT and pelvic MR/CT datasets. Quantitative metrics, including Target Registration Error and Dice Similarity Coefficient, demonstrated consistent improvements in anatomical alignment over baseline methods. Qualitative analyses further highlighted the robustness of the proposed metric in the presence of noise, artifacts, and modality variations. With its versatility, efficiency, and strong performance across diverse tasks, IMPACT offers a powerful solution for advancing multimodal image registration in both clinical and research settings.
- Europe > France > Brittany > Ille-et-Vilaine > Rennes (0.04)
- Oceania > Australia > Queensland (0.04)
- Europe > United Kingdom (0.04)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
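At its core, a feature-space metric like IMPACT's replaces raw-intensity comparison with comparison of deep features from a frozen pretrained network. A minimal sketch of such a similarity, assuming feature maps have already been extracted; the cosine-similarity form and the array shapes are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def feature_similarity(feat_fixed, feat_moving):
    """Feature-space similarity for registration (illustrative sketch).

    feat_fixed, feat_moving: (C, H, W) feature maps from a frozen,
    pretrained segmentation network (e.g. TotalSegmentator or SAM),
    extracted from the fixed image and the warped moving image.
    The metric is task-agnostic because the extractor is never fine-tuned.
    """
    # Mean per-voxel cosine similarity between the two feature maps.
    a = feat_fixed.reshape(feat_fixed.shape[0], -1)
    b = feat_moving.reshape(feat_moving.shape[0], -1)
    a = a / (np.linalg.norm(a, axis=0, keepdims=True) + 1e-8)
    b = b / (np.linalg.norm(b, axis=0, keepdims=True) + 1e-8)
    return float((a * b).sum(axis=0).mean())  # in [-1, 1]; ~1 = aligned

rng = np.random.default_rng(0)
f_fixed = rng.standard_normal((8, 4, 4))
score = feature_similarity(f_fixed, f_fixed)  # identical features score ~1.0
```

An optimizer (Elastix) or network (VoxelMorph) would then maximize such a score over the transformation parameters rather than an intensity-based metric.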
Robust Flower Cluster Matching Using The Unscented Transform
Chu, Andy, Shrestha, Rashik, Gu, Yu, Gross, Jason N.
Monitoring flowers over time is essential for precision robotic pollination in agriculture. To accomplish this, a continuous spatial-temporal observation of plant growth can be done using stationary RGB-D cameras. However, image registration becomes a serious challenge due to changes in the visual appearance of the plant caused by the pollination process and occlusions from growth and camera angles. Plants flower in a manner that produces distinct clusters on branches. This paper presents a method for matching flower clusters using descriptors generated from RGB-D data while allowing for spatial uncertainty within the cluster. The proposed approach leverages the Unscented Transform to efficiently estimate plant descriptor uncertainty tolerances, enabling a robust image-registration process despite temporal changes. The Unscented Transform handles the nonlinear transformations by propagating the uncertainty of flower positions to determine the resulting variations in the descriptor domain. A Monte Carlo simulation is used to validate the Unscented Transform results, confirming our method's effectiveness for flower cluster matching, which can in turn facilitate improved robotic pollination in dynamic environments. Although global agriculture relies heavily on pollination, evidence has shown that the population of natural pollinators is decreasing, raising concerns about food supplies and the economy [1].
- North America > United States > West Virginia (0.04)
- North America > United States > Rhode Island > Providence County > Providence (0.04)
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
- Asia > Russia (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (0.54)
- (2 more...)
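The uncertainty-propagation step in the abstract is the standard Unscented Transform: deterministic sigma points are pushed through the nonlinearity and reweighted to recover an output mean and covariance. A generic UT sketch follows; the toy "descriptor" (distance of a 2D flower position from the origin) and the parameter choices are assumptions for illustration, not the authors' pipeline:

```python
import numpy as np

def unscented_transform(mu, cov, f, alpha=1e-1, beta=2.0, kappa=0.0):
    """Propagate (mu, cov) through a nonlinear function f via the UT (sketch)."""
    n = mu.size
    lam = alpha**2 * (n + kappa) - n
    # Sigma points: the mean plus symmetric spreads along sqrt((n+lam)*cov).
    S = np.linalg.cholesky((n + lam) * cov)
    pts = np.vstack([mu, mu + S.T, mu - S.T])          # (2n+1, n)
    wm = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))
    wc = wm.copy()
    wm[0] = lam / (n + lam)
    wc[0] = wm[0] + (1 - alpha**2 + beta)
    # Push each sigma point through the nonlinearity and reweight.
    ys = np.array([f(p) for p in pts])
    y_mu = wm @ ys
    d = ys - y_mu
    y_cov = (wc[:, None] * d).T @ d
    return y_mu, y_cov

# Toy nonlinear "descriptor": distance of a 2D flower position from the origin.
f = lambda p: np.array([np.linalg.norm(p)])
mu, cov = np.array([3.0, 4.0]), 0.01 * np.eye(2)
y_mu, y_cov = unscented_transform(mu, cov, f)  # y_mu ~ 5.0, y_cov ~ 0.01
```

The recovered `y_cov` is exactly the kind of descriptor-domain tolerance the paper uses to decide whether two clusters match despite positional noise.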
Enhancing Zero-Shot Image Recognition in Vision-Language Models through Human-like Concept Guidance
Liu, Hui, Wang, Wenya, Chen, Kecheng, Liu, Jie, Liu, Yibing, Qin, Tiexin, He, Peisong, Jiang, Xinghao, Li, Haoliang
In zero-shot image recognition tasks, humans demonstrate remarkable flexibility in classifying unseen categories by composing known simpler concepts. However, existing vision-language models (VLMs), despite achieving significant progress through large-scale natural language supervision, often underperform in real-world applications because of sub-optimal prompt engineering and the inability to adapt effectively to target classes. To address these issues, we propose a Concept-guided Human-like Bayesian Reasoning (CHBR) framework. Grounded in Bayes' theorem, CHBR models the concept used in human image recognition as latent variables and formulates this task by summing across potential concepts, weighted by a prior distribution and a likelihood function. To tackle the intractable computation over an infinite concept space, we introduce an importance sampling algorithm that iteratively prompts large language models (LLMs) to generate discriminative concepts, emphasizing inter-class differences. We further propose three heuristic approaches involving Average Likelihood, Confidence Likelihood, and Test Time Augmentation (TTA) Likelihood, which dynamically refine the combination of concepts based on the test image. Extensive evaluations across fifteen datasets demonstrate that CHBR consistently outperforms existing state-of-the-art zero-shot generalization methods.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.87)
- Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (0.82)
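The Bayesian marginalization at the heart of CHBR, summing class evidence over sampled concepts weighted by a prior, can be sketched in a toy form. The exponential likelihood, the uniform class prior, and the example similarity scores are assumptions for illustration; the paper's actual likelihood variants (Average, Confidence, TTA) are more elaborate:

```python
import numpy as np

def concept_marginal_posterior(sim, concept_prior):
    """Class posterior by marginalizing over sampled concepts (toy sketch).

    sim:           (C, K) similarity of the test image to K LLM-generated
                   discriminative concepts under each of C candidate classes.
    concept_prior: (K,) prior weight of each concept, summing to 1.
    With a uniform class prior: p(class | x) proportional to
    sum_k p(k) * p(x | class, k).
    """
    likelihood = np.exp(sim)               # unnormalized p(x | class, concept)
    evidence = likelihood @ concept_prior  # marginalize over concepts
    return evidence / evidence.sum()       # normalize over classes

# Two classes, three sampled concepts: class 0 matches the concepts better.
sim = np.array([[2.0, 1.5, 0.5],
                [0.5, 0.6, 0.4]])
prior = np.array([0.5, 0.3, 0.2])
post = concept_marginal_posterior(sim, prior)
```

Importance sampling enters where this sketch takes `sim` as given: since the concept space is infinite, the K concepts are drawn iteratively by prompting an LLM, and the prior weights play the role of importance weights.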
MORPH-LER: Log-Euclidean Regularization for Population-Aware Image Registration
Karanam, Mokshagna Sai Teja, Iyer, Krithika, Joshi, Sarang, Elhabian, Shireen
Spatial transformations that capture population-level morphological statistics are critical for medical image analysis. Commonly used smoothness regularizers for image registration fail to integrate population statistics, leading to anatomically inconsistent transformations. Inverse consistency regularizers promote geometric consistency but do not integrate population morphometrics. Regularizers that constrain deformations to a low-dimensional manifold address this, but they prioritize reconstruction over interpretability and neglect diffeomorphic properties such as group composition and inverse consistency. We introduce MORPH-LER, a Log-Euclidean regularization framework for population-aware unsupervised image registration. MORPH-LER learns population morphometrics from spatial transformations to guide and regularize registration networks, ensuring anatomically plausible deformations. It features a bottleneck autoencoder that computes the principal logarithm of deformation fields via iterative square-root predictions, creating a linearized latent space that respects diffeomorphic properties and enforces inverse consistency. By integrating a registration network with a diffeomorphic autoencoder, MORPH-LER produces smooth, meaningful deformation fields. The framework offers two main contributions: (1) a data-driven regularization strategy that incorporates population-level anatomical statistics to enhance transformation validity and (2) a linearized latent space that enables compact and interpretable deformation fields for efficient population morphometrics analysis. We validate MORPH-LER across two families of deep learning-based registration networks, demonstrating its ability to produce anatomically accurate, computationally efficient, and statistically meaningful transformations on the OASIS-1 brain imaging dataset.
- North America > United States > Utah (0.05)
- Europe > United Kingdom > North Sea > Southern North Sea (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- (2 more...)
- Health & Medicine > Diagnostic Medicine > Imaging (0.90)
- Health & Medicine > Therapeutic Area > Neurology (0.88)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (0.84)
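The principal-logarithm-via-iterative-square-roots idea that MORPH-LER's autoencoder learns has a classical numerical analogue: inverse scaling-and-squaring, where repeated matrix square roots bring a transform near the identity before a first-order log. A small NumPy sketch under that reading; the Denman-Beavers square-root iteration and the 2x2 diagonal toy matrix are illustrative assumptions, not the paper's network:

```python
import numpy as np

def principal_log(A, k=6, sqrt_iters=20):
    """Principal matrix logarithm via iterative square roots (sketch).

    Take k repeated square roots until A^(1/2^k) is near the identity,
    then use log(A) = 2^k * log(A^(1/2^k)) ~ 2^k * (A^(1/2^k) - I).
    """
    X = A.copy()
    I = np.eye(A.shape[0])
    for _ in range(k):
        # Denman-Beavers iteration for the principal square root of X:
        # Y -> sqrt(X), Z -> inv(sqrt(X)). The tuple assignment uses the
        # old Y and Z on the right-hand side, as the iteration requires.
        Y, Z = X.copy(), I.copy()
        for _ in range(sqrt_iters):
            Y, Z = 0.5 * (Y + np.linalg.inv(Z)), 0.5 * (Z + np.linalg.inv(Y))
        X = Y
    # First-order log near the identity, scaled back up.
    return (2 ** k) * (X - I)

L = principal_log(np.diag([2.0, 3.0]))  # diagonal ~ (ln 2, ln 3)
```

Working in this log domain is what makes the latent space "Log-Euclidean": composition of deformations becomes approximately additive, so population statistics (means, PCA) can be computed linearly.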
Reviews: Recurrent Registration Neural Networks for Deformable Image Registration
The main advantage of this approach is its efficiency at inference time, with performance comparable to the B-spline-based approach, which requires an optimization per registration; according to the authors, it also has far fewer parameters to optimize. Please confirm whether this understanding is correct. 2. What is the reason for choosing to use multiple steps to gradually transform the moving image into the fixed one? Could the local transformation be done in one step instead? For instance, the position network could directly predict K locations to transform in one step instead of predicting one location per step for K steps.
- Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (0.40)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.40)
Reviews: Recurrent Registration Neural Networks for Deformable Image Registration
The paper seems to contribute in a significant way by proposing an alternative RNN-based approach for deformable image registration. Although the experimental setting is not extremely strong, the proposed approach seems to give significant computational advantages. The rebuttal clarified most of the reviewers' concerns.
- Information Technology > Sensing and Signal Processing > Image Processing (0.82)
- Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (0.82)
Reviews: This Looks Like That: Deep Learning for Interpretable Image Recognition
The prototypical parts network presented in this work is an original and potentially very useful learning framework for domains where process-based interpretability is critical. The method is thoroughly evaluated against alternative approaches and performs comparably to other state-of-the-art interpretable learning algorithms. The paper is well written, well motivated, and accompanied by empirical results that validate the algorithmic contributions. Overall, I would recommend this paper for acceptance. One place for improvement is the discussion of this work in the context of alternative interpretable approaches, specifically the methods that show comparable accuracy.
- Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (0.40)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.40)