template image
StainPIDR: A Pathological Image Decoupling and Reconstruction Method for Stain Normalization Based on Color Vector Quantization and Structure Restaining
The color appearance of a pathological image is highly related to the imaging protocols, the proportion of different dyes, and the scanning devices. Computer-aided diagnostic systems may deteriorate when facing these color-variant pathological images. In this work, we propose a stain normalization method called StainPIDR. We try to eliminate this color discrepancy by decoupling the image into structure features and vector-quantized color features, restaining the structure features with the target color features, and decoding the stained structure features into normalized pathological images. We assume that color features decoupled from different images with the same color should be exactly the same. Under this assumption, we train a fixed color vector codebook to which the decoupled color features will map. In the restaining part, we utilize the cross-attention mechanism to efficiently stain the structure features. As the target color (decoupled from a selected template image) will also affect the performance of stain normalization, we further design a template image selection algorithm to select a template from a given dataset. In our extensive experiments, we validate the effectiveness of StainPIDR and the template image selection algorithm. All the results show that our method can perform well in the stain normalization task. The code of StainPIDR will be publicly available later.
- Asia > China > Hubei Province > Wuhan (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
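The StainPIDR abstract above describes mapping decoupled color features to a fixed codebook. As a rough illustration of what such a lookup can look like, here is a minimal nearest-neighbor quantization sketch. The function name `quantize`, the Euclidean distance, and the toy codebook values are assumptions for illustration; the abstract does not specify how the mapping is implemented.

```python
import numpy as np

def quantize(features, codebook):
    """Map each feature vector to its nearest codebook entry (squared Euclidean distance)."""
    # features: (n, d), codebook: (k, d)
    diffs = features[:, None, :] - codebook[None, :, :]  # (n, k, d)
    dists = np.sum(diffs ** 2, axis=-1)                   # (n, k)
    indices = np.argmin(dists, axis=1)                    # index of the closest code per feature
    return indices, codebook[indices]

# Toy values, not taken from the paper.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]])
features = np.array([[0.1, -0.2], [3.5, 0.3], [0.9, 1.2]])
indices, quantized = quantize(features, codebook)
print(indices)
```

Under this scheme, two images whose color features land near the same code receive identical quantized colors, which is the property the abstract's assumption relies on.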
Cracking the Code of Action: a Generative Approach to Affordances for Reinforcement Learning
Cherif, Lynn, Kondrup, Flemming, Venuto, David, Anand, Ankit, Precup, Doina, Khetarpal, Khimya
Agents that can autonomously navigate the web through a graphical user interface (GUI) using a unified action space (e.g., mouse and keyboard actions) can require very large amounts of domain-specific expert demonstrations to achieve good performance. Low sample efficiency is often exacerbated in sparse-reward and large-action-space environments, such as a web GUI, where only a few actions are relevant in any given situation. In this work, we consider the low-data regime, with limited or no access to expert behavior. To enable sample-efficient learning, we explore the effect of constraining the action space through intent-based affordances, i.e., considering in any situation only the subset of actions that achieve a desired outcome. We propose Code as Generative Affordances (CoGA), a method that leverages pre-trained vision-language models (VLMs) to generate code that determines affordable actions through implicit intent-completion functions and using a fully-automated program generation and verification pipeline. These programs are then used in-the-loop of a reinforcement learning agent to return a set of affordances given a pixel observation. By greatly reducing the number of actions that an agent must consider, we demonstrate on a wide range of tasks in the MiniWob++ benchmark that: 1) CoGA is orders of magnitude more sample efficient than its RL agent, 2) CoGA's programs can generalize within a family of tasks, and 3) CoGA performs better or on par compared with behavior cloning when a small number of expert demonstrations is available.
- North America > United States (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > Italy > Sardinia (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
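The CoGA abstract above describes restricting an agent to the subset of actions that are afforded in the current situation. One common way to apply such a restriction, shown here as a hedged sketch, is to mask the agent's action values before taking the greedy choice. The function `masked_greedy_action` and the hand-written mask are hypothetical; in the paper the affordance set comes from generated programs, and the abstract does not say how the agent consumes it.

```python
import numpy as np

def masked_greedy_action(q_values, affordance_mask):
    """Pick the highest-valued action among those the mask marks as afforded."""
    q = np.asarray(q_values, dtype=float)
    mask = np.asarray(affordance_mask, dtype=bool)
    if not mask.any():
        raise ValueError("at least one action must be afforded")
    # Non-afforded actions get -inf so they can never win the argmax.
    masked = np.where(mask, q, -np.inf)
    return int(np.argmax(masked))

# Toy values: action 1 has the highest value but is not afforded.
q_values = [0.2, 1.5, 0.7, 0.9]
mask = [True, False, False, True]
action = masked_greedy_action(q_values, mask)
print(action)
```

The same mask can be applied to exploration, for example by sampling random actions only from the afforded subset.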
FM-OSD: Foundation Model-Enabled One-Shot Detection of Anatomical Landmarks
Miao, Juzheng, Chen, Cheng, Zhang, Keli, Chuai, Jie, Li, Quanzheng, Heng, Pheng-Ann
One-shot detection of anatomical landmarks is gaining significant attention for its efficiency in using minimal labeled data to produce promising results. However, the success of current methods heavily relies on the employment of extensive unlabeled data to pre-train an effective feature extractor, which limits their applicability in scenarios where a substantial amount of unlabeled data is unavailable. In this paper, we propose the first foundation model-enabled one-shot landmark detection (FM-OSD) framework for accurate landmark detection in medical images by utilizing solely a single template image without any additional unlabeled data. Specifically, we use the frozen image encoder of visual foundation models as the feature extractor, and introduce dual-branch global and local feature decoders to increase the resolution of extracted features in a coarse-to-fine manner. The introduced feature decoders are efficiently trained with a distance-aware similarity learning loss to incorporate domain knowledge from the single template image. Moreover, a novel bidirectional matching strategy is developed to improve both the robustness and the accuracy of landmark detection when the similarity maps obtained from foundation models are scattered. We validate our method on two public anatomical landmark detection datasets. By using solely a single template image, our method demonstrates significant superiority over strong state-of-the-art one-shot landmark detection methods.
- Asia > China > Hong Kong (0.05)
- North America > United States > Massachusetts > Suffolk County > Boston (0.05)
- Europe > France > Grand Est > Bas-Rhin > Strasbourg (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
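The FM-OSD abstract above mentions a bidirectional matching strategy for scattered similarity maps. The abstract gives no details, so the following is only a generic forward-backward consistency check, not the authors' method: a query location is accepted if matching it back to the template returns the original landmark. The function names, the `top_k` parameter, and the toy feature vectors are all assumptions.

```python
import numpy as np

def cosine_sim(vec, feats):
    """Cosine similarity between one vector (d,) and each row of feats (n, d)."""
    vec = vec / np.linalg.norm(vec)
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    return feats @ vec

def bidirectional_match(template_feats, query_feats, landmark_idx, top_k=3):
    """Return the best forward match that also maps back to the template landmark."""
    forward = cosine_sim(template_feats[landmark_idx], query_feats)
    candidates = np.argsort(forward)[::-1][:top_k]  # best forward matches first
    for cand in candidates:
        backward = cosine_sim(query_feats[cand], template_feats)
        if int(np.argmax(backward)) == landmark_idx:
            return int(cand)
    return int(candidates[0])  # no consistent candidate: fall back to the forward match

# Toy features: query 0 looks most like the landmark but matches template location 1 better.
template_feats = np.array([[1.0, 0.0], [0.9, 0.3], [0.0, 1.0]])
query_feats = np.array([[0.95, 0.3], [0.9, -0.4], [0.0, 1.0]])
forward_only = int(np.argmax(cosine_sim(template_feats[0], query_feats)))
match = bidirectional_match(template_feats, query_feats, landmark_idx=0)
print(forward_only, match)
```

In this toy case the forward-only match is rejected because it maps back to a different template location, and the second-ranked candidate is selected instead.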
Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models
Kwon, Gihyun, Jenni, Simon, Li, Dingzeyu, Lee, Joon-Young, Ye, Jong Chul, Heilbron, Fabian Caba
While there has been significant progress in customizing text-to-image generation models, generating images that combine multiple personalized concepts remains challenging. In this work, we introduce Concept Weaver, a method for composing customized text-to-image diffusion models at inference time. Specifically, the method breaks the process into two steps: creating a template image aligned with the semantics of input prompts, and then personalizing the template using a concept fusion strategy. The fusion strategy incorporates the appearance of the target concepts into the template image while retaining its structural details. The results indicate that our method can generate multiple custom concepts with higher identity fidelity compared to alternative approaches. Furthermore, the method is shown to seamlessly handle more than two concepts and closely follow the semantic meaning of the input prompt without blending appearances across different subjects.
Scale-free vision-based aerial control of a ground formation with hybrid topology
Aranda, Miguel, Mezouar, Youcef, López-Nicolás, Gonzalo, Sagüés, Carlos
We present a novel vision-based control method to make a group of ground mobile robots achieve a specified formation shape with unspecified size. Our approach uses multiple aerial control units equipped with downward-facing cameras, each observing a partial subset of the multirobot team. The units compute the control commands from the ground robots' image projections, using neither calibration nor scene scale information, and transmit them to the robots. The control strategy relies on the calculation of image similarity transformations, and we show it to be asymptotically stable if the overlaps between the subsets of controlled robots satisfy certain conditions. The presence of the supervisory units, which coordinate their motions to guarantee a correct control performance, gives rise to a hybrid system topology. All in all, the proposed system provides relevant practical advantages in simplicity and flexibility. Within the problem of controlling a team shape, our contribution lies in addressing several simultaneous challenges: the controller needs only partial information of the robotic group, does not use distance measurements or global reference frames, is designed for unicycle agents, and can accommodate topology changes. We present illustrative simulation results.
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Spain > Aragón > Zaragoza Province > Zaragoza (0.04)
CartiMorph: a framework for automated knee articular cartilage morphometrics
Yao, Yongcheng, Zhong, Junru, Zhang, Liping, Khan, Sheheryar, Chen, Weitian
We introduce CartiMorph, a framework for automated knee articular cartilage morphometrics. It takes an image as input and generates quantitative metrics for cartilage subregions, including the percentage of full-thickness cartilage loss (FCL), mean thickness, surface area, and volume. CartiMorph leverages the power of deep learning models for hierarchical image feature representation. Deep learning models were trained and validated for tissue segmentation, template construction, and template-to-image registration. We established methods for surface-normal-based cartilage thickness mapping, FCL estimation, and rule-based cartilage parcellation. Our cartilage thickness map showed less error in thin and peripheral regions. We evaluated the effectiveness of the adopted segmentation model by comparing the quantitative metrics obtained from model segmentation and those from manual segmentation. The root-mean-squared deviation of the FCL measurements was less than 8%, and strong correlations were observed for the mean thickness (Pearson's correlation coefficient $\rho \in [0.82,0.97]$), surface area ($\rho \in [0.82,0.98]$) and volume ($\rho \in [0.89,0.98]$) measurements. We compared our FCL measurements with those from a previous study and found that our measurements deviated less from the ground truths. We observed superior performance of the proposed rule-based cartilage parcellation method compared with the atlas-based approach. CartiMorph has the potential to promote imaging biomarkers discovery for knee osteoarthritis.
- Asia > China > Hong Kong (0.04)
- North America > United States > Virginia (0.04)
- North America > United States > Massachusetts > Middlesex County > Natick (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)
- Health & Medicine > Therapeutic Area > Rheumatology (1.00)
- Health & Medicine > Therapeutic Area > Musculoskeletal (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Health & Medicine > Therapeutic Area > Immunology (0.92)
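The CartiMorph abstract above reports agreement between model-derived and manual measurements as a root-mean-squared deviation and as Pearson correlation coefficients. A minimal sketch of both statistics follows. The measurement values are made up for illustration and do not come from the paper.

```python
import numpy as np

def rmsd(a, b):
    """Root-mean-squared deviation between paired measurements."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sqrt(np.mean((a - b) ** 2)))

def pearson(a, b):
    """Pearson correlation coefficient between paired measurements."""
    return float(np.corrcoef(a, b)[0, 1])

# Hypothetical mean-thickness values (mm) from manual and model segmentations.
manual = np.array([2.0, 2.4, 1.8, 2.9, 2.2])
model = np.array([2.1, 2.3, 1.9, 3.0, 2.0])
print(rmsd(manual, model), pearson(manual, model))
```

The two statistics answer different questions: RMSD measures absolute disagreement in the measurement's own units, while Pearson correlation is insensitive to a constant offset or scale between the two sets of measurements.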
Two-stage Joint Transductive and Inductive learning for Nuclei Segmentation
Ali, Hesham, Tondji, Idriss, Siam, Mennatullah
AI-assisted nuclei segmentation in histopathological images is a crucial task in the diagnosis and treatment of cancer. It decreases the time required to manually screen microscopic tissue images and can help resolve disagreements between pathologists during diagnosis. Deep learning has proven useful in such a task. However, the lack of labeled data is a significant barrier for deep learning-based approaches. In this study, we propose a novel approach to nuclei segmentation that leverages the available labeled and unlabeled data. The proposed method combines the strengths of both transductive and inductive learning, which have been previously attempted separately, into a single framework. Inductive learning aims at approximating the general function and generalizing to unseen test data, while transductive learning has the potential of leveraging the unlabeled test data to improve the classification. To the best of our knowledge, this is the first study to propose such a hybrid approach for medical image segmentation. Moreover, we propose a novel two-stage transductive inference scheme. We evaluate our approach on the MoNuSeg benchmark to demonstrate the efficacy and potential of our method.
- South America > Peru > Lima Department > Lima Province > Lima (0.04)
- North America > Canada > Ontario (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
One-shot Localization and Segmentation of Medical Images with Foundation Models
Anand, Deepa, M, Gurunath Reddy, Singhal, Vanika, Shanbhag, Dattesh D., KS, Shriram, Patil, Uday, Bhushan, Chitresh, Manickam, Kavitha, Gui, Dawei, Mullick, Rakesh, Gopal, Avinash, Bhatia, Parminder, Kass-Hout, Taha
Recent advances in Vision Transformers (ViT) and Stable Diffusion (SD) models, with their ability to capture rich semantic features of the image, have been used for image correspondence tasks on natural images. In this paper, we examine the ability of a variety of pre-trained ViT (DINO, DINOv2, SAM, CLIP) and SD models, trained exclusively on natural images, to solve correspondence problems on medical images. While many works have made a case for in-domain training, we show that models trained on natural images can offer good performance on medical images across different modalities (CT, MR, ultrasound) sourced from various manufacturers, over multiple anatomical regions (brain, thorax, abdomen, extremities), and on a wide variety of tasks. Further, we leverage the correspondence with respect to a template image to prompt a Segment Anything (SAM) model to arrive at single-shot segmentation, achieving a Dice range of 62%-90% across tasks, using just one image as reference. We also show that our single-shot method outperforms the recently proposed few-shot segmentation method UniverSeg (Dice range 47%-80%) on most of the semantic segmentation tasks (six out of seven) across medical imaging modalities.
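The segmentation results in the abstract above are reported as Dice scores. For reference, the Dice coefficient of two binary masks is twice the size of their overlap divided by the sum of their sizes. The sketch below uses toy masks; returning 1.0 when both masks are empty is a convention chosen here, not something stated in the abstract.

```python
import numpy as np

def dice(pred, target):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    total = pred.sum() + target.sum()
    if total == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return float(2.0 * np.logical_and(pred, target).sum() / total)

# Toy masks: 3 foreground pixels each, 2 of them shared.
pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
score = dice(pred, target)
print(score)
```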
WarpPINN: Cine-MR image registration with physics-informed neural networks
López, Pablo Arratia, Mella, Hernán, Uribe, Sergio, Hurtado, Daniel E., Costabal, Francisco Sahli
Heart failure is typically diagnosed with a global function assessment, such as ejection fraction. However, these metrics have low discriminative power, failing to distinguish different types of this disease. Quantifying local deformations in the form of cardiac strain can provide helpful information, but it remains a challenge. In this work, we introduce WarpPINN, a physics-informed neural network to perform image registration to obtain local metrics of the heart deformation. We apply this method to cine magnetic resonance images to estimate the motion during the cardiac cycle. We inform our neural network of near-incompressibility of cardiac tissue by penalizing the Jacobian of the deformation field. The loss function has two components: an intensity-based similarity term between the reference and the warped template images, and a regularizer that represents the hyperelastic behavior of the tissue. The architecture of the neural network allows us to easily compute the strain via automatic differentiation to assess cardiac activity. We use Fourier feature mappings to overcome the spectral bias of neural networks, allowing us to capture discontinuities in the strain field. We test our algorithm on a synthetic example and on a cine-MRI benchmark of 15 healthy volunteers. We outperform current methodologies in both landmark tracking and strain estimation. We expect that WarpPINN will enable more precise diagnostics of heart failure based on local deformation information. Source code is available at https://github.com/fsahli/WarpPINN.
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- South America > Chile > Valparaíso Region > Valparaíso Province > Valparaíso (0.04)
- North America > United States (0.04)
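The WarpPINN abstract above describes penalizing the Jacobian of the deformation field to encourage near-incompressibility. As a hedged illustration of the idea, the sketch below computes the Jacobian determinant of a 2-D deformation x + u(x) on a regular grid and penalizes its deviation from 1. The finite differences and the squared penalty are assumptions made for this example; the paper works with automatic differentiation of a neural network and a hyperelastic regularizer whose exact form is not given in the abstract.

```python
import numpy as np

def jacobian_determinant(u_x, u_y, spacing=1.0):
    """det(I + grad u) for a 2-D displacement field; arrays are indexed [row (y), column (x)]."""
    dux_dy, dux_dx = np.gradient(u_x, spacing)  # derivatives along axis 0 (y), then axis 1 (x)
    duy_dy, duy_dx = np.gradient(u_y, spacing)
    return (1.0 + dux_dx) * (1.0 + duy_dy) - dux_dy * duy_dx

def incompressibility_penalty(u_x, u_y, spacing=1.0):
    """Mean squared deviation of the Jacobian determinant from 1 (area preservation)."""
    det = jacobian_determinant(u_x, u_y, spacing)
    return float(np.mean((det - 1.0) ** 2))

y, x = np.mgrid[0:8, 0:8].astype(float)
expansion = incompressibility_penalty(0.1 * x, 0.1 * y)        # uniform 10% stretch in x and y
shear = incompressibility_penalty(0.5 * y, np.zeros_like(y))   # simple shear, area-preserving
print(expansion, shear)
```

The uniform stretch changes area and is penalized, while the shear deforms the grid without changing area and incurs essentially no penalty, which is the behavior an incompressibility term is meant to have.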
#002 Advanced Computer Vision - Motion Estimation With Optical Flow
Highlights: Techniques like Object Detection have enabled computers of today to detect object instances easily. However, tracking the motion of objects such as vehicles across all frames of a video, estimating their velocity, and predicting their motion requires an efficient method such as Optical Flow. In our previous posts, we provided a detailed explanation of two of the most common Optical Flow methods – the Lucas Kanade method and the Horn & Schunck method. In this tutorial post, we will go through the fundamentals of Optical Flow and study some of the advanced algorithms used in calculating Optical Flow. An important piece of information that common object detection techniques miss is the relationship between objects in two consecutive frames.