AITopics | polyp

Collaborating Authors

polyp

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

AlternatingMirrorDescent forConstrainedMin-MaxGames

Neural Information Processing SystemsFeb-12-2026, 11:56:01 GMT

An instance of natural occurrences of such constraints is when mixed strategies are used, which correspond to a probability simplex constraint.

artificial intelligence, descent, machine learning, (20 more...)

Neural Information Processing Systems

Country: North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology:

Information Technology > Game Theory (0.88)
Information Technology > Artificial Intelligence > Machine Learning (0.69)

Add feedback

Point, Detect, Count: Multi-Task Medical Image Understanding with Instruction-Tuned Vision-Language Models

Gautam, Sushant, Riegler, Michael A., Halvorsen, Pål

arXiv.org Artificial IntelligenceSep-3-2025

--We investigate fine-tuning Vision-Language Models (VLMs) for multi-task medical image understanding, focusing on detection, localization, and counting of findings in medical images. Our objective is to evaluate whether instruction-tuned VLMs can simultaneously improve these tasks, with the goal of enhancing diagnostic accuracy and efficiency. Using MedMulti-Points, a multimodal dataset with annotations from endoscopy (polyps and instruments) and microscopy (sperm cells), we reformulate each task into instruction-based prompts suitable for vision-language reasoning. Results show that multi-task training improves robustness and accuracy. For example, it reduces the Count Mean Absolute Error (MAE) and increases Matching Accuracy in the Counting + Pointing task. However, trade-offs emerge, such as more zero-case point predictions, indicating reduced reliability in edge cases despite overall performance gains. Our study highlights the potential of adapting general-purpose VLMs to specialized medical tasks via prompt-driven fine-tuning. This approach mirrors clinical workflows, where radiologists simultaneously localize, count, and describe findings - demonstrating how VLMs can learn composite diagnostic reasoning patterns. The model produces interpretable, structured outputs, offering a promising step toward explainable and versatile medical AI. Medical imaging comes with numerous challenges, such as detecting subtle abnormalities, processing images captured under varied conditions, and managing high data volumes, all of which complicate automated analysis [1].

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/CBMS65348.2025.00090

2505.16647

Country: Europe > Norway (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.84)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
(3 more...)

Add feedback

Diffusion-Based Data Augmentation for Medical Image Segmentation

Nazir, Maham, Aqeel, Muhammad, Setti, Francesco

arXiv.org Artificial IntelligenceAug-26-2025

W e propose DiffAug a novel framework that combines text-guided diffusion-based generation with automatic segmentation validation to address this challenge. Our proposed approach uses latent diffusion models conditioned on medical text descriptions and spatial masks to synthesize abnormalities via inpainting on normal images. Generated samples undergo dynamic quality validation through a latent-space segmentation network that ensures accurate localization while enabling single-step inference. The text prompts, derived from medical literature, guide the generation of diverse abnormality types without requiring manual annotation. Our validation mechanism filters synthetic samples based on spatial accuracy, maintaining quality while operating efficiently through direct latent estimation. Evaluated on three medical imaging benchmarks (CVC-ClinicDB, Kvasir-SEG, REFUGE2), our framework achieves state-of-the-art performance with 8-10% Dice improvements over baselines and reduces false negative rates by up to 28% for challenging cases like small polyps and flat lesions critical for early detection in screening applications.

artificial intelligence, machine learning, segmentation, (18 more...)

arXiv.org Artificial Intelligence

2508.17844

Country: Europe (0.68)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.89)

Add feedback

Temporally-Aware Supervised Contrastive Learning for Polyp Counting in Colonoscopy

Parolari, Luca, Cherubini, Andrea, Ballan, Lamberto, Biffi, Carlo

arXiv.org Artificial IntelligenceJul-4-2025

Automated polyp counting in colonoscopy is a crucial step toward automated procedure reporting and quality control, aiming to enhance the cost-effectiveness of colonoscopy screening. Counting polyps in a procedure involves detecting and tracking polyps, and then clustering tracklets that belong to the same polyp entity. Existing methods for polyp counting rely on self-supervised learning and primarily leverage visual appearance, neglecting temporal relationships in both tracklet feature learning and clustering stages. In this work, we introduce a paradigm shift by proposing a supervised contrastive loss that incorporates temporally-aware soft targets. Our approach captures intra-polyp variability while preserving inter-polyp discriminability, leading to more robust clustering. Additionally, we improve tracklet clustering by integrating a temporal adjacency constraint, reducing false positive re-associations between visually similar but temporally distant tracklets. We train and validate our method on publicly available datasets and evaluate its performance with a leave-one-out cross-validation strategy. Results demonstrate a 2.2x reduction in fragmentation rate compared to prior approaches. Our results highlight the importance of temporal awareness in polyp counting, establishing a new state-of-the-art. Code is available at https://github.com/lparolari/temporally-aware-polyp-counting.

artificial intelligence, machine learning, polyp, (13 more...)

arXiv.org Artificial Intelligence

2507.02493

Country: Europe > Italy (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Therapeutic Area > Gastroenterology (0.95)
Health & Medicine > Therapeutic Area > Oncology > Colorectal Cancer (0.84)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.50)

Add feedback

Endo-CLIP: Progressive Self-Supervised Pre-training on Raw Colonoscopy Records

He, Yili, Zhu, Yan, Fu, Peiyao, Yang, Ruijie, Chen, Tianyi, Wang, Zhihua, Li, Quanlin, Zhou, Pinghong, Yang, Xian, Wang, Shuo

arXiv.org Artificial IntelligenceMay-15-2025

Pre-training on image-text colonoscopy records offers substantial potential for improving endoscopic image analysis, but faces challenges including non-informative background images, complex medical terminology, and ambiguous multi-lesion descriptions. We introduce Endo-CLIP, a novel self-supervised framework that enhances Contrastive Language-Image Pre-training (CLIP) for this domain. Endo-CLIP's three-stage framework--cleansing, attunement, and unification--addresses these challenges by: (1) removing background frames, (2) leveraging large language models (LLMs) to extract clinical attributes for fine-grained contrastive learning, and (3) employing patient-level cross-attention to resolve multi-polyp ambiguities. Extensive experiments demonstrate that Endo-CLIP significantly outperforms state-of-the-art pre-training methods in zero-shot and few-shot polyp detection and classification, paving the way for more accurate and clinically relevant endoscopic analysis. Code will be made publicly available on https://github.com/chrlott/

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2505.09435

Country:

Asia > China > Shanghai > Shanghai (0.06)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Austria > Vienna (0.04)
Europe > United Kingdom > England > Greater Manchester > Manchester (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Gastroenterology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Therapeutic Area > Oncology > Colorectal Cancer (0.73)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Prompt to Polyp: Medical Text-Conditioned Image Synthesis with Diffusion Models

Chaichuk, Mikhail, Gautam, Sushant, Hicks, Steven, Tutubalina, Elena

arXiv.org Artificial IntelligenceMay-13-2025

The generation of realistic medical images from text descriptions has significant potential to address data scarcity challenges in healthcare AI while preserving patient privacy. This paper presents a comprehensive study of text-to-image synthesis in the medical domain, comparing two distinct approaches: (1) fine-tuning large pre-trained latent diffusion models and (2) training small, domain-specific models. We introduce a novel model named MSDM, an optimized architecture based on Stable Diffusion that integrates a clinical text encoder, vari-ational autoencoder, and cross-attention mechanisms to better align medical text prompts with generated images. Our study compares two approaches: fine-tuning large pre-trained models (FLUX, Kandinsky) versus training compact domain-specific models (MSDM). Evaluation across colonoscopy (MedVQA-GI) and radiology (ROCOv2) datasets reveals that while large models achieve higher fidelity, our optimized MSDM delivers comparable quality with lower computational costs. Quantitative metrics and qualitative evaluations by medical experts reveal strengths and limitations of each approach.

diffusion model, large language model, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2505.05573

Country: Europe > Russia (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Promising Solution (0.66)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.68)

Add feedback

PolypSegTrack: Unified Foundation Model for Colonoscopy Video Analysis

Choudhuri, Anwesa, Gao, Zhongpai, Zheng, Meng, Planche, Benjamin, Chen, Terrence, Wu, Ziyan

arXiv.org Artificial IntelligenceApr-2-2025

Early detection, accurate segmentation, classification and tracking of polyps during colonoscopy are critical for preventing colorectal cancer. Many existing deep-learning-based methods for analyzing colonoscopic videos either require task-specific fine-tuning, lack tracking capabilities, or rely on domain-specific pre-training. In this paper, we introduce PolypSegTrack, a novel foundation model that jointly addresses polyp detection, segmentation, classification and unsupervised tracking in colonoscopic videos. Our approach leverages a novel conditional mask loss, enabling flexible training across datasets with either pixel-level segmentation masks or bounding box annotations, allowing us to bypass task-specific fine-tuning. Our unsupervised tracking module reliably associates polyp instances across frames using object queries, without relying on any heuristics. We leverage a robust vision foundation model backbone that is pre-trained unsupervisedly on natural images, thereby removing the need for domain-specific pre-training. Extensive experiments on multiple polyp benchmarks demonstrate that our method significantly outperforms existing state-of-the-art approaches in detection, segmentation, classification, and tracking.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2503.24108

Country: North America > United States > Massachusetts > Suffolk County > Boston (0.04)

Genre: Research Report (0.84)

Industry:

Health & Medicine > Therapeutic Area > Gastroenterology (1.00)
Health & Medicine > Therapeutic Area > Oncology > Colorectal Cancer (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)

Add feedback

Toward a Human-Centered AI-assisted Colonoscopy System in Australia

Chen, Hsiang-Ting, Zhang, Yuan, Carneiro, Gustavo, Singh, Rajvinder

arXiv.org Artificial IntelligenceMar-15-2025

While AI-assisted colonoscopy promises improved colorectal cancer screening, its success relies on effective integration into clinical practice, not just algorithmic accuracy. This paper, based on an Australian field study (observations and gastroenterologist interviews), highlights a critical disconnect: current development prioritizes machine learning model performance, overlooking essential aspects of user interface design, workflow integration, and overall user experience. Industry interactions reveal a similar emphasis on data and algorithms. To realize AI's full potential, the HCI community must champion user-centered design, ensuring these systems are usable, support endoscopist expertise, and enhance patient outcomes.

artificial intelligence, colonoscopy, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2503.2079

Country:

Asia > Japan > Honshū > Kantō > Kanagawa Prefecture > Yokohama (0.06)
Oceania > Australia > South Australia > Adelaide (0.05)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report > Experimental Study (0.47)

Industry:

Health & Medicine > Therapeutic Area > Oncology > Colorectal Cancer (1.00)
Health & Medicine > Therapeutic Area > Gastroenterology (1.00)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.54)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)

Add feedback

Robust Polyp Detection and Diagnosis through Compositional Prompt-Guided Diffusion Models

Yu, Jia, Zhu, Yan, Fu, Peiyao, Chen, Tianyi, Huang, Junbo, Li, Quanlin, Zhou, Pinghong, Wang, Zhihua, Wu, Fei, Wang, Shuo, Yang, Xian

arXiv.org Artificial IntelligenceFeb-25-2025

Colorectal cancer (CRC) is a significant global health concern, and early detection through screening plays a critical role in reducing mortality. While deep learning models have shown promise in improving polyp detection, classification, and segmentation, their generalization across diverse clinical environments, particularly with out-of-distribution (OOD) data, remains a challenge. Multi-center datasets like PolypGen have been developed to address these issues, but their collection is costly and time-consuming. Traditional data augmentation techniques provide limited variability, failing to capture the complexity of medical images. Diffusion models have emerged as a promising solution for generating synthetic polyp images, but the image generation process in current models mainly relies on segmentation masks as the condition, limiting their ability to capture the full clinical context. To overcome these limitations, we propose a Progressive Spectrum Diffusion Model (PSDM) that integrates diverse clinical annotations-such as segmentation masks, bounding boxes, and colonoscopy reports-by transforming them into compositional prompts. These prompts are organized into coarse and fine components, allowing the model to capture both broad spatial structures and fine details, generating clinically accurate synthetic images. By augmenting training data with PSDM-generated samples, our model significantly improves polyp detection, classification, and segmentation. For instance, on the PolypGen dataset, PSDM increases the F1 score by 2.12% and the mean average precision by 3.09%, demonstrating superior performance in OOD scenarios and enhanced generalization.

annotation, dataset, polyp, (14 more...)

arXiv.org Artificial Intelligence

2502.17951

Country:

Asia > China > Shanghai > Shanghai (0.05)
Europe > United Kingdom > England > Greater Manchester > Manchester (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
(2 more...)

Genre: Research Report > Promising Solution (0.48)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Therapeutic Area > Gastroenterology (0.88)
Health & Medicine > Therapeutic Area > Oncology > Colorectal Cancer (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

Add feedback

Self-Supervised Learning for Pre-training Capsule Networks: Overcoming Medical Imaging Dataset Challenges

El-Shimy, Heba, Zantout, Hind, Lones, Michael A., Gayar, Neamat El

arXiv.org Artificial IntelligenceFeb-7-2025

Deep learning techniques are increasingly being adopted in diagnostic medical imaging. However, the limited availability of high-quality, large-scale medical datasets presents a significant challenge, often necessitating the use of transfer learning approaches. This study investigates self-supervised learning methods for pre-training capsule networks in polyp diagnostics for colon cancer. We used the PICCOLO dataset, comprising 3,433 samples, which exemplifies typical challenges in medical datasets: small size, class imbalance, and distribution shifts between data splits. Capsule networks offer inherent interpretability due to their architecture and inter-layer information routing mechanism. However, their limited native implementation in mainstream deep learning frameworks and the lack of pre-trained versions pose a significant challenge. This is particularly true if aiming to train them on small medical datasets, where leveraging pre-trained weights as initial parameters would be beneficial. We explored two auxiliary self-supervised learning tasks, colourisation and contrastive learning, for capsule network pre-training. We compared self-supervised pre-trained models against alternative initialisation strategies. Our findings suggest that contrastive learning and in-painting techniques are suitable auxiliary tasks for self-supervised learning in the medical domain. These techniques helped guide the model to capture important visual features that are beneficial for the downstream task of polyp classification, increasing its accuracy by 5.26% compared to other weight initialisation methods.

artificial intelligence, inductive learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2502.04748

Country:

Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)
Europe > Spain > Basque Country > Biscay Province > Bilbao (0.04)
Asia > Middle East > UAE > Dubai Emirate > Dubai (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Health Care Technology (0.85)
Health & Medicine > Therapeutic Area > Gastroenterology (0.66)
Health & Medicine > Therapeutic Area > Oncology > Colorectal Cancer (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)

Add feedback