AITopics | detection and segmentation

detection and segmentation

MUSE: Model-based Uncertainty-aware Similarity Estimation for zero-shot 2D Object Detection and Segmentation

Cho, Sungmin, Park, Sungbum, Oh, Insoo

arXiv.org Artificial Intelligence, Oct-22-2025

In this work, we introduce MUSE (Model-based Uncertainty-aware Similarity Estimation), a training-free framework designed for model-based zero-shot 2D object detection and segmentation. MUSE leverages 2D multi-view templates rendered from 3D unseen objects and 2D object proposals extracted from input query images. In the embedding stage, it integrates class and patch embeddings, where the patch embeddings are normalized using generalized mean pooling (GeM) to capture both global and local representations efficiently. During the matching stage, MUSE employs a joint similarity metric that combines absolute and relative similarity scores, enhancing the robustness of matching under challenging scenarios. Finally, the similarity score is refined through an uncertainty-aware object prior that adjusts for proposal reliability. Without any additional training or fine-tuning, MUSE achieves state-of-the-art performance on the BOP Challenge 2025, ranking first across the Classic Core, H3, and Industrial tracks. These results demonstrate that MUSE offers a powerful and generalizable framework for zero-shot 2D object detection and segmentation.
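
The generalized mean (GeM) pooling mentioned in the abstract can be sketched as follows. This is the textbook formula only, (mean of x^p)^(1/p) per channel; the exponent `p = 3.0`, the clamp value `eps`, and the function name `gem_pool` are illustrative assumptions, and the abstract does not say how MUSE configures or applies the operation.

```python
from typing import List, Sequence


def gem_pool(patches: Sequence[Sequence[float]], p: float = 3.0,
             eps: float = 1e-6) -> List[float]:
    """Pool N patch embeddings into one vector with a generalized mean.

    p = 1 gives average pooling; large p approaches max pooling.
    The default p and eps are assumptions, not values from the paper.
    """
    if not patches:
        raise ValueError("need at least one patch embedding")
    n = len(patches)
    dim = len(patches[0])
    pooled = []
    for c in range(dim):
        # Clamp to eps so that fractional powers stay well defined.
        total = sum(max(patch[c], eps) ** p for patch in patches)
        pooled.append((total / n) ** (1.0 / p))
    return pooled


if __name__ == "__main__":
    patches = [[1.0, 3.0], [3.0, 1.0]]
    print(gem_pool(patches, p=1.0))   # behaves like average pooling
    print(gem_pool(patches, p=50.0))  # moves toward max pooling
```

Varying `p` moves the pooled vector between the per-channel mean and the per-channel maximum, which is why GeM is often described as sitting between global average and max pooling.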

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2510.17866

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.83)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

MRI Brain Tumor Detection with Computer Vision

Krolik, Jack, Lynn, Jake, Rudden, John Henry, Vremenko, Dmytro

arXiv.org Artificial Intelligence, Oct-14-2025

This study explores the application of deep learning techniques in the automated detection and segmentation of brain tumors from MRI scans. We employ several machine learning models, including basic logistic regression, Convolutional Neural Networks (CNNs), and Residual Networks (ResNet) to classify brain tumors effectively. Additionally, we investigate the use of U-Net for semantic segmentation and EfficientDet for anchor-based object detection to enhance the localization and identification of tumors. Our results demonstrate promising improvements in the accuracy and efficiency of brain tumor diagnostics, underscoring the potential of deep learning in medical imaging and its significance in improving clinical outcomes.

artificial intelligence, machine learning, segmentation, (18 more...)

arXiv.org Artificial Intelligence

2510.1025

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Real-Time Threaded Houbara Detection and Segmentation for Wildlife Conservation using Mobile Platforms

Saoud, Lyes Saad, Lesobre, Loic, Sorato, Enrico, Hussain, Irfan

arXiv.org Artificial Intelligence, Oct-7-2025

Real-time animal detection and segmentation in natural environments are vital for wildlife conservation, enabling non-invasive monitoring through remote camera streams. However, these tasks remain challenging due to limited computational resources and the cryptic appearance of many species. We propose a mobile-optimized two-stage deep learning framework that integrates a Threading Detection Model (TDM) to parallelize YOLOv10-based detection and MobileSAM-based segmentation. Unlike prior YOLO+SAM pipelines, our approach improves real-time performance by reducing latency through threading. YOLOv10 handles detection while MobileSAM performs lightweight segmentation, both executed concurrently for efficient resource use. On the cryptic Houbara Bustard, a conservation-priority species, our model achieves mAP50 of 0.9627, mAP75 of 0.7731, mAP95 of 0.7178, and a MobileSAM mIoU of 0.7421. YOLOv10 operates at 43.7 ms per frame, confirming real-time readiness. We introduce a curated Houbara dataset of 40,000 annotated images to support model training and evaluation across diverse conditions. The code and dataset used in this study are publicly available on GitHub at https://github.com/LyesSaadSaoud/mobile-houbara-detseg. For interactive demos and additional resources, visit https://lyessaadsaoud.github.io/LyesSaadSaoud-Threaded-YOLO-SAM-Houbara.
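
One way to overlap detection and segmentation with threads is a two-stage producer/consumer pipeline, sketched below. This is not the authors' Threading Detection Model: `detect` and `segment` are hypothetical stand-ins for a detector and a box-prompted segmenter, and the queue size is an arbitrary assumption.

```python
import queue
import threading
from typing import Iterable, List, Tuple

Box = Tuple[int, int, int, int]


def detect(frame: int) -> List[Box]:
    """Hypothetical stand-in for a detector: one dummy box per frame."""
    return [(frame, frame, frame + 10, frame + 10)]


def segment(frame: int, boxes: List[Box]) -> List[str]:
    """Hypothetical stand-in for a box-prompted segmenter."""
    return [f"mask(frame={frame}, box={b})" for b in boxes]


def run_pipeline(frames: Iterable[int]) -> List[Tuple[int, List[str]]]:
    """Run detection and segmentation in two threads joined by a queue."""
    handoff: queue.Queue = queue.Queue(maxsize=4)  # bounded to limit memory
    results: List[Tuple[int, List[str]]] = []
    sentinel = None  # marks the end of the stream

    def detection_worker() -> None:
        for frame in frames:
            handoff.put((frame, detect(frame)))
        handoff.put(sentinel)

    def segmentation_worker() -> None:
        while True:
            item = handoff.get()
            if item is sentinel:
                break
            frame, boxes = item
            results.append((frame, segment(frame, boxes)))

    workers = [threading.Thread(target=detection_worker),
               threading.Thread(target=segmentation_worker)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    for frame, masks in run_pipeline(range(3)):
        print(frame, masks)
```

With this arrangement the detector can start on the next frame while the segmenter is still working on the previous one. In CPython the overlap only helps when the underlying inference calls release the global interpreter lock, which is an assumption about the model runtime rather than something the abstract states.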

machine learning, real time system, segmentation, (21 more...)

arXiv.org Artificial Intelligence

2510.03501

Genre: Research Report > New Finding (0.66)

Industry: Information Technology (0.68)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Architecture > Real Time Systems (1.00)
(2 more...)

Weakly Supervised Intracranial Aneurysm Detection and Segmentation in MR angiography via Multi-task UNet with Vesselness Prior

Rainville, Erin, Rasoulian, Amirhossein, Rivaz, Hassan, Xiao, Yiming

arXiv.org Artificial Intelligence, Aug-4-2025

Intracranial aneurysms (IAs) are abnormal dilations of cerebral blood vessels that, if ruptured, can lead to life-threatening consequences. However, their small size and soft contrast in radiological scans often make it difficult to perform accurate and efficient detection and morphological analyses, which are critical in the clinical care of the disorder. Furthermore, the lack of large public datasets with voxel-wise expert annotations poses challenges for developing deep learning algorithms to address these issues. Therefore, we propose a novel weakly supervised 3D multi-task UNet that integrates vesselness priors to jointly perform aneurysm detection and segmentation in time-of-flight MR angiography (TOF-MRA). Specifically, to robustly guide IA detection and segmentation, we employ the popular Frangi's vesselness filter to derive soft cerebrovascular priors for both network input and an attention block to conduct segmentation from the decoder and detection from an auxiliary branch. We train our model on the Lausanne dataset with coarse ground truth segmentation, and evaluate it on the test set with refined labels from the same database. To further assess our model's generalizability, we also validate it externally on the ADAM dataset. Our results demonstrate the superior performance of the proposed technique over the SOTA techniques for aneurysm segmentation (Dice = 0.614, 95% HD = 1.38 mm) and detection (false positive rate = 1.47, sensitivity = 92.9%).
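
For readers unfamiliar with the reported metrics, the sketch below shows the usual definitions of the Dice coefficient and of sensitivity on binary labels. The function names, the flat-list mask representation, and the convention of returning 1.0 when both masks are empty are assumptions for illustration; the authors' evaluation code may differ.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice = 2 * |pred AND truth| / (|pred| + |truth|) on binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of real findings that were detected: TP / (TP + FN)."""
    found_or_missed = true_positives + false_negatives
    return true_positives / found_or_missed if found_or_missed else 0.0


if __name__ == "__main__":
    # Illustrative toy values, not data from the paper.
    print(dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0]))
    print(sensitivity(true_positives=13, false_negatives=1))
```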

artificial intelligence, machine learning, segmentation, (19 more...)

arXiv.org Artificial Intelligence

2508.00235

Country: Europe > Switzerland > Vaud > Lausanne (0.26)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.93)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.85)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

A Novel Unified Architecture for Low-Shot Counting by Detection and Segmentation

Neural Information Processing Systems, May-27-2025, 06:06:33 GMT

Low-shot object counters estimate the number of objects in an image using few or no annotated exemplars. Objects are localized by matching them to prototypes, which are constructed by unsupervised image-wide object appearance aggregation. Due to potentially diverse object appearances, the existing approaches often lead to overgeneralization and false positive detections. Furthermore, the best-performing methods train object localization by a surrogate loss that predicts a unit Gaussian at each object center. This loss is sensitive to annotation error and hyperparameters, and does not directly optimize the detection task, leading to suboptimal counts. We introduce GeCo, a novel low-shot counter that achieves accurate object detection, segmentation, and count estimation in a unified architecture. GeCo robustly generalizes the prototypes across object appearances through a novel dense object query formulation. In addition, a novel counting loss is proposed that directly optimizes the detection task and avoids the issues of the standard surrogate loss. GeCo surpasses the leading few-shot detection-based counters by ~25% in the total count MAE, achieves superior detection accuracy, and sets a new solid state-of-the-art result across all low-shot counting setups.
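
The count MAE quoted above is the mean absolute difference between predicted and true object counts per image. A minimal sketch follows, together with a helper for a relative reduction such as the quoted ~25%; the function names and all numbers in the usage lines are illustrative assumptions, not results from the paper.

```python
from typing import Sequence


def count_mae(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute error between predicted and true per-image counts."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def relative_reduction(baseline_error: float, new_error: float) -> float:
    """Fractional improvement of new_error over baseline_error."""
    if baseline_error <= 0:
        raise ValueError("baseline error must be positive")
    return (baseline_error - new_error) / baseline_error


if __name__ == "__main__":
    # Toy counts for three images (made up for illustration).
    print(count_mae([10, 12, 7], [9, 15, 7]))
    # A drop from MAE 8.0 to 6.0 would be a 25% relative reduction.
    print(relative_reduction(8.0, 6.0))
```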

detection and segmentation, low-shot, novel unified architecture, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.43)

Extremely Simple Multimodal Outlier Synthesis for Out-of-Distribution Detection and Segmentation

Liu, Moru, Dong, Hao, Kelly, Jessica, Fink, Olga, Trapp, Mario

arXiv.org Artificial Intelligence, May-23-2025

Out-of-distribution (OOD) detection and segmentation are crucial for deploying machine learning models in safety-critical applications such as autonomous driving and robot-assisted surgery. While prior research has primarily focused on unimodal image data, real-world applications are inherently multimodal, requiring the integration of multiple modalities for improved OOD detection. A key challenge is the lack of supervision signals from unknown data, leading to overconfident predictions on OOD samples. To address this challenge, we propose Feature Mixing, an extremely simple and fast method for multimodal outlier synthesis with theoretical support, which can be further optimized to help the model better distinguish between in-distribution (ID) and OOD data. Feature Mixing is modality-agnostic and applicable to various modality combinations. Additionally, we introduce CARLA-OOD, a novel multimodal dataset for OOD segmentation, featuring synthetic OOD objects across diverse scenes and weather conditions. Extensive experiments on SemanticKITTI, nuScenes, CARLA-OOD datasets, and the MultiOOD benchmark demonstrate that Feature Mixing achieves state-of-the-art performance with a 10× to 370× speedup. Our source code and dataset will be available at https://github.com/mona4399/FeatureMixing.

artificial intelligence, dataset, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2505.16985

Genre: Research Report (0.64)

Industry:

Transportation > Ground > Road (0.34)
Information Technology (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)

PolypSegTrack: Unified Foundation Model for Colonoscopy Video Analysis

Choudhuri, Anwesa, Gao, Zhongpai, Zheng, Meng, Planche, Benjamin, Chen, Terrence, Wu, Ziyan

arXiv.org Artificial Intelligence, Apr-2-2025

Early detection, accurate segmentation, classification and tracking of polyps during colonoscopy are critical for preventing colorectal cancer. Many existing deep-learning-based methods for analyzing colonoscopic videos either require task-specific fine-tuning, lack tracking capabilities, or rely on domain-specific pre-training. In this paper, we introduce PolypSegTrack, a novel foundation model that jointly addresses polyp detection, segmentation, classification and unsupervised tracking in colonoscopic videos. Our approach leverages a novel conditional mask loss, enabling flexible training across datasets with either pixel-level segmentation masks or bounding box annotations, allowing us to bypass task-specific fine-tuning. Our unsupervised tracking module reliably associates polyp instances across frames using object queries, without relying on any heuristics. We leverage a robust vision foundation model backbone that is pre-trained without supervision on natural images, thereby removing the need for domain-specific pre-training. Extensive experiments on multiple polyp benchmarks demonstrate that our method significantly outperforms existing state-of-the-art approaches in detection, segmentation, classification, and tracking.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2503.24108

Country: North America > United States > Massachusetts > Suffolk County > Boston (0.04)

Genre: Research Report (0.84)

Industry:

Health & Medicine > Therapeutic Area > Gastroenterology (1.00)
Health & Medicine > Therapeutic Area > Oncology > Colorectal Cancer (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)

ColonNet: A Hybrid Of DenseNet121 And U-NET Model For Detection And Segmentation Of GI Bleeding

Singh, Ayushman, Prakash, Sharad, Das, Aniket, Kushwaha, Nidhi

arXiv.org Artificial Intelligence, Dec-6-2024

This study presents an integrated deep learning model for automatic detection and classification of gastrointestinal bleeding in frames extracted from Wireless Capsule Endoscopy (WCE) videos. The dataset was released as part of the Auto-WCBleedGen Challenge Version V2 hosted by the MISAHUB team. Our model attained the highest performance among the 75 teams that took part in this competition. It aims to efficiently utilize CNN-based models, i.e., DenseNet and UNet, to detect and segment bleeding and non-bleeding areas in this complex real-world dataset. The model achieves an overall accuracy of 80%, which could help a skilled doctor carry out further diagnostics.

artificial intelligence, deep learning, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2412.05216

Country:

Asia > India > Jharkhand > Ranchi (0.06)
North America (0.05)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Therapeutic Area > Gastroenterology (0.74)
Health & Medicine > Diagnostic Medicine > Imaging (0.55)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Deep Learning for Surgical Instrument Recognition and Segmentation in Robotic-Assisted Surgeries: A Systematic Review

Ahmed, Fatimaelzahraa Ali, Yousef, Mahmoud, Ahmed, Mariam Ali, Ali, Hasan Omar, Mahboob, Anns, Ali, Hazrat, Shah, Zubair, Aboumarzouk, Omar, Ansari, Abdulla Al, Balakrishnan, Shidin

arXiv.org Artificial Intelligence, Nov-7-2024

Applying deep learning (DL) for annotating surgical instruments in robot-assisted minimally invasive surgeries (MIS) represents a significant advancement in surgical technology. This systematic review examines 48 studies that apply advanced DL methods and architectures. These sophisticated DL models have shown notable improvements in the precision and efficiency of detecting and segmenting surgical tools. The enhanced capabilities of these models support various clinical applications, including real-time intraoperative guidance, comprehensive postoperative evaluations, and objective assessments of surgical skills. By accurately identifying and segmenting surgical instruments in video data, DL models provide detailed feedback to surgeons, thereby improving surgical outcomes and reducing complication risks. Furthermore, the application of DL in surgical education is transformative. The review underscores the significant impact of DL on improving the accuracy of skill assessments and the overall quality of surgical training programs. However, implementing DL in surgical tool detection and segmentation faces challenges, such as the need for large, accurately annotated datasets to train these models effectively. The manual annotation process is labor-intensive and time-consuming, posing a significant bottleneck. Future research should focus on automating the detection and segmentation process and enhancing the robustness of DL models against environmental variations. Expanding the application of DL models across various surgical specialties will be essential to fully realize this technology's potential. Integrating DL with other emerging technologies, such as augmented reality (AR), also offers promising opportunities to further enhance the precision and efficacy of surgical procedures.

application, segmentation, surgery, (11 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/s10462-024-10979-w

2410.07269

Country:

Asia > Middle East > Qatar > Ad-Dawhah > Doha (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > Scotland > Stirling > Stirling (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (1.00)

Industry:

Health & Medicine > Surgery (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Raspberry PhenoSet: A Phenology-based Dataset for Automated Growth Detection and Yield Estimation

Jafary, Parham, Bazangeya, Anna, Pham, Michelle, Campbell, Lesley G., Saeedi, Sajad, Zareinia, Kourosh, Bougherara, Habiba

arXiv.org Artificial Intelligence, Nov-1-2024

The future of the agriculture industry is intertwined with automation. Accurate fruit detection, yield estimation, and harvest time estimation are crucial for optimizing agricultural practices. These tasks can be carried out by robots to reduce labour costs and improve the efficiency of the process. To do so, deep learning models should be trained to perform knowledge-based tasks, which outlines the importance of contributing valuable data to the literature. In this paper, we introduce Raspberry PhenoSet, a phenology-based dataset designed for detecting and segmenting raspberry fruit across seven developmental stages. To the best of our knowledge, Raspberry PhenoSet is the first fruit dataset to integrate biology-based classification with fruit detection tasks, offering valuable insights for yield estimation and precise harvest timing. This dataset contains 1,853 high-resolution images, the highest quality in the literature, captured under controlled artificial lighting in a vertical farm. The dataset has a total of 6,907 instances of mask annotations, manually labelled to reflect the seven phenology stages. We have also benchmarked Raspberry PhenoSet using several state-of-the-art deep learning models, including YOLOv8, YOLOv10, RT-DETR, and Mask R-CNN, to provide a comprehensive evaluation of their performance on the dataset. Our results highlight the challenges of distinguishing subtle phenology stages and underscore the potential of Raspberry PhenoSet for both deep learning model development and practical robotic applications in agriculture, particularly in yield prediction and supply chain management. The dataset and the trained models are publicly available for future studies.

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2411.00967

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > Canada > Ontario > Toronto (0.04)
South America > Chile (0.04)
North America > United States > District of Columbia > Washington (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Food & Agriculture > Agriculture (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
