AITopics | pixel accuracy

Collaborating Authors

pixel accuracy

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Real-Time Semantic Segmentation on FPGA for Autonomous Vehicles Using LMIINet with the CGRA4ML Framework

Hosseini, Amir Mohammad Khadem, Mirzakuchaki, Sattar

arXiv.org Artificial IntelligenceOct-28-2025

Semantic segmentation has emerged as a fundamental problem in computer vision, gaining particular importance in real-time applications such as autonomous driving. The main challenge is achieving high accuracy while operating under computational and hardware constraints. In this research, we present an FPGA-based implementation of real-time semantic segmentation leveraging the lightweight LMIINet architecture and the Coarse-Grained Reconfigurable Array for Machine Learning (CGRA4ML) hardware framework. The model was trained using Quantization-Aware Training (QAT) with 8-bit precision on the Cityscapes dataset, reducing memory footprint by a factor of four while enabling efficient fixed-point computations. Necessary modifications were applied to adapt the model to CGRA4ML constraints, including simplifying skip connections, employing hardware-friendly operations such as depthwise-separable and 1A-1 convolutions, and redesigning parts of the Flatten Transformer. Our implementation achieves approximately 90% pixel accuracy and 45% mean Intersection-over-Union (mIoU), operating in real-time at 20 frames per second (FPS) with 50.1 ms latency on the ZCU104 FPGA board. The results demonstrate the potential of CGRA4ML, with its flexibility in mapping modern layers and off-chip memory utilization for skip connections, provides a path for implementing advanced semantic segmentation networks on FPGA for real-time applications to outperform traditional GPU solutions in terms of power efficiency while maintaining competitive accuracy. The code for this project is publicly available at https://github.com/STAmirr/ cgra4ml_semantic_segmentation

machine learning, real time system, segmentation, (20 more...)

arXiv.org Artificial Intelligence

2510.22243

Country: Europe (0.28)

Genre: Research Report > New Finding (0.34)

Industry:

Transportation > Ground > Road (0.49)
Automobiles & Trucks (0.49)
Information Technology > Robotics & Automation (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Architecture > Real Time Systems (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.72)

Add feedback

Semantic segmentation with coarse annotations

de Jong, Jort, Holenderski, Mike

arXiv.org Artificial IntelligenceOct-20-2025

Semantic segmentation is the task of classifying each pixel in an image. Training a segmentation model achieves best results using annotated images, where each pixel is annotated with the corresponding class. When obtaining fine annotations is difficult or expensive, it may be possible to acquire coarse annotations, e.g. by roughly annotating pixels in an images leaving some pixels around the boundaries between classes unlabeled. Segmentation with coarse annotations is difficult, in particular when the objective is to optimize the alignment of boundaries between classes. This paper proposes a regularization method for models with an encoder-decoder architecture with superpixel based upsampling. It encourages the segmented pixels in the decoded image to be SLIC-superpixels, which are based on pixel color and position, independent of the segmentation annotation. The method is applied to FCN-16 fully con-volutional network architecture and evaluated on the SUIM, Cityscapes, and PanNuke data sets. It is shown that the boundary recall improves significantly compared to state-of-the-art models when trained on coarse annotations.

annotation, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2510.15756

Country: Europe (0.46)

Genre: Research Report (0.86)

Industry: Information Technology (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

ToonOut: Fine-tuned Background-Removal for Anime Characters

Muratori, Matteo, Seytre, Joël

arXiv.org Artificial IntelligenceSep-9-2025

While state-of-the-art background removal models excel at realistic imagery, they frequently underperform in specialized domains such as anime-style content, where complex features like hair and transparency present unique challenges. To address this limitation, we collected and annotated a custom dataset of 1,228 high-quality anime images of characters and objects, and fine-tuned the open-sourced BiRefNet model on this dataset. This resulted in marked improvements in background removal accuracy for anime-style images, increasing from 95.3% to 99.5% for our newly introduced Pixel Accuracy metric. We are open-sourcing the code, the fine-tuned model weights, as well as the dataset at: https://github.com/MatteoKartoon/BiRefNet.

artificial intelligence, dataset, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2509.06839

Genre: Research Report (0.54)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Add feedback

OmniUnet: A Multimodal Network for Unstructured Terrain Segmentation on Planetary Rovers Using RGB, Depth, and Thermal Imagery

Castilla-Arquillo, Raul, Perez-del-Pulgar, Carlos, Gerdes, Levin, Garcia-Cerezo, Alfonso, Olivares-Mendez, Miguel A.

arXiv.org Artificial IntelligenceAug-4-2025

Robot navigation in unstructured environments requires multimodal perception systems that can support safe navigation. Multimodality enables the integration of complementary information collected by different sensors. However, this information must be processed by machine learning algorithms specifically designed to leverage heterogeneous data. Furthermore, it is necessary to identify which sensor modalities are most informative for navigation in the target environment. In Martian exploration, thermal imagery has proven valuable for assessing terrain safety due to differences in thermal behaviour between soil types. This work presents OmniUnet, a transformer-based neural network architecture for semantic segmentation using RGB, depth, and thermal (RGB-D-T) imagery. A custom multimodal sensor housing was developed using 3D printing and mounted on the Martian Rover Testbed for Autonomy (MaRTA) to collect a multimodal dataset in the Bardenas semi-desert in northern Spain. This location serves as a representative environment of the Martian surface, featuring terrain types such as sand, bedrock, and compact soil. A subset of this dataset was manually labeled to support supervised training of the network. The model was evaluated both quantitatively and qualitatively, achieving a pixel accuracy of 80.37% and demonstrating strong performance in segmenting complex unstructured terrain. Inference tests yielded an average prediction time of 673 ms on a resource-constrained computer (Jetson Orin Nano), confirming its suitability for on-robot deployment. The software implementation of the network and the labeled dataset have been made publicly available to support future research in multimodal terrain perception for planetary robotics.

artificial intelligence, dataset, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2508.0058

Country: Europe > Spain (0.25)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Comprehensive Study on Lumbar Disc Segmentation Techniques Using MRI Data

Salturk, Serkan, Sayin, Irem, Balci, Ibrahim Cem, Pamukcu, Taha Emre, Soydan, Zafer, Uvet, Huseyin

arXiv.org Artificial IntelligenceDec-25-2024

Lumbar disk segmentation is essential for diagnosing and curing spinal disorders by enabling precise detection of disk boundaries in medical imaging. The advent of deep learning has resulted in the development of many segmentation methods, offering differing levels of accuracy and effectiveness. This study assesses the effectiveness of several sophisticated deep learning architectures, including ResUnext, Ef3 Net, UNet, and TransUNet, for lumbar disk segmentation, highlighting key metrics like as Pixel Accuracy, Mean Intersection over Union (Mean IoU), and Dice Coefficient. The findings indicate that ResUnext achieved the highest segmentation accuracy, with a Pixel Accuracy of 0.9492 and a Dice Coefficient of 0.8425, with TransUNet following closely after. Filtering techniques somewhat enhanced the performance of most models, particularly Dense UNet, improving stability and segmentation quality. The findings underscore the efficacy of these models in lumbar disk segmentation and highlight potential areas for improvement.

artificial intelligence, machine learning, segmentation, (15 more...)

arXiv.org Artificial Intelligence

2412.18894

Country:

Europe (0.29)
Asia > Middle East > Republic of Türkiye (0.16)

Genre: Research Report > New Finding (0.88)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.94)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Human-in-the-Loop Segmentation of Multi-species Coral Imagery

Raine, Scarlett, Marchant, Ross, Kusy, Brano, Maire, Frederic, Suenderhauf, Niko, Fischer, Tobias

arXiv.org Artificial IntelligenceApr-16-2024

Broad-scale marine surveys performed by underwater vehicles significantly increase the availability of coral reef imagery, however it is costly and time-consuming for domain experts to label images. Point label propagation is an approach used to leverage existing image data labeled with sparse point labels. The resulting augmented ground truth generated is then used to train a semantic segmentation model. Here, we first demonstrate that recent advances in foundation models enable generation of multi-species coral augmented ground truth masks using denoised DINOv2 features and K-Nearest Neighbors (KNN), without the need for any pre-training or custom-designed algorithms. For extremely sparsely labeled images, we propose a labeling regime based on human-in-the-loop principles, resulting in significant improvement in annotation efficiency: If only 5 point labels per image are available, our proposed human-in-the-loop approach improves on the state-of-the-art by 17.3% for pixel accuracy and 22.6% for mIoU; and by 10.6% and 19.1% when 10 point labels per image are available. Even if the human-in-the-loop labeling regime is not used, the denoised DINOv2 features with a KNN outperforms the prior state-of-the-art by 3.5% for pixel accuracy and 5.7% for mIoU (5 grid points). We also provide a detailed analysis of how point labeling style and the quantity of points per image affects the point label propagation quality and provide general recommendations on maximizing point label efficiency.

ground truth mask, point label, segmentation, (15 more...)

arXiv.org Artificial Intelligence

2404.09406

Country:

Asia > Middle East > Jordan (0.04)
Oceania > Australia > Queensland (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.87)

Add feedback

SGAT4PASS: Spherical Geometry-Aware Transformer for PAnoramic Semantic Segmentation

Li, Xuewei, Wu, Tao, Qi, Zhongang, Wang, Gaoang, Shan, Ying, Li, Xi

arXiv.org Artificial IntelligenceJun-6-2023

As an important and challenging problem in computer vision, PAnoramic Semantic Segmentation (PASS) gives complete scene perception based on an ultra-wide angle of view. Usually, prevalent PASS methods with 2D panoramic image input focus on solving image distortions but lack consideration of the 3D properties of original $360^{\circ}$ data. Therefore, their performance will drop a lot when inputting panoramic images with the 3D disturbance. To be more robust to 3D disturbance, we propose our Spherical Geometry-Aware Transformer for PAnoramic Semantic Segmentation (SGAT4PASS), considering 3D spherical geometry knowledge. Specifically, a spherical geometry-aware framework is proposed for PASS. It includes three modules, i.e., spherical geometry-aware image projection, spherical deformable patch embedding, and a panorama-aware loss, which takes input images with 3D disturbance into account, adds a spherical geometry-aware constraint on the existing deformable patch embedding, and indicates the pixel density of original $360^{\circ}$ data, respectively. Experimental results on Stanford2D3D Panoramic datasets show that SGAT4PASS significantly improves performance and robustness, with approximately a 2% increase in mIoU, and when small 3D disturbances occur in the data, the stability of our performance is improved by an order of magnitude. Our code and supplementary material are available at https://github.com/TencentARC/SGAT4PASS.

artificial intelligence, machine learning, panoramic image, (16 more...)

arXiv.org Artificial Intelligence

2306.03403

Country:

North America > United States > Illinois (0.04)
Asia > Singapore (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.89)

Add feedback

Exploiting CNNs for Semantic Segmentation with Pascal VOC

Prakash, Sourabh, Shah, Priyanshi, Agrawal, Ashrya

arXiv.org Artificial IntelligenceMay-5-2023

In this paper, we present a comprehensive study on semantic segmentation with the Pascal VOC dataset. Here, we have to label each pixel with a class which in turn segments the entire image based on the objects/entities present. To tackle this, we firstly use a Fully Convolution Network (FCN) baseline which gave 71.31% pixel accuracy and 0.0527 mean IoU. We analyze its performance and working and subsequently address the issues in the baseline with three improvements - a) cosine annealing learning rate scheduler(pixel accuracy: 72.86%, IoU: 0.0529), b) data augmentation(pixel accuracy: 69.88%, IoU: 0.0585) c) class imbalance weights(pixel accuracy: 68.98%, IoU: 0.0596). Apart from these changes in training pipeline, we also explore three different architectures - a) Our proposed model - Advanced FCN (pixel accuracy: 67.20%, IoU: 0.0602) b) Transfer Learning with ResNet (Best performance) (pixel accuracy: 71.33%, IoU: 0.0926) c) U-Net(pixel accuracy: 72.15%, IoU: 0.0649). We observe that the improvements help in greatly improving the performance, as reflected both, in metrics and segmentation maps. Interestingly, we observe that among the improvements, dataset augmentation has the greatest contribution. Also, note that transfer learning model performs the best on the pascal dataset. We analyse the performance of these using loss, accuracy and IoU plots along with segmentation maps, which help us draw valuable insights about the working of the models.

accuracy, artificial intelligence, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2304.13216

Country:

North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > California > San Diego County > La Jolla (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Structured Pruning for Multi-Task Deep Neural Networks

Garg, Siddhant, Zhang, Lijun, Guan, Hui

arXiv.org Artificial IntelligenceApr-13-2023

Although multi-task deep neural network (DNN) models have computation and storage benefits over individual single-task DNN models, they can be further optimized via model compression. Numerous structured pruning methods are already developed that can readily achieve speedups in single-task models, but the pruning of multi-task networks has not yet been extensively studied. In this work, we investigate the effectiveness of structured pruning on multi-task models. We use an existing single-task filter pruning criterion and also introduce an MTL-based filter pruning criterion for estimating the filter importance scores. We prune the model using an iterative pruning strategy with both pruning methods. We show that, with careful hyper-parameter tuning, architectures obtained from different pruning methods do not have significant differences in their performances across tasks when the number of parameters is similar. We also show that iterative structure pruning may not be the best way to achieve a well-performing pruned model because, at extreme pruning levels, there is a high drop in performance across all tasks. But when the same models are randomly initialized and re-trained, they show better results.

artificial intelligence, machine learning, pruning, (14 more...)

arXiv.org Artificial Intelligence

2304.0684

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Coronary Artery Segmentation from Intravascular Optical Coherence Tomography Using Deep Capsules

Balaji, Arjun, Kelsey, Lachlan, Majeed, Kamran, Schultz, Carl, Doyle, Barry

arXiv.org Machine LearningMar-12-2020

The segmentation and analysis of coronary arteries from intravascular optical coherence tomography (IVOCT) is an important aspect of diagnosing and managing coronary artery disease. However, automated, robust IVOCT image analysis tools are lacking. Current image processing methods are hindered by the time needed to generate these expert-labelled datasets and also the potential for bias during the analysis. Here we present a new deep learning method based on capsules to automatically produce lumen segmentations, built using a large IVOCT dataset of 12,011 images with ground-truth segmentations. This dataset contains images with both blood and light artefacts (22.8%), as well as noise from metallic (23.1%) and bioresorbable stents (2.5%). We trained our model on a dataset containing 9,608 images. We rigorously investigate design variations with respect to upsampling regimes and input selection and validate our deep learning model using 2,403 images. We show that our fully trained and optimized model achieves a mean Soft Dice Score of 97.11% (median of 98.2%), segments 200 IVOCT images in an acceptable timeframe of 12 seconds and outperforms current algorithms.

experiment, pixel accuracy, segmentation, (12 more...)

arXiv.org Machine Learning

2003.0608

Country:

Oceania > Australia > Western Australia > Perth (0.04)
North America > United States > Massachusetts > Middlesex County > Westford (0.04)
North America > Mexico > Sonora (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback