AITopics | yolov8 model

yolov8 model

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

An Analytical Framework to Enhance Autonomous Vehicle Perception for Smart Cities

Khan, Jalal, Khan, Manzoor, Turaev, Sherzod, Malik, Sumbal, El-Sayed, Hesham, Ullah, Farman

arXiv.org Artificial Intelligence, Oct-16-2025

Driving-environment perception plays a vital role in autonomous driving and is being actively explored. The research community and relevant stakeholders call for Deep Learning (DL) models and AI-enabled solutions that enhance autonomous vehicles (AVs) for smart mobility. A model is needed that accurately perceives multiple objects on the road and predicts the driver's perception in order to control the car's movements. This article proposes a novel utility-based analytical model that enables the perception systems of AVs to understand the driving environment. The article consists of three modules: acquisition of a custom dataset containing distinctive objects (e.g., motorcyclists and rickshaws); a DL-based model (YOLOv8s) for object detection; and a module that measures the utility of the perception service from the performance values of trained model instances. The perception model is validated on the object detection task, and its process is benchmarked against the performance metrics of state-of-the-art deep learning models on the nuScenes dataset. The experimental results identify three best-performing YOLOv8s instances by mAP@0.5: SGD-based (0.832), Adam-based (0.810), and AdamW-based (0.822). Although the SGD-based instance has the highest overall mAP@0.5, the proposed perception model confirms that the AdamW-based instance (car: 0.921, motorcyclist: 0.899, truck: 0.793, etc.) outperforms the SGD-based instance (car: 0.915, motorcyclist: 0.892, truck: 0.781, etc.) because it has better class-level performance values. We validate that the proposed function is capable of finding the right perception for AVs. These results encourage using the proposed perception model to evaluate the utility of learning models and determine the appropriate perception for AVs.
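The key step in this abstract is that a slightly lower overall mAP@0.5 (0.822 for AdamW versus 0.832 for SGD) can coexist with better class-level values. A minimal sketch of that comparison, using only the three class-level values quoted above and an illustrative rule chosen for this example (count per-class wins, then compare the unweighted mean over those classes). It is not the authors' utility function, which the abstract does not specify.

```python
# Class-level AP@0.5 values quoted in the abstract; the remaining classes ("etc.") are not given.
adamw = {"car": 0.921, "motorcyclist": 0.899, "truck": 0.793}
sgd = {"car": 0.915, "motorcyclist": 0.892, "truck": 0.781}


def class_wins(a: dict[str, float], b: dict[str, float]) -> int:
    """Count shared classes on which model `a` scores strictly higher than model `b`."""
    return sum(1 for cls in a.keys() & b.keys() if a[cls] > b[cls])


def macro_mean(scores: dict[str, float]) -> float:
    """Unweighted mean over classes (illustrative rule, not the paper's utility)."""
    return sum(scores.values()) / len(scores)


print(class_wins(adamw, sgd), class_wins(sgd, adamw))  # 3 0
print(f"{macro_mean(adamw):.4f} vs {macro_mean(sgd):.4f}")
```

Because only three classes are quoted, this is illustrative; reproducing the paper's conclusion would require the full class list and the authors' utility definition.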

artificial intelligence, deep learning, machine learning

arXiv.org Artificial Intelligence

2510.1323

Country: Asia > Middle East > UAE (0.28)

Genre: Research Report > New Finding (0.66)

Industry:

Transportation > Ground > Road (1.00)
Information Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Underwater Waste Detection Using Deep Learning: A Performance Comparison of YOLOv7 to 10 and Faster RCNN

Nawarathne, UMMPK, Kumari, HMNS, Kumari, HMLS

arXiv.org Artificial Intelligence, Jul-28-2025

Underwater pollution is one of today's most significant environmental concerns, with vast volumes of garbage found in seas, rivers, and landscapes around the world. Accurate detection of these waste materials is crucial for successful waste management, environmental monitoring, and mitigation strategies. In this study, we investigated the performance of five cutting-edge object recognition algorithms, namely four YOLO (You Only Look Once) models (YOLOv7, YOLOv8, YOLOv9, and YOLOv10) and the Faster Region-based Convolutional Neural Network (Faster R-CNN), to identify which model was most effective at recognizing materials in underwater situations. The models were thoroughly trained and tested on a large dataset containing fifteen different classes under diverse conditions, such as low visibility and variable depths. Of these models, YOLOv8 performed best, with a mean Average Precision (mAP) of 80.9%. This increased performance is attributed to YOLOv8's architecture, which incorporates advanced features such as improved anchor-free mechanisms and self-supervised learning, allowing for more precise and efficient recognition of items in a variety of settings. These findings highlight the YOLOv8 model's potential as an effective tool in the global fight against pollution, improving both the detection capabilities and scalability of underwater cleanup operations.
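Whether a detection counts toward mAP depends on how much its box overlaps a ground-truth box, usually measured as intersection over union (IoU). A minimal sketch for axis-aligned boxes, assuming (x1, y1, x2, y2) pixel coordinates; the abstract does not state which IoU threshold(s) its 80.9% mAP uses, and the example boxes are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2) with x2 > x1, y2 > y1."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # zero when the boxes do not overlap
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


pred = (10, 10, 50, 50)
truth = (20, 20, 60, 60)
print(round(iou(pred, truth), 3))  # 0.391
```

Here the boxes overlap, but the IoU is below 0.5, so at a 0.5 threshold this prediction would not count as a correct detection.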

artificial intelligence, faster r-cnn, machine learning

arXiv.org Artificial Intelligence

2507.18967

Country: Asia (0.15)

Genre: Research Report > New Finding (0.48)

Industry: Water & Waste Management > Solid Waste Management (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Defect Detection in Photolithographic Patterns Using Deep Learning Models Trained on Synthetic Data

Shinde, Prashant P., Pai, Priyadarshini P., Adiga, Shashishekar P., Mayya, K. Subramanya, Seo, Yongbeom, Hwang, Myungsoo, Go, Heeyoung, Park, Changmin

arXiv.org Artificial Intelligence, May-16-2025

In the photolithographic process vital to semiconductor manufacturing, various types of defects appear during EUV patterning. Because pattern sizes keep shrinking, these defects are extremely small and cause false or missed detections during inspection. In particular, the lack of high-quality, defect-annotated data that adequately represents smaller defects has prevented the deployment of deep learning based defect detection models in fabrication lines. To resolve this data unavailability, we artificially generate scanning electron microscopy (SEM) images of line patterns with a known distribution of defects and annotate them automatically. We then employ state-of-the-art object detection models to investigate defect detection performance as a function of defect size, for defects much smaller than the pitch width. We find that the real-time object detector YOLOv8 achieves the best mean average precision (96%), compared with EfficientNet (83%) and SSD (77%), and is able to detect smaller defects. We report the smallest defect size that can be detected reliably. When tested on real SEM data, the YOLOv8 model correctly detected 84.6% of Bridge defects and 78.3% of Break defects across all relevant instances. These promising results suggest that synthetic data can be used as an alternative to real-world data for developing robust machine-learning models.
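Mean average precision figures like those above (96%, 83%, 77%) are means of per-class average precision (AP). A minimal sketch of AP for one class from a confidence-ranked list of detections, assuming each detection has already been marked as a true positive (for example, IoU ≥ 0.5 with a not-yet-matched ground-truth box) and using all-point interpolation. Evaluation toolkits differ in matching and interpolation details (11-point, 101-point, or all-point), so this is not necessarily the procedure behind the reported numbers.

```python
def average_precision(is_tp: list[bool], num_gt: int) -> float:
    """All-point interpolated AP for one class.

    is_tp: detections sorted by descending confidence; True if the detection
    matched a ground-truth box that had not been matched yet.
    num_gt: number of ground-truth boxes of this class (must be > 0).
    """
    precisions, recalls = [], []
    tp = fp = 0
    for hit in is_tp:
        if hit:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)
    # Make precision non-increasing when read from high recall back to low recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum interpolated precision over each increase in recall.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Five ranked detections against four ground-truth defects; one defect is never found.
print(average_precision([True, True, False, True, False], num_gt=4))  # 0.6875
```

mAP is then the mean of these per-class AP values; missed ground truths cap recall below 1 and therefore lower AP.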

artificial intelligence, defect, machine learning

arXiv.org Artificial Intelligence

2505.10192

Country:

North America > United States > Nevada > Clark County > Las Vegas (0.04)
South America > Brazil > Rio de Janeiro > Niterói (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Semiconductors & Electronics (1.00)
Information Technology > Hardware (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Identifying Cocoa Pollinators: A Deep Learning Dataset

Xu, Wenxiu, Bazegar, Saba Ghorbani, Sheng, Dong, Toledo-Hernandez, Manuel, Lan, ZhenZhong, Wanger, Thomas Cherico

arXiv.org Artificial Intelligence, Dec-27-2024

Cocoa is a multi-billion-dollar industry, but research on improving yields through pollination remains limited. New embedded hardware and AI-based data analysis are advancing knowledge of cocoa flower visitors, their identities, and their implications for yields. We present the first cocoa flower visitor dataset, containing 5,792 images of Ceratopogonidae, Formicidae, Aphididae, Araneae, and Encyrtidae, and 1,082 background cocoa flower images. This dataset was curated from 23 million images collected over two years by embedded cameras in cocoa plantations in Hainan Province, China. We demonstrate the use of the dataset by training YOLOv8 models of different sizes while progressively increasing the ratio of background images in the training set to identify the best-performing model. The medium-sized YOLOv8 model achieved the best results with 8% background images (F1 score of 0.71, mAP50 of 0.70). Overall, this dataset is useful for comparing the performance of deep learning model architectures on low-contrast images with difficult detection targets. The data can support future efforts to advance sustainable cocoa production through pollination monitoring projects.
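The abstract reports that 8% background images worked best but does not say exactly how the ratio was applied. A minimal sketch, assuming the ratio means the share of background images in the final training list and that backgrounds are sampled at random without replacement; the function name and file names are hypothetical, while the image counts come from the abstract.

```python
import random


def mix_background(foreground, background, ratio, seed=0):
    """Return a shuffled training list in which roughly `ratio` of the items are background images."""
    if not 0 <= ratio < 1:
        raise ValueError("ratio must be in [0, 1)")
    # Solve n_bg / (n_fg + n_bg) = ratio for n_bg, capped by the backgrounds available.
    n_bg = min(round(ratio * len(foreground) / (1 - ratio)), len(background))
    rng = random.Random(seed)
    mixed = list(foreground) + rng.sample(list(background), n_bg)
    rng.shuffle(mixed)
    return mixed


fg = [f"visitor_{i}.jpg" for i in range(5792)]  # hypothetical names for the insect images
bg = [f"bg_{i}.jpg" for i in range(1082)]       # hypothetical names for the background images
train_list = mix_background(fg, bg, ratio=0.08)
print(len(train_list), sum(p.startswith("bg_") for p in train_list))  # 6296 504
```

Sweeping `ratio` over several values and retraining at each one mirrors the "progressively increasing" procedure in spirit, though the authors' actual step sizes are not given.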

artificial intelligence, flower visitor, machine learning

arXiv.org Artificial Intelligence

2412.19915

Country:

Asia > China > Hainan Province (0.34)
Asia > China > Zhejiang Province > Hangzhou (0.05)
South America > Brazil > Pará > Belém (0.04)

Genre: Research Report (0.50)

Industry: Food & Agriculture > Agriculture (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Object Detection Approaches to Identifying Hand Images with High Forensic Values

Nguyen, Thanh Thi, Wilson, Campbell, Khan, Imad, Dalins, Janis

arXiv.org Artificial Intelligence, Dec-20-2024

Forensic science plays a crucial role in legal investigations, and the use of advanced technologies, such as object detection based on machine learning methods, can enhance the efficiency and accuracy of forensic analysis. Human hands are unique and can leave distinct patterns, marks, or prints that can be utilized for forensic examinations. This paper compares various machine learning approaches to hand detection and presents the application results of employing the best-performing model to identify images of significant importance in forensic contexts. We fine-tune YOLOv8 and vision transformer-based object detection models on four hand image datasets, including the 11k hands dataset with our own bounding boxes annotated by a semi-automatic approach. Two YOLOv8 variants, i.e., YOLOv8 nano (YOLOv8n) and YOLOv8 extra-large (YOLOv8x), and two vision transformer variants, i.e., DEtection TRansformer (DETR) and Detection Transformers with Assignment (DETA), are employed for the experiments. Experimental results demonstrate that the YOLOv8 models outperform DETR and DETA on all datasets. The experiments also show that YOLOv8 approaches result in superior performance compared with existing hand detection methods, which were based on YOLOv3 and YOLOv4 models. Applications of our fine-tuned YOLOv8 models for identifying hand images (or frames in a video) with high forensic values produce excellent results, significantly reducing the time required by forensic experts. This implies that our approaches can be implemented effectively for real-world applications in forensics or related fields.
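The abstract mentions annotating the 11k hands images with new bounding boxes for YOLOv8 fine-tuning. Ultralytics YOLO training labels use one text line per object: the class index followed by box center x, center y, width, and height, each normalized by the image width or height. A minimal sketch converting a pixel-space box (x1, y1, x2, y2) into such a line; the function name, class index, and example box are hypothetical, and the abstract does not describe the authors' semi-automatic annotation tooling.

```python
def to_yolo_line(class_id: int, box: tuple[float, float, float, float],
                 img_w: int, img_h: int) -> str:
    """Convert a pixel box (x1, y1, x2, y2) to a YOLO label line with normalized values."""
    x1, y1, x2, y2 = box
    if not (0 <= x1 < x2 <= img_w and 0 <= y1 < y2 <= img_h):
        raise ValueError("box must lie inside the image with x1 < x2 and y1 < y2")
    xc = (x1 + x2) / 2 / img_w  # box center, as a fraction of image width
    yc = (y1 + y2) / 2 / img_h  # box center, as a fraction of image height
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


# Hypothetical hand box in a 1600x1200 image, with class 0 standing for "hand".
print(to_yolo_line(0, (400, 300, 1200, 900), 1600, 1200))
# 0 0.500000 0.500000 0.500000 0.500000
```

Normalized coordinates keep labels valid when images are resized for training, which is one reason the format stores fractions rather than pixels.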

artificial intelligence, deep learning, machine learning

arXiv.org Artificial Intelligence

2412.16431

Country: Oceania > Australia (0.46)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Improving Generalization Performance of YOLOv8 for Camera Trap Object Detection

Subedi, Aroj

arXiv.org Artificial Intelligence, Dec-17-2024

Camera traps have become integral tools in wildlife conservation, providing a non-intrusive means to monitor and study wildlife in their natural habitats. Using object detection algorithms to automate species identification from camera trap images is of great importance for research and conservation. However, the generalization problem, in which a trained model cannot apply what it has learned to a never-before-seen dataset, is prevalent. This thesis explores enhancements to the YOLOv8 object detection algorithm that address this problem. The study examines the limitations of the baseline YOLOv8 model, emphasizing its struggles to generalize in real-world environments. To overcome these limitations, it proposes enhancements including a Global Attention Mechanism (GAM) module, modified multi-scale feature fusion, and Wise Intersection over Union (WIoUv3) as the bounding box regression loss function. Thorough evaluation and ablation experiments show that the improved model suppresses background noise, focuses on object properties, and generalizes robustly to novel environments. The proposed enhancements not only address the challenges inherent in camera trap datasets but also pave the way for broader applicability in real-world conservation scenarios, ultimately aiding the effective management of wildlife populations and habitats.
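Measuring generalization to a never-before-seen dataset requires that test images come from somewhere the model did not see during training. A minimal sketch of one common way to do that, assuming each image carries a camera-site identifier and holding out whole sites; this split is an illustrative assumption, not the thesis's documented evaluation protocol, and the function and data are hypothetical.

```python
import random
from collections import defaultdict


def split_by_location(images, test_fraction=0.2, seed=0):
    """Split (path, location_id) pairs so that no test location appears in training."""
    by_loc = defaultdict(list)
    for path, loc in images:
        by_loc[loc].append(path)
    locations = sorted(by_loc)
    random.Random(seed).shuffle(locations)
    n_test = max(1, round(len(locations) * test_fraction))
    test_locs, train_locs = locations[:n_test], locations[n_test:]
    train = [p for loc in train_locs for p in by_loc[loc]]
    test = [p for loc in test_locs for p in by_loc[loc]]
    return train, test


# Hypothetical data: 5 camera sites with 4 images each.
images = [(f"site{s}/img{i}.jpg", f"site{s}") for s in range(5) for i in range(4)]
train, test = split_by_location(images)
print(len(train), len(test))  # 16 4
```

A random image-level split would let near-duplicate frames from the same site land on both sides, which tends to overstate how well a model generalizes.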

artificial intelligence, feature map, machine learning

arXiv.org Artificial Intelligence

2412.14211

Country:

North America > United States (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Poland (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine (0.46)
Media > Photography (0.45)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.70)

Machine Learning-Based Automated Assessment of Intracorporeal Suturing in Laparoscopic Fundoplication

Khairnar, Shekhar Madhav, Nguyen, Huu Phong, Desir, Alexis, Holcomb, Carla, Scott, Daniel J., Sankaranarayanan, Ganesh

arXiv.org Artificial Intelligence, Dec-16-2024

Automated assessment of surgical skills using artificial intelligence (AI) provides trainees with instantaneous feedback. After bimanual tool motions are captured, derived kinematic metrics are reliable predictors of performance in laparoscopic tasks. Implementing automated tool tracking requires time-intensive human annotation. We developed AI-based tool tracking using the Segment Anything Model (SAM) to eliminate the need for human annotators. Here, we describe a study evaluating the usefulness of our tool tracking model in automated assessment during a laparoscopic suturing task in the fundoplication procedure. An automated tool tracking model was applied to recorded videos of Nissen fundoplication on porcine bowel. Surgeons were grouped as novices (PGY1-2) and experts (PGY3-5, attendings). The beginning and end of each suturing step were segmented, and motions of the left and right tools were extracted. A low-pass filter with a 24 Hz cut-off frequency removed noise. Performance was assessed using supervised and unsupervised models, and an ablation study compared results. Kinematic features--RMS velocity, RMS acceleration, RMS jerk, total path length, and Bimanual Dexterity--were extracted and analyzed using Logistic Regression, Random Forest, Support Vector Classifier, and XGBoost. PCA was performed for feature reduction. For unsupervised learning, a Denoising Autoencoder (DAE) model with classifiers, such as a 1-D CNN and traditional models, was trained. Data were extracted for 28 participants (9 novices, 19 experts). Supervised learning with PCA and Random Forest achieved an accuracy of 0.795 and an F1 score of 0.778. The unsupervised 1-D CNN achieved superior results with an accuracy of 0.817 and an F1 score of 0.806, eliminating the need for kinematic feature computation. We demonstrated an AI model capable of automated performance classification, independent of human annotation.
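The feature pipeline described above (filter the tool tracks, then derive kinematic metrics) can be sketched as follows, under stated assumptions. The abstract gives only the 24 Hz cut-off, so a first-order (single-pole RC) low-pass filter stands in for whatever filter the authors used. Tool positions are assumed to be 2-D per-frame coordinates, RMS values are taken over the magnitude of each derivative, the 60 frames/s rate in the example is hypothetical, and Bimanual Dexterity is omitted because the abstract does not define it.

```python
import math


def low_pass(signal, cutoff_hz, fs_hz):
    """First-order (single-pole RC) low-pass filter, a stand-in for the unspecified filter."""
    dt = 1.0 / fs_hz
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)
    out = [signal[0]]
    for x in signal[1:]:
        out.append(out[-1] + alpha * (x - out[-1]))
    return out


def derivative(values, fs_hz):
    """Forward finite difference, in units per second."""
    return [(b - a) * fs_hz for a, b in zip(values, values[1:])]


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


def kinematic_features(xs, ys, fs_hz, cutoff_hz=24.0):
    """RMS velocity/acceleration/jerk magnitudes and path length for one 2-D tool-tip track."""
    if len(xs) != len(ys) or len(xs) < 4:
        raise ValueError("need matching x/y tracks with at least 4 samples")
    if cutoff_hz >= fs_hz / 2:
        raise ValueError("cutoff must be below the Nyquist frequency")
    pos = [low_pass(s, cutoff_hz, fs_hz) for s in (xs, ys)]
    vel = [derivative(s, fs_hz) for s in pos]
    acc = [derivative(s, fs_hz) for s in vel]
    jerk = [derivative(s, fs_hz) for s in acc]

    def magnitude(components):
        return [math.hypot(cx, cy) for cx, cy in zip(*components)]

    speed = magnitude(vel)
    return {
        "rms_velocity": rms(speed),
        "rms_acceleration": rms(magnitude(acc)),
        "rms_jerk": rms(magnitude(jerk)),
        "path_length": sum(speed) / fs_hz,  # sum of per-frame step distances
    }


# Hypothetical track: tool moving 0.1 units per frame along x, recorded at 60 frames/s.
track = kinematic_features([0.1 * i for i in range(120)], [0.0] * 120, fs_hz=60)
print(round(track["path_length"], 2))
```

The filter makes the tracked position lag slightly behind the raw one, so the filtered path length comes out a little shorter than the raw 11.9 units; higher-order filters trade that lag against sharper noise rejection.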

artificial intelligence, assessment, machine learning

arXiv.org Artificial Intelligence

2412.16195

Country:

North America > United States > Texas > Dallas County > Dallas (0.04)
North America > United States > Florida > Orange County > Orlando (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Surgery (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.95)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Enhancing Bidirectional Sign Language Communication: Integrating YOLOv8 and NLP for Real-Time Gesture Recognition & Translation

Bhuiyan, Hasnat Jamil, Mozumder, Mubtasim Fuad, Khan, Md. Rabiul Islam, Ahmed, Md. Sabbir, Nahim, Nabuat Zaman

arXiv.org Artificial Intelligence, Nov-18-2024

The primary goal of this research is to capture American Sign Language (ASL) from real-time camera footage and convert it into text. In addition, we focus on creating a framework that converts text into sign language in real time, which can help break the language barrier for people who need it. To recognize ASL, we used the You Only Look Once (YOLO) model and a Convolutional Neural Network (CNN) model. The YOLO model runs in real time and automatically extracts discriminative spatial-temporal characteristics from the raw video stream without requiring any prior knowledge, eliminating design flaws. The CNN model also runs in real time for sign language detection. We introduce a novel method for converting text input to sign language: the framework takes a sentence as input, identifies its keywords, and then shows, in real time, a video in which the corresponding signs are performed. To the best of our knowledge, this is one of the few studies to demonstrate real-time bidirectional communication in American Sign Language (ASL).
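The text-to-sign direction (sentence in, keywords identified, sign video out) can be sketched as a lookup. This is a minimal sketch under assumptions: the stop-word list, the clip vocabulary, the file paths, and the fingerspelling fallback are all hypothetical, since the abstract does not describe how keywords are identified or how unknown words are handled.

```python
import string

# Hypothetical vocabulary mapping keywords to pre-recorded sign-video clips.
SIGN_CLIPS = {
    "hello": "clips/hello.mp4",
    "thank": "clips/thank.mp4",
    "you": "clips/you.mp4",
    "help": "clips/help.mp4",
}
# Illustrative subset of function words that get no clip of their own in this sketch.
STOPWORDS = {"a", "an", "the", "is", "are", "am", "to", "of", "and"}


def sentence_to_clips(sentence: str, clips: dict[str, str] = SIGN_CLIPS) -> list[str]:
    """Return an ordered playlist of clip paths for the keywords in `sentence`."""
    playlist = []
    for raw in sentence.lower().split():
        word = raw.strip(string.punctuation)
        if not word or word in STOPWORDS:
            continue
        if word in clips:
            playlist.append(clips[word])
        else:
            # Unknown word: fall back to fingerspelling, one hypothetical clip per letter.
            playlist.extend(f"clips/letters/{ch}.mp4" for ch in word if ch.isalpha())
    return playlist


print(sentence_to_clips("Thank you, Sam!"))
```

A word-by-word lookup keeps English word order, whereas ASL has its own grammar, so this sketch covers only the retrieval step, not full translation.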

machine learning, natural language, real time system

arXiv.org Artificial Intelligence

2411.13597

Country: Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.05)

Genre: Research Report > Promising Solution (0.34)

Industry: Education > Curriculum > Subject-Specific Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Architecture > Real Time Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Mero Nagarikta: Advanced Nepali Citizenship Data Extractor with Deep Learning-Powered Text Detection and OCR

Dhakal, Sisir, Sigdel, Sujan, Paudel, Sandesh Prasad, Ranabhat, Sharad Kumar, Lamichhane, Nabin

arXiv.org Artificial Intelligence, Oct-8-2024

Transforming text-based identity documents, such as Nepali citizenship cards, into a structured digital format poses several challenges due to the distinct characteristics of the Nepali script and minor variations in print alignment and contrast across cards. This work proposes a robust system that uses YOLOv8 for accurate text object detection and an OCR algorithm based on Optimized PyTesseract. The system, implemented as part of a mobile application, automatically extracts important textual information, including names, citizenship numbers, and dates of birth, from both the front and the back of Nepali citizenship cards. The final YOLOv8 model achieved a mean average precision of 99.1% for text detection on the front and 96.1% on the back. The PyTesseract configuration optimized for Nepali characters outperformed standard OCR in flexibility and accuracy, extracting text from images with clean and noisy backgrounds and varying contrast. Preprocessing steps such as converting images to grayscale, removing noise, and detecting edges further improved the system's OCR accuracy, even for low-quality photos. This work expands the current body of research in multilingual OCR and document analysis, especially for low-resource languages such as Nepali. It emphasizes the effectiveness of combining the latest object detection frameworks with OCR models fine-tuned for practical applications.
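The abstract lists grayscale conversion, noise removal, and edge detection as preprocessing steps without naming the methods. A minimal pure-Python sketch of one plausible choice for each, stated as assumptions: ITU-R BT.601 luma weights for grayscale (the weights OpenCV's `cv2.cvtColor` uses for RGB-to-gray), a 3x3 median filter for noise, and Sobel gradient magnitude for edges. In practice one would more likely call OpenCV functions such as `cv2.cvtColor`, `cv2.medianBlur`, and `cv2.Canny`; the authors' exact pipeline is not given.

```python
import math


def to_gray(rgb):
    """Rows of (r, g, b) values in 0..255 -> rows of luma values (ITU-R BT.601 weights)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in rgb]


def median3x3(img):
    """3x3 median filter for salt-and-pepper noise; border pixels are copied unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = sorted(img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = window[4]  # middle of the 9 sorted values
    return out


def sobel_magnitude(img):
    """Gradient magnitude from 3x3 Sobel kernels; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            out[y][x] = math.hypot(gx, gy)
    return out


# A vertical dark-to-bright step produces a strong horizontal gradient at the center pixel.
step = [[0, 0, 255], [0, 0, 255], [0, 0, 255]]
print(sobel_magnitude(step)[1][1])  # 1020.0
```

Chained as `sobel_magnitude(median3x3(to_gray(rgb_rows)))`, the median filter runs before edge detection so isolated noisy pixels do not show up as spurious edges.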

citizenship card, dataset, text detection

arXiv.org Artificial Intelligence

2410.05721

Country:

Europe > Switzerland (0.04)
Asia > Nepal > Gandaki Province > Kaski District > Pokhara (0.04)
Asia > India (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.84)

Precision Aquaculture: An Integrated Computer Vision and IoT Approach for Optimized Tilapia Feeding

Hossam, Rania, Heakl, Ahmed, Gomaa, Walid

arXiv.org Artificial Intelligence, Sep-13-2024

Traditional fish farming practices often lead to inefficient feeding, resulting in environmental issues and reduced productivity. We developed an innovative system combining computer vision and IoT technologies for precise Tilapia feeding. Our solution uses real-time IoT sensors to monitor water quality parameters and computer vision algorithms to analyze fish size and count, determining optimal feed amounts. A mobile app enables remote monitoring and control. We used YOLOv8 for keypoint detection to estimate Tilapia weight from length, achieving 94% precision on 3,500 annotated images. Pixel-based measurements were converted to centimeters using depth estimation for accurate feeding calculations. Collecting data under conditions that mirror inference conditions significantly improved results. Preliminary estimates suggest this approach could increase production up to 58 times compared to traditional farms. Our models, code, and dataset are open-source (the code, dataset, and models are available upon reasonable request).
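The measurement chain in this abstract (keypoints, then pixel length, then centimeters via depth, then weight from length) can be sketched as below. The assumptions are labeled: the conversion uses a simple pinhole-camera model with the fish roughly parallel to the image plane, and weight uses the standard allometric length-weight relation W = a·L^b. The keypoint coordinates, depth, focal length, and the coefficients a and b are placeholders, not values from the paper.

```python
import math


def keypoint_length_px(snout, tail):
    """Euclidean distance in pixels between two (x, y) keypoints."""
    return math.dist(snout, tail)


def pixels_to_cm(length_px, depth_cm, focal_px):
    """Pinhole-camera conversion, assuming the fish lies roughly parallel to the image plane."""
    return length_px * depth_cm / focal_px


def weight_from_length(length_cm, a, b):
    """Allometric length-weight relation W = a * L**b (a and b are species- and farm-specific)."""
    return a * length_cm ** b


# Hypothetical values: keypoints 300 px apart, fish 50 cm from a camera with a 1000 px focal length.
length_cm = pixels_to_cm(keypoint_length_px((100, 200), (400, 200)), depth_cm=50, focal_px=1000)
print(length_cm)  # 15.0
print(round(weight_from_length(length_cm, a=0.015, b=3.0), 2))  # placeholder a and b
```

Because length enters the weight estimate raised to a power near 3, small errors in the depth-based pixel-to-centimeter conversion are amplified in the weight, which is one reason depth estimation matters for feeding calculations.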

dataset, detection, estimation

arXiv.org Artificial Intelligence

2409.08695

Country:

Africa > Middle East > Egypt > Alexandria Governorate > Alexandria (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Asia > Philippines (0.04)

Genre: Research Report (0.82)

Industry:

Food & Agriculture > Fishing (1.00)
Water & Waste Management > Water Management > Water Supplies & Services (0.35)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
