yolo11
Ultralytics YOLO Evolution: An Overview of YOLO26, YOLO11, YOLOv8 and YOLOv5 Object Detectors for Computer Vision and Pattern Recognition
Sapkota, Ranjan, Karkee, Manoj
This paper presents a comprehensive overview of the Ultralytics YOLO (You Only Look Once) family of object detectors, focusing on the architectural evolution, benchmarking, deployment perspectives, and future challenges. The review begins with the most recent release, YOLO26 (or YOLOv26), which introduces key innovations including Distribution Focal Loss (DFL) removal, native NMS-free inference, Progressive Loss Balancing (ProgLoss), Small-Target-Aware Label Assignment (STAL), and the MuSGD optimizer for stable training. The progression is then traced through YOLO11, with its hybrid task assignment and efficiency-focused modules; YOLOv8, which advanced with a decoupled detection head and anchor-free predictions; and YOLOv5, which established the modular PyTorch foundation that enabled modern YOLO development. Benchmarking on the MS COCO dataset provides a detailed quantitative comparison of YOLOv5, YOLOv8, YOLO11, and YOLO26 (YOLOv26), alongside cross-comparisons with YOLOv12, YOLOv13, RT-DETR, and DEIM (DETR with Improved Matching). Metrics including precision, recall, F1 score, mean Average Precision, and inference speed are analyzed to highlight trade-offs between accuracy and efficiency. Deployment and application perspectives are further discussed, covering export formats, quantization strategies, and real-world use in robotics, agriculture, surveillance, and manufacturing. Finally, the paper identifies challenges and future directions, including dense-scene limitations, hybrid CNN-Transformer integration, open-vocabulary detection, and edge-aware training approaches. (Object Detection, YOLOv26, YOLO)
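For context on what "NMS-free inference" removes, the sketch below shows a generic greedy non-maximum suppression (NMS) pass of the kind traditionally run after a detector's forward pass. It is an illustration only, not code from the paper or from Ultralytics: the function names, the `(x1, y1, x2, y2)` box layout, and the 0.5 threshold are assumptions chosen for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of boxes kept by greedy NMS (hypothetical helper).

    Boxes are visited from highest to lowest score; a box is kept only if it
    does not overlap an already kept box by more than iou_threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Made-up example: the first two boxes overlap heavily, the third is separate.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
```

An NMS-free detector is trained so that it emits roughly one prediction per object, which makes a post-processing step like this unnecessary at inference time.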
- North America > United States > New York > Tompkins County > Ithaca (0.04)
- Asia > Nepal (0.04)
Vision transformer-based multi-camera multi-object tracking framework for dairy cow monitoring
Abbas, Kumail, Afzal, Zeeshan, Raza, Aqeel, Mansouri, Taha, Dowsey, Andrew W., Inchaisri, Chaidate, Alameer, Ali
Activity and behaviour correlate with dairy cow health and welfare, making continual and accurate monitoring crucial for disease identification and farm productivity. Manual observation and frequent assessments are laborious and inconsistent for activity monitoring. In this study, we developed a unique multi-camera, real-time tracking system for indoor-housed Holstein Friesian dairy cows. This technology uses cutting-edge computer vision techniques, including instance segmentation and tracking algorithms, to monitor cow activity seamlessly and accurately. An integrated top-down barn panorama was created by geometrically aligning six camera feeds using homographic transformations. The detection phase used a refined YOLO11-m model trained on an overhead cow dataset, obtaining high accuracy (mAP@0.50 = 0.97, F1 = 0.95). SAMURAI, an upgraded Segment Anything Model 2.1, generated pixel-precise cow masks for instance segmentation utilizing zero-shot learning and motion-aware memory. Even with occlusion and fluctuating posture, a motion-aware Linear Kalman filter and IoU-based data association reliably identified cows over time for object tracking. The proposed system significantly outperformed Deep SORT Realtime. Multi-Object Tracking Accuracy (MOTA) was 98.7% and 99.3% in two benchmark video sequences, with IDF1 scores above 99% and near-zero identity switches. Our results indicate that this unified multi-camera system can track dairy cows in complex indoor environments in real time. The system reduces redundant detections across overlapping cameras and maintains continuity as cows move between viewpoints, with the aim of improving early sickness prediction through activity quantification and behavioural classification.
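As a reminder of how MOTA is usually defined, the sketch below applies the standard CLEAR MOT formula, MOTA = 1 - (FN + FP + IDSW) / GT, summed over all frames. The function name and the error counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the paper, and the authors' evaluation code may differ.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """Multi-Object Tracking Accuracy from error counts summed over all frames.

    num_gt is the total number of ground-truth object instances across frames.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


# Made-up counts for illustration: 12 misses, 6 false alarms, 2 identity
# switches over 1000 ground-truth instances give 1 - 20/1000.
example_mota = mota(false_negatives=12, false_positives=6, id_switches=2, num_gt=1000)
```

Because all three error types share one denominator, a sequence with near-zero identity switches can still score a low MOTA if missed or spurious detections are frequent.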
- Asia > Thailand > Bangkok > Bangkok (0.04)
- Europe > United Kingdom > England > Bristol (0.04)
- Europe > United Kingdom > England > Greater Manchester > Salford (0.04)
- Asia > Middle East > Republic of Türkiye > Ankara Province > Ankara (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Architecture > Real Time Systems (1.00)
A Comprehensive Dataset for Underground Miner Detection in Diverse Scenario
Addy, Cyrus, Gurumadaiah, Ajay Kumar, Gao, Yixiang, Awuah-Offei, Kwame
Underground mining operations face significant safety challenges that make emergency response capabilities crucial. While robots have shown promise in assisting with search and rescue operations, their effectiveness depends on reliable miner detection capabilities. Deep learning algorithms offer potential solutions for automated miner detection, but require comprehensive training datasets, which are currently lacking for underground mining environments. This paper presents a novel thermal imaging dataset specifically designed to enable the development and validation of miner detection systems for potential emergency applications. We systematically captured thermal imagery of various mining activities and scenarios to create a robust foundation for detection algorithms. To establish baseline performance metrics, we evaluated several state-of-the-art object detection algorithms including YOLOv8, YOLOv10, YOLO11, and RT-DETR on our dataset. While not exhaustive of all possible emergency situations, this dataset serves as a crucial first step toward developing reliable thermal-based miner detection systems that could eventually be deployed in real emergency scenarios. This work demonstrates the feasibility of using thermal imaging for miner detection and establishes a foundation for future research in this critical safety application.
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- North America > United States > Missouri > Phelps County > Rolla (0.04)
- Europe > Switzerland > Basel-City > Basel (0.04)
Exploring Model Quantization in GenAI-based Image Inpainting and Detection of Arable Plants
Modak, Sourav, Saltık, Ahmet Oğuz, Stein, Anthony
Deep learning-based weed control systems often suffer from limited training data diversity and constrained on-board computation, impacting their real-world performance. To overcome these challenges, we propose a framework that leverages Stable Diffusion-based inpainting to augment training data progressively in 10% increments -- up to an additional 200%, thus enhancing both the volume and diversity of samples. Our approach is evaluated on two state-of-the-art object detection models, YOLO11(l) and RT-DETR(l), using the mAP50 metric to assess detection performance. We explore quantization strategies (FP16 and INT8) for both the generative inpainting and detection models to strike a balance between inference speed and accuracy. Deployment of the downstream models on the Jetson Orin Nano demonstrates the practical viability of our framework in resource-constrained environments, ultimately improving detection accuracy and computational efficiency in intelligent weed management systems.
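The progressive augmentation schedule described above can be made concrete with a small calculation. The sketch below lists how many synthetic images each step would add for a given base dataset size, assuming the percentages are relative to the original training set. The function name, the base size of 1000 images, and the rounding-down rule are assumptions for illustration and are not stated in the abstract.

```python
def augmentation_schedule(base_count, step_pct=10, max_pct=200):
    """Return (percent, extra_images) pairs for each augmentation step.

    extra_images is the number of synthetic images added on top of the
    base_count original images, rounded down to a whole image.
    """
    return [
        (pct, base_count * pct // 100)
        for pct in range(step_pct, max_pct + 1, step_pct)
    ]


# Hypothetical base dataset of 1000 images.
schedule = augmentation_schedule(1000)
first_step = schedule[0]    # smallest augmentation level
last_step = schedule[-1]    # largest augmentation level
total_at_max = 1000 + last_step[1]
```

Under these assumptions there are 20 augmentation levels, and the largest one triples the training set relative to the original data.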
YOLOv8 to YOLO11: A Comprehensive Architecture In-depth Comparative Review
Hidayatullah, Priyanto, Syakrani, Nurjannah, Sholahuddin, Muhammad Rizqi, Gelar, Trisna, Tubagus, Refdinal
In the field of deep learning-based computer vision, YOLO is revolutionary. With respect to deep learning models, YOLO is also the one that is evolving the most rapidly. Unfortunately, not every YOLO model possesses scholarly publications. Moreover, there exists a YOLO model that lacks a publicly accessible official architectural diagram. Naturally, this engenders challenges, such as complicating the understanding of how the model operates in practice. Furthermore, the review articles that are presently available do not delve into the specifics of each model. The objective of this study is to present a comprehensive and in-depth architecture comparison of the four most recent YOLO models, specifically YOLOv8 through YOLO11, thereby enabling readers to quickly grasp not only how each model functions, but also the distinctions between them. To analyze each YOLO version's architecture, we meticulously examined the relevant academic papers and documentation and scrutinized the source code. The analysis reveals that while each version of YOLO has improvements in architecture and feature extraction, certain blocks remain unchanged. The lack of scholarly publications and official diagrams presents challenges for understanding the model's functionality and future enhancement. Future developers are encouraged to provide these resources.
- Asia > Indonesia (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Overview (1.00)
- Research Report (0.82)
DGNN-YOLO: Interpretable Dynamic Graph Neural Networks with YOLO11 for Detecting and Tracking Small Occluded Objects in Urban Traffic
Soudeep, Shahriar, Mridha, M. F., Jahin, Md Abrar, Dey, Nilanjan
The detection and tracking of small, occluded objects such as pedestrians, cyclists, and motorbikes pose significant challenges for traffic surveillance systems because of their erratic movement, frequent occlusion, and poor visibility in dynamic urban environments. Traditional methods like YOLO11, while proficient in spatial feature extraction for precise detection, often struggle with these small and dynamically moving objects, particularly in handling real-time data updates and resource efficiency. This paper introduces DGNN-YOLO, a novel framework that integrates dynamic graph neural networks (DGNNs) with YOLO11 to address these limitations. Unlike standard GNNs, DGNNs are chosen for their superior ability to dynamically update graph structures in real-time, which enables adaptive and robust tracking of objects in highly variable urban traffic scenarios. This framework constructs and regularly updates its graph representations, capturing objects as nodes and their interactions as edges, thus effectively responding to rapidly changing conditions. Additionally, DGNN-YOLO incorporates Grad-CAM, Grad-CAM++, and Eigen-CAM visualization techniques to enhance interpretability and foster trust, offering insights into the model's decision-making process. Extensive experiments validate the framework's performance, achieving a precision of 0.8382, recall of 0.6875, and mAP@0.5:0.95 of 0.6476, significantly outperforming existing methods. This study offers a scalable and interpretable solution for real-time traffic surveillance and significantly advances intelligent transportation systems' capabilities by addressing the critical challenge of detecting and tracking small, occluded objects.
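To put the reported precision and recall on a single scale, the sketch below computes the harmonic mean (F1) of the two values quoted in the abstract. The resulting figure is derived here for illustration and is not a number reported by the authors; an F1 computed per class or at a different confidence threshold could differ.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as quoted in the abstract above.
derived_f1 = f1_score(0.8382, 0.6875)  # roughly 0.755
```

The harmonic mean sits closer to the lower of the two inputs, so the recall of 0.6875 pulls the combined score well below the precision of 0.8382.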
- Asia > Japan > Kyūshū & Okinawa > Okinawa (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Germany > Baden-Württemberg > Karlsruhe Region > Heidelberg (0.04)
- Transportation > Infrastructure & Services (0.48)
- Transportation > Ground > Road (0.46)
- Information Technology > Sensing and Signal Processing (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Architecture > Real Time Systems (1.00)
YOLOv11 for Vehicle Detection: Advancements, Performance, and Applications in Intelligent Transportation Systems
Vehicle detection is a critical component in the development of advanced intelligent transportation systems (ITS), which rely on accurate and real-time information to optimize traffic flow, enhance safety, and support autonomous vehicle technologies [1]. As the number of vehicles on the road continues to grow, the demand for robust vehicle detection systems capable of operating under varying conditions--such as changes in weather, lighting, and vehicle types--has become paramount. In traffic monitoring, vehicle detection enables the real-time analysis of traffic patterns, congestion management, and incident detection, contributing to more efficient urban mobility. Moreover, vehicle detection serves as the foundation for vehicle classification and tracking systems, which are essential for dynamic tolling, traffic law enforcement, and infrastructure planning [2]. The evolution of vehicle detection systems has been closely tied to advancements in deep learning, particularly in the field of convolutional neural networks (CNNs) [3]. CNNs have played a pivotal role in object detection tasks due to their ability to automatically learn hierarchical features from raw image data [4, 5, 6, 7, 8]. Traditional vehicle detection approaches, such as histogram of oriented gradients (HOG) [9] and support vector machines (SVM) [10], lacked the flexibility and scalability needed for modern applications, especially when dealing with complex scenes and varying environmental conditions. Subsequent methods, like Scale-Invariant Feature Transform (SIFT) and Speeded-Up Robust Features (SURF) [11], introduced improvements in detecting and describing features under varying scale and rotation conditions, although computational constraints limited real-time applicability for ITS.
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Architecture > Real Time Systems (1.00)