Goto

Collaborating Authors

 license plate


OV-PARTS: Towards Open-Vocabulary Part Segmentation (Supplementary Material)

Neural Information Processing Systems

The number of part queries is set to 50. SGD optimizer with the initial learning rate of 2e-2 and weight decay of 5e-4 is used. We sample 128 training samples for each object part class. The initial value of the learnable fusion weight is 0.5 . The total batch size is 8, and the training iterations amount to 40k.




Flock Uses Overseas Gig Workers to Build Its Surveillance AI

WIRED

An accidental leak revealed that Flock, which has cameras in thousands of US communities, is using workers in the Philippines to review and classify footage. Flock, the automatic license plate reader and AI-powered camera company, uses overseas workers from Upwork to train its machine learning algorithms, with training material telling workers how to review and categorize footage including images people and vehicles in the United States, according to material reviewed by 404 Media that was accidentally exposed by the company. The findings bring up questions about who exactly has access to footage collected by Flock surveillance cameras and where people reviewing the footage may be based. Flock has become a pervasive technology in the US, with its cameras present in thousands of communities that cops use every day to investigate things like carjackings. Local police have also performed numerous lookups for ICE in the system.


Cleaning Maintenance Logs with LLM Agents for Improved Predictive Maintenance

Dimidov, Valeriu, Hawlader, Faisal, Jafarnejad, Sasan, Frank, Raphaël

arXiv.org Artificial Intelligence

Economic constraints, limited availability of datasets for reproducibility and shortages of specialized expertise have long been recognized as key challenges to the adoption and advancement of predictive maintenance (PdM) in the automotive sector. Recent progress in large language models (LLMs) presents an opportunity to overcome these barriers and speed up the transition of PdM from research to industrial practice. Under these conditions, we explore the potential of LLM-based agents to support PdM cleaning pipelines. Specifically, we focus on maintenance logs, a critical data source for training well-performing machine learning (ML) models, but one often affected by errors such as typos, missing fields, near-duplicate entries, and incorrect dates. We evaluate LLM agents on cleaning tasks involving six distinct types of noise. Our findings show that LLMs are effective at handling generic cleaning tasks and offer a promising foundation for future industrial applications. While domain-specific errors remain challenging, these results highlight the potential for further improvements through specialized training and enhanced agentic capabilities.


Efficient License Plate Recognition via Pseudo-Labeled Supervision with Grounding DINO and YOLOv8

Vargoorani, Zahra Ebrahimi, Ghoreyshi, Amir Mohammad, Suen, Ching Yee

arXiv.org Artificial Intelligence

Developing a highly accurate automatic license plate recognition system (ALPR) is challenging due to environmental factors such as lighting, rain, and dust. Additional difficulties include high vehicle speeds, varying camera angles, and low-quality or low-resolution images. ALPR is vital in traffic control, parking, vehicle tracking, toll collection, and law enforcement applications. This paper proposes a deep learning strategy using YOLOv8 for license plate detection and recognition tasks. This method seeks to enhance the performance of the model using datasets from Ontario, Quebec, California, and New York State. It achieved an impressive recall rate of 94% on the dataset from the Center for Pattern Recognition and Machine Intelligence (CENPARMI) and 91% on the UFPR-ALPR dataset. In addition, our method follows a semi-supervised learning framework, combining a small set of manually labeled data with pseudo-labels generated by Grounding DINO to train our detection model. Grounding DINO, a powerful vision-language model, automatically annotates many images with bounding boxes for license plates, thereby minimizing the reliance on labor-intensive manual labeling. By integrating human-verified and model-generated annotations, we can scale our dataset efficiently while maintaining label quality, which significantly enhances the training process and overall model performance. Furthermore, it reports character error rates for both datasets, providing additional insight into system performance.


OV-PARTS: Towards Open-V ocabulary Part Segmentation (Supplementary Material)

Neural Information Processing Systems

The number of part queries is set to 50. SGD optimizer with the initial learning rate of 2e-2 and weight decay of 5e-4 is used. We sample 128 training samples for each object part class. The initial value of the learnable fusion weight is 0.5 . The total batch size is 8, and the training iterations amount to 40k.



Inter-Class Relational Loss for Small Object Detection: A Case Study on License Plates

Ning, Dian, Han, Dong Seog

arXiv.org Artificial Intelligence

In one-stage multi-object detection tasks, various intersection over union (IoU)-based solutions aim at smooth and stable convergence near the targets during training. However, IoU-based losses fail to correctly update the gradient of small objects due to an extremely flat gradient. During the update of multiple objects, the learning of small objects' gradients suffers more because of insufficient gradient updates. Therefore, we propose an inter-class relational loss to efficiently update the gradient of small objects while not sacrificing the learning efficiency of other objects based on the simple fact that an object has a spatial relationship to another object (e.g., a car plate is attached to a car in a similar position). When the predicted car plate's bounding box is not within its car, a loss punishment is added to guide the learning, which is inversely proportional to the overlapped area of the car's and predicted car plate's bounding box. By leveraging the spatial relationship at the inter-class level, the loss guides small object predictions using larger objects and enhances latent information in deeper feature maps. In this paper, we present twofold contributions using license plate detection as a case study: (1) a new small vehicle multi-license plate dataset (SVMLP), featuring diverse real-world scenarios with high-quality annotations; and (2) a novel inter-class relational loss function designed to promote effective detection performance. We highlight the proposed ICR loss penalty can be easily added to existing IoU-based losses and enhance the performance. These contributions improve the standard mean Average Precision (mAP) metric, achieving gains of 10.3% and 1.6% in mAP$^{\text{test}}_{50}$ for YOLOv12-T and UAV-DETR, respectively, without any additional hyperparameter tuning. Code and dataset will be available soon.


Video-based Vehicle Surveillance in the Wild: License Plate, Make, and Model Recognition with Self Reflective Vision-Language Models

Parsa, Pouya, Li, Keya, Kockelman, Kara M., Choi, Seongjin

arXiv.org Artificial Intelligence

Automatic license plate recognition (ALPR) and vehicle make and model recognition underpin intelligent transportation systems, supporting law enforcement, toll collection, and post-incident investigation. Applying these methods to videos captured by handheld smartphones or non-static vehicle-mounted cameras presents unique challenges compared to fixed installations, including frequent camera motion, varying viewpoints, occlusions, and unknown road geometry. Traditional ALPR solutions, dependent on specialized hardware and handcrafted OCR pipelines, often degrade under these conditions. Recent advances in large vision-language models (VLMs) enable direct recognition of textual and semantic attributes from arbitrary imagery. This study evaluates the potential of VLMs for ALPR and makes and models recognition using monocular videos captured with handheld smartphones and non-static mounted cameras. The proposed license plate recognition pipeline filters to sharp frames, then sends a multimodal prompt to a VLM using several prompt strategies. Make and model recognition pipeline runs the same VLM with a revised prompt and an optional self-reflection module. In the self-reflection module, the model contrasts the query image with a reference from a 134-class dataset, correcting mismatches. Experiments on a smartphone dataset collected on the campus of the University of Texas at Austin, achieve top-1 accuracies of 91.67% for ALPR and 66.67% for make and model recognition. On the public UFPR-ALPR dataset, the approach attains 83.05% and 61.07%, respectively. The self-reflection module further improves results by 5.72% on average for make and model recognition. These findings demonstrate that VLMs provide a cost-effective solution for scalable, in-motion traffic video analysis.