Automated Parsing of Engineering Drawings for Structured Information Extraction Using a Fine-tuned Document Understanding Transformer

Khan, Muhammad Tayyab; Yong, Zane; Chen, Lequn; Tan, Jun Ming; Feng, Wenhe; Moon, Seung Ki

arXiv.org Artificial Intelligence 

Accurate extraction of key information from 2D engineering drawings is crucial for high-precision manufacturing. Manual extraction is slow and labor-intensive, while traditional Optical Character Recognition (OCR) techniques often struggle with complex layouts and overlapping symbols, resulting in unstructured outputs. To address these challenges, this paper proposes a novel hybrid deep learning framework for structured information extraction by integrating an Oriented Bounding Box (OBB) detection model with a transformer-based document parsing model (Donut). An in-house annotated dataset is used to train YOLOv11 for detecting nine key categories: Geometric Dimensioning and Tolerancing (GD&T), General Tolerances, Measures, Materials, Notes, Radii, Surface Roughness, Threads, and Title Blocks. Detected OBBs are cropped into images and labeled to fine-tune Donut for structured JSON output. Fine-tuning strategies include a single model trained across all categories and category-specific models. Results show that the single model consistently outperforms category-specific ones across all evaluation metrics, achieving higher precision (94.77% for GD&T), recall (100% for most categories), and F1 score (97.3%), while reducing hallucinations (5.23%). The proposed framework improves accuracy, reduces manual effort, and supports scalable deployment in precision-driven industries.
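The reported metrics can be related to one another with a short calculation. The sketch below is illustrative only and is not the authors' evaluation code: the function names `f1_score` and `field_metrics` are hypothetical, the field values are made up, and it assumes that the hallucination rate is the share of predicted fields with no match in the ground truth (that is, the complement of precision). The abstract does not state this definition, nor that the F1 and hallucination figures refer to the GD&T case, but the numbers are consistent with both assumptions: 100% - 94.77% = 5.23%, and precision 94.77% with recall 100% gives an F1 of about 97.3%.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def field_metrics(predicted: set, ground_truth: set) -> dict:
    """Field-level metrics over sets of (key, value) pairs.

    Assumption (not from the paper): a 'hallucination' is a predicted
    field that does not appear in the ground truth, so the hallucination
    rate equals 1 - precision.
    """
    tp = len(predicted & ground_truth)   # correctly extracted fields
    fp = len(predicted - ground_truth)   # extracted but not in the ground truth
    fn = len(ground_truth - predicted)   # present in the ground truth but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
        "hallucination": fp / (tp + fp) if tp + fp else 0.0,
    }


# Check the abstract's figures against each other (assumed to be the GD&T case).
reported_f1 = f1_score(0.9477, 1.0)
print(f"F1 from reported precision and recall: {reported_f1:.3f}")

# Hypothetical example: three true fields, one extra predicted field.
truth = {("symbol", "flatness"), ("tolerance", "0.05"), ("datum", "A")}
pred = truth | {("datum", "B")}
print(field_metrics(pred, truth))
```

In the hypothetical example, all three true fields are recovered (recall 1.0) and one of the four predictions is spurious, so precision is 0.75 and the assumed hallucination rate is 0.25.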
