

Error Notebook-Guided, Training-Free Part Retrieval in 3D CAD Assemblies via Vision-Language Models

Liu, Yunqing, Zhang, Nan, Tan, Zhiming

arXiv.org Artificial Intelligence

Effective specification-aware part retrieval within complex CAD assemblies is essential for automated design verification and downstream engineering tasks. However, directly applying LLMs/VLMs to this task presents several challenges: the input sequences may exceed model token limits, and even when they fit, performance remains unsatisfactory. Moreover, fine-tuning LLMs/VLMs requires significant computational resources, and for many high-performing general-purpose proprietary models (e.g., GPT or Gemini), fine-tuning access is not available. In this paper, we propose a novel part retrieval framework that requires no additional training, instead using Error Notebooks combined with RAG for refined prompt engineering that improves the retrieval performance of existing general-purpose models. The construction of Error Notebooks consists of two steps: (1) collecting historical erroneous CoTs and their incorrect answers, and (2) revising these CoTs through reflective corrections until the correct solutions are obtained. As a result, the Error Notebooks serve as a repository of tasks along with their corrected CoTs and final answers. RAG is then employed to retrieve specification-relevant records from the Error Notebooks and incorporate them into the inference process. Another major contribution of our work is a human-in-the-loop CAD dataset, which is used to evaluate our method. In addition, the engineering value of our framework lies in its ability to effectively handle 3D models with lengthy, non-natural-language metadata. Experiments with proprietary models, including GPT-4o and the Gemini series, show substantial gains, with GPT-4o (Omni) achieving up to a 23.4% absolute accuracy improvement on the human preference dataset. Moreover, ablation studies confirm that CoT reasoning provides benefits especially in challenging cases with higher part counts (>10).
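The two-stage pipeline the abstract describes (a repository of corrected CoTs, then RAG over it at inference time) might be sketched as follows. This is an illustrative toy, not the paper's implementation: the `NotebookEntry` fields, the bag-of-words cosine retriever (a stand-in for a real embedding index), and the prompt layout are all assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from math import sqrt

@dataclass
class NotebookEntry:
    """One Error Notebook record: a task plus its corrected CoT and answer."""
    task: str            # the part-retrieval specification
    wrong_cot: str       # the original erroneous chain of thought
    corrected_cot: str   # the reflectively corrected reasoning
    answer: str          # the verified final answer

def _bow(text: str) -> Counter:
    return Counter(text.lower().split())

def _cosine(a: Counter, b: Counter) -> float:
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def retrieve(notebook: list[NotebookEntry], spec: str, k: int = 2) -> list[NotebookEntry]:
    """RAG step: return the k entries whose tasks best match the query spec."""
    q = _bow(spec)
    return sorted(notebook, key=lambda e: _cosine(q, _bow(e.task)), reverse=True)[:k]

def build_prompt(spec: str, examples: list[NotebookEntry]) -> str:
    """Prepend the retrieved corrected CoTs as few-shot context for the model."""
    blocks = [
        f"Task: {e.task}\nCorrected reasoning: {e.corrected_cot}\nAnswer: {e.answer}"
        for e in examples
    ]
    return "\n\n".join(blocks) + f"\n\nNew task: {spec}\nReasoning:"
```

The key design point the abstract emphasizes is that only the *corrected* CoT and final answer enter the prompt, so the general-purpose model benefits from past failures without any fine-tuning.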


QueryCAD: Grounded Question Answering for CAD Models

Kienle, Claudius, Alt, Benjamin, Katic, Darko, Jäkel, Rainer

arXiv.org Artificial Intelligence

CAD models are widely used in industry and are essential for robotic automation processes. However, these models are rarely considered in novel AI-based approaches, such as the automatic synthesis of robot programs, as there are no readily available methods that would allow CAD models to be incorporated for the analysis, interpretation, or extraction of information. To address these limitations, we propose QueryCAD, the first system designed for CAD question answering, enabling the extraction of precise information from CAD models using natural language queries. QueryCAD incorporates SegCAD, an open-vocabulary instance segmentation model we developed to identify and select specific parts of the CAD model based on part descriptions. We further propose a CAD question answering benchmark to evaluate QueryCAD and establish a foundation for future research. Lastly, we integrate QueryCAD within an automatic robot program synthesis framework, validating its ability to enhance deep-learning solutions for robotics by enabling them to process CAD models (https://claudius-kienle.github.com/querycad).
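The core interaction the abstract describes, selecting parts of a CAD model from a natural-language part description and then reading off precise attributes, might look roughly like the toy below. The `Part` schema, the keyword-overlap matcher (a crude stand-in for SegCAD's open-vocabulary instance segmentation), and the attribute lookup are all illustrative assumptions, not QueryCAD's actual API.

```python
from dataclasses import dataclass

@dataclass
class Part:
    name: str
    description: str
    width_mm: float

def select_parts(parts: list[Part], description: str) -> list[Part]:
    """Toy open-vocabulary selection: keep parts whose description shares
    at least one word with the query (stand-in for SegCAD)."""
    query_words = set(description.lower().split())
    return [p for p in parts if query_words & set(p.description.lower().split())]

def answer_query(parts: list[Part], part_desc: str, attribute: str) -> list[float]:
    """Answer a grounded question, e.g. 'width of the mounting holes':
    first select the referenced parts, then read the requested attribute."""
    return [getattr(p, attribute) for p in select_parts(parts, part_desc)]
```

The point of grounding the answer in selected model parts, rather than asking a model to free-generate, is that the returned values come directly from the CAD data.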


Abstract Visual Reasoning with Tangram Shapes

Ji, Anya, Kojima, Noriyuki, Rush, Noah, Suhr, Alane, Vong, Wai Keen, Hawkins, Robert D., Artzi, Yoav

arXiv.org Artificial Intelligence

We introduce KiloGram, a resource for studying abstract visual reasoning in humans and machines. Drawing on the history of tangram puzzles as stimuli in cognitive science, we build a richly annotated dataset that, with >1k distinct stimuli, is orders of magnitude larger and more diverse than prior resources. It is both visually and linguistically richer, moving beyond whole shape descriptions to include segmentation maps and part labels. We use this resource to evaluate the abstract visual reasoning capacities of recent multi-modal models. We observe that pre-trained weights demonstrate limited abstract reasoning, which dramatically improves with fine-tuning. We also observe that explicitly describing parts aids abstract reasoning for both humans and models, especially when jointly encoding the linguistic and visual inputs. KiloGram is available at https://lil.nlp.cornell.edu/kilogram .


Ergonomics Analysis for Vehicle Assembly Using Artificial Intelligence

Rychtyckyj, Nestor

AI Magazine

In this article I discuss a deployed application at Ford Motor Company that utilizes AI technology for the analysis of potential ergonomic concerns at Ford's assembly plants. The manufacture of motor vehicles is a complex and dynamic problem, and the costs related to workplace injuries and lost productivity due to bad ergonomic design can be very significant. Ford has developed two separate ergonomic analysis systems that have been integrated into the process planning for manufacturing system at Ford known as the Global Study and Process Allocation System (GSPAS). GSPAS has become the global repository for standardized engineering processes and data for assembling all Ford vehicles, including parts, tools, and standard labor time. One of the more significant benefits of GSPAS is the use of a controlled language, known as Standard Language, which is used throughout Ford to write the process assembly instructions. AI is already used within GSPAS for Standard Language validation and direct labor management. The work described here shows how Ford built upon its previous success with AI to expand the technology into the new domain of ergonomics analysis.