Algorithms for dynamic scheduling in manufacturing, towards digital factories Improving Deadline Feasibility and Responsiveness via Temporal Networks

Hedea, Ioan

arXiv.org Artificial Intelligence

Modern manufacturing systems must meet hard delivery deadlines while coping with stochastic task durations caused by process noise, equipment variability, and human intervention. Traditional deterministic schedules break down when reality deviates from nominal plans, triggering costly last-minute repairs. This thesis combines offline constraint-programming (CP) optimisation with online temporal-network execution to create schedules that remain feasible under worst-case uncertainty. First, we build a CP model of the flexible job-shop with per-job deadline tasks and insert an optimal buffer $\Delta^*$ to obtain a fully proactive baseline. We then translate the resulting plan into a Simple Temporal Network with Uncertainty (STNU) and verify dynamic controllability, which guarantees that a real-time dispatcher can retime activities for every bounded duration realisation without violating resource or deadline constraints. Extensive Monte Carlo simulations on the open Kacem~1--4 benchmark suite show that our hybrid approach eliminates 100\% of the deadline violations observed in state-of-the-art meta-heuristic schedules, while adding only 3--5\% makespan overhead. Scalability experiments confirm that CP solve times and STNU checks remain sub-second on medium-size instances. The work demonstrates how temporal-network reasoning can bridge the gap between proactive buffering and dynamic robustness, moving industry a step closer to truly digital, self-correcting factories.
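The temporal-network core of this approach can be illustrated with a minimal sketch (not the thesis' implementation): a plan translated into a Simple Temporal Network is temporally consistent iff its distance graph has no negative cycle. Full dynamic-controllability checking for an STNU with contingent links requires a dedicated algorithm such as Morris' procedure; the sketch below shows only the STN consistency check, with an illustrative one-task, one-deadline example.

```python
def stn_consistent(num_events, edges):
    """edges: list of (u, v, w) meaning  time[v] - time[u] <= w.
    Returns True iff the distance graph has no negative cycle (Bellman-Ford)."""
    dist = [0] * num_events          # virtual source at distance 0 to all events
    for _ in range(num_events):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return True
    # one more relaxation round: any further improvement implies a negative cycle
    return not any(dist[u] + w < dist[v] for u, v, w in edges)

# Toy job: task runs 3-5 time units (bounded uncertainty), deadline buffer allows 6.
edges = [
    (0, 1, 5), (1, 0, -3),   # 3 <= end - start <= 5
    (0, 1, 6),               # deadline: end - start <= 6  (feasible)
]
print(stn_consistent(2, edges))  # True; tightening the deadline below 3 breaks it
```

Tightening the deadline edge below the minimum duration (e.g. `(0, 1, 2)`) creates a negative cycle, which is exactly the infeasibility the offline buffer $\Delta^*$ is chosen to avoid.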


SENTINEL: A Multi-Level Formal Framework for Safety Evaluation of LLM-based Embodied Agents

Zhan, Simon Sinong, Liu, Yao, Wang, Philip, Wang, Zinan, Wang, Qineng, Ruan, Zhian, Shi, Xiangyu, Cao, Xinyu, Yang, Frank, Wang, Kangrui, Shao, Huajie, Li, Manling, Zhu, Qi

arXiv.org Artificial Intelligence

We present Sentinel, the first framework for formally evaluating the physical safety of Large Language Model (LLM)-based embodied agents across the semantic, plan, and trajectory levels. Unlike prior methods that rely on heuristic rules or subjective LLM judgments, Sentinel grounds practical safety requirements in formal temporal logic (TL) semantics that can precisely specify state invariants, temporal dependencies, and timing constraints. It then employs a multi-level verification pipeline where (i) at the semantic level, intuitive natural language safety requirements are formalized into TL formulas and the LLM agent's understanding of these requirements is probed for alignment with the TL formulas; (ii) at the plan level, high-level action plans and subgoals generated by the LLM agent are verified against the TL formulas to detect unsafe plans before execution; and (iii) at the trajectory level, multiple execution trajectories are merged into a computation tree and efficiently verified against physically detailed TL specifications for a final safety check. We apply Sentinel in VirtualHome and ALFRED, and formally evaluate multiple LLM-based embodied agents against diverse safety requirements. Our experiments show that by grounding physical safety in temporal logic and applying verification methods across multiple levels, Sentinel provides a rigorous foundation for systematically evaluating LLM-based embodied agents in physical environments, exposing safety violations overlooked by previous methods and offering insights into their failure modes.
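A plan-level check of this kind can be sketched in miniature (the toy domain and names here are illustrative, not Sentinel's actual API): a requirement such as "the stove is never left on while the agent is away" corresponds to the LTL invariant G(stove_on -> agent_near_stove), which on a finite plan trace reduces to evaluating the implication in every state.

```python
def holds_globally(trace, prop):
    """trace: list of state dicts; prop: state -> bool.
    Finite-trace semantics of the LTL invariant G(prop)."""
    return all(prop(s) for s in trace)

# G(stove_on -> agent_near_stove), written as (not p) or q
invariant = lambda s: (not s["stove_on"]) or s["agent_near_stove"]

safe_plan = [
    {"stove_on": False, "agent_near_stove": False},  # walk to kitchen
    {"stove_on": True,  "agent_near_stove": True},   # cook
    {"stove_on": False, "agent_near_stove": True},   # turn stove off
]
# same plan, but the agent leaves before switching the stove off
unsafe_plan = safe_plan[:2] + [{"stove_on": True, "agent_near_stove": False}]

print(holds_globally(safe_plan, invariant))    # True
print(holds_globally(unsafe_plan, invariant))  # False -> reject before execution
```

Rejecting `unsafe_plan` before execution is the point of level (ii): the violation is caught on the symbolic plan, without ever running the trajectory in the simulator.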


Automatic Building Code Review: A Case Study

Wan, Hanlong, Xu, Weili, Rosenberg, Michael, Zhang, Jian, Siddika, Aysha

arXiv.org Artificial Intelligence

Building officials, especially those in resource-constrained or rural jurisdictions, struggle with labor-intensive, error-prone, and costly manual reviews of design documents as projects scale in size and complexity. Widespread adoption of Building Information Modeling (BIM) and Large Language Models (LLMs) has created opportunities for automated code review (ACR) solutions. This study proposes a novel agent-driven framework that integrates BIM-based data extraction with automated verification using both retrieval-augmented generation (RAG) and Model Context Protocol (MCP) agent pipelines. The framework employs LLM-enabled agents to extract geometry, schedules, and system attributes from heterogeneous file types, which are then processed for building code checking via two complementary mechanisms: (i) direct API calls to DOE's COMcheck engine, providing deterministic and audit-ready outputs, and (ii) RAG-based reasoning over rule provisions, allowing flexible interpretation where coverage is incomplete or ambiguous. The framework was evaluated through case demonstrations, including automated extraction of geometric attributes (e.g., surface area, tilt, and insulation values), parsing of operational schedules, and design validation for lighting allowances under ASHRAE Standard 90.1-2022. Comparative performance tests across multiple large language models showed that Generative Pre-trained Transformer 4 Omni (GPT-4o) achieved the best balance of efficiency and stability, while smaller models exhibited inconsistencies or failures. Results confirm that MCP agent pipelines outperform RAG reasoning pipelines in both the rigor and the flexibility of the workflow.


Large Language Model-Driven Code Compliance Checking in Building Information Modeling

Madireddy, Soumya, Gao, Lu, Din, Zia, Kim, Kinam, Senouci, Ahmed, Han, Zhe, Zhang, Yunpeng

arXiv.org Artificial Intelligence

This research addresses the time-consuming and error-prone nature of manual code compliance checking in Building Information Modeling (BIM) by introducing a Large Language Model (LLM)-driven approach to semi-automate this critical process. The developed system integrates LLMs such as GPT, Claude, Gemini, and Llama, with Revit software to interpret building codes, generate Python scripts, and perform semi-automated compliance checks within the BIM environment. Case studies on a single-family residential project and an office building project demonstrated the system's ability to reduce the time and effort required for compliance checks while improving accuracy. It streamlined the identification of violations, such as non-compliant room dimensions, material usage, and object placements, by automatically assessing relationships and generating actionable reports. Compared to manual methods, the system eliminated repetitive tasks, simplified complex regulations, and ensured reliable adherence to standards. By offering a comprehensive, adaptable, and cost-effective solution, this proposed approach offers a promising advancement in BIM-based compliance checking, with potential applications across diverse regulatory documents in construction projects.
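The kind of generated check the paper describes can be illustrated with a stand-alone sketch. The rule and its thresholds below are examples, not any specific code clause, and a real generated script would query these values through the Revit API rather than from plain dictionaries.

```python
def check_rooms(rooms, min_area=7.0, min_height=2.4):
    """Flag habitable rooms whose floor area or ceiling height falls below
    the (illustrative) thresholds; returns (room name, reason) pairs."""
    violations = []
    for r in rooms:
        if r["area_m2"] < min_area:
            violations.append((r["name"], f"area {r['area_m2']} m2 < {min_area} m2"))
        if r["height_m"] < min_height:
            violations.append((r["name"], f"height {r['height_m']} m < {min_height} m"))
    return violations

# Toy model data standing in for values extracted from a BIM model.
rooms = [
    {"name": "Bedroom 1", "area_m2": 11.5, "height_m": 2.6},
    {"name": "Study",     "area_m2": 6.2,  "height_m": 2.5},
]
for name, why in check_rooms(rooms):
    print(f"VIOLATION {name}: {why}")   # flags the undersized Study
```

The semi-automation in the paper lies in having the LLM author such rule functions from the regulatory text; the deterministic part, as here, is the mechanical evaluation against model data.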


FoldA: Computing Partial-Order Alignments Using Directed Net Unfoldings

Geurtjens, Douwe, Lu, Xixi

arXiv.org Artificial Intelligence

Conformance checking is a fundamental task of process mining, which quantifies the extent to which the observed process executions match a normative process model. The state-of-the-art approaches compute alignments by exploring the state space formed by the synchronous product of the process model and the trace. This often leads to state space explosion, particularly when the model exhibits a high degree of choice and concurrency. Moreover, as alignments inherently impose a sequential structure, they fail to fully represent the concurrent behavior present in many real-world processes. To address these limitations, this paper proposes FoldA, a new technique for computing partial-order alignments on the fly using directed Petri net unfoldings. We evaluate our technique on 485 synthetic model-log pairs and compare it against A*- and Dijkstra-based alignments on 13 real-life model-log pairs and 6 benchmark pairs. The results show that our unfolding alignments, although they require more computation time, generally reduce the number of queued states and provide a more accurate representation of concurrency.


Posterior SBC: Simulation-Based Calibration Checking Conditional on Data

Säilynoja, Teemu, Schmitt, Marvin, Bürkner, Paul, Vehtari, Aki

arXiv.org Machine Learning

Simulation-based calibration checking (SBC) refers to the validation of an inference algorithm and model implementation through repeated inference on data simulated from a generative model. In the original and commonly used approach, the generative model uses parameters drawn from the prior, and thus the approach is testing whether the inference works for simulated data generated with parameter values plausible under that prior. This approach is natural and desirable when we want to test whether the inference works for a wide range of datasets we might observe. However, after observing data, we are interested in answering whether the inference works conditional on that particular data. In this paper, we propose posterior SBC and demonstrate how it can be used to validate the inference conditional on the observed data. We illustrate the utility of posterior SBC in three case studies: (1) A simple multilevel model; (2) a model that is governed by differential equations; and (3) a joint integrative neuroscience model which is approximated via amortized Bayesian inference with neural networks.
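The rank-statistic machinery that prior-based and posterior SBC share can be sketched for a conjugate normal model, where the exact posterior is available in closed form (posterior SBC additionally conditions the generative loop on the observed data; that step is omitted here).

```python
import random

def sbc_ranks(n_sims=200, n_draws=99, n_obs=10, prior_sd=1.0, noise_sd=1.0):
    """Classic prior-based SBC for y_i ~ N(theta, noise_sd), theta ~ N(0, prior_sd).
    If the inference is correct, the returned ranks are uniform on {0,...,n_draws}."""
    ranks = []
    for _ in range(n_sims):
        theta = random.gauss(0.0, prior_sd)                  # draw from the prior
        y = [random.gauss(theta, noise_sd) for _ in range(n_obs)]
        # exact conjugate posterior N(mu_post, sd_post) plays the role of the
        # inference algorithm under test
        prec = 1.0 / prior_sd**2 + n_obs / noise_sd**2
        mu_post = (sum(y) / noise_sd**2) / prec
        sd_post = prec ** -0.5
        draws = [random.gauss(mu_post, sd_post) for _ in range(n_draws)]
        ranks.append(sum(d < theta for d in draws))          # rank of truth
    return ranks

ranks = sbc_ranks()
print(min(ranks), max(ranks))   # a histogram of these ranks should look flat
```

Replacing the exact posterior with an approximate sampler (and, for posterior SBC, simulating parameters from the posterior given the observed data instead of from the prior) turns this sketch into the diagnostic the paper studies: systematic departures of the ranks from uniformity expose miscalibrated inference.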


One Stack, Diverse Vehicles: Checking Safe Portability of Automated Driving Software

Nenchev, Vladislav

arXiv.org Artificial Intelligence

Integrating an automated driving software stack into vehicles with variable configuration is challenging, especially due to different hardware characteristics. Further, to provide software updates to a vehicle fleet in the field, the functional safety of every affected configuration has to be ensured. These additional demands for dependability and the increasing hardware diversity in automated driving make rigorous automatic analysis essential. This paper addresses this challenge by using formal portability checking of adaptive cruise controller code for different vehicle configurations. Given a formal specification of the safe behavior, models of target configurations are derived, which capture relevant effects of sensors, actuators and computing platforms. A corresponding safe set is obtained and used to check if the desired behavior is achievable on all targets. In a case study, portability checks of a traditional and a neural network controller are performed automatically within minutes for each vehicle hardware configuration. The checks provide feedback for necessary adaptations of the controllers, thus allowing rapid integration and testing of software or parameter changes.
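The flavour of a per-configuration check can be conveyed with a simulation-based stand-in (the paper's method instead computes a formal safe set; the configurations, numbers, and gap specification below are all illustrative): each hypothetical vehicle configuration is evaluated against a minimum-gap requirement in an emergency-braking scenario, and any configuration that violates it is flagged as non-portable.

```python
def min_gap(config, v0=20.0, gap0=15.0, dt=0.05, horizon=10.0):
    """Lead vehicle performs an emergency stop; the ego vehicle applies this
    configuration's braking after its actuation delay. Returns the smallest
    inter-vehicle gap (in metres) reached over the horizon (Euler simulation)."""
    v_ego, v_lead, gap, t = v0, v0, gap0, 0.0
    lowest = gap
    while t < horizon:
        a_ego = -config["max_brake"] if t >= config["delay"] else 0.0
        v_lead = max(0.0, v_lead - 6.0 * dt)        # lead brakes at 6 m/s^2
        v_ego = max(0.0, v_ego + a_ego * dt)
        gap += (v_lead - v_ego) * dt
        lowest = min(lowest, gap)
        t += dt
    return lowest

# Hypothetical hardware configurations: actuation delay (s), peak braking (m/s^2).
configs = {
    "sedan": {"delay": 0.1, "max_brake": 8.0},
    "van":   {"delay": 0.4, "max_brake": 5.0},
}
for name, cfg in configs.items():
    ok = min_gap(cfg) >= 2.0                        # illustrative 2 m gap spec
    print(name, "portable" if ok else "violates gap spec")
```

The formal approach in the paper replaces this single scenario with a safe set that covers all admissible behaviours, so a passing configuration carries a guarantee rather than evidence from one simulated run.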


Program Correctness through Self-Certification

Communications of the ACM

Programming is both an enjoyable and a difficult task. A seemingly small slip can introduce a serious error or create a security vulnerability. The need for, and importance of, program correctness was recognized early in the modern development of computing. Algorithms, on which programs are built, arose in the ancient world (and are commonly attributed to the Greeks). The word verification dates only to Medieval Latin; however, when Euclid introduced his algorithm for the greatest common divisor centuries earlier, he provided a proof sketch based on what we would today call inductive reasoning [13].


Turn-based Multi-Agent Reinforcement Learning Model Checking

Gross, Dennis

arXiv.org Artificial Intelligence

In this paper, we propose a novel approach for verifying the compliance of turn-based multi-agent reinforcement learning (TMARL) agents with complex requirements in stochastic multiplayer games. Our method overcomes the limitations of existing verification approaches, which are inadequate for dealing with TMARL agents and not scalable to large games with multiple agents. Our approach relies on tight integration of TMARL and a verification technique referred to as model checking. We demonstrate the effectiveness and scalability of our technique through experiments in different types of environments. Our experiments show that our method is suited to verify TMARL agents and scales better than naive monolithic model checking.


ARCEAK: An Automated Rule Checking Framework Enhanced with Architectural Knowledge

Chen, Junyong, Wu, Ling-I, Chen, Minyu, Qian, Xiaoying, Zhu, Haoze, Zhang, Qiongfang, Li, Guoqiang

arXiv.org Artificial Intelligence

Automated Rule Checking (ARC) plays a crucial role in advancing the construction industry by addressing the laborious, inconsistent, and error-prone nature of traditional model review conducted by industry professionals. Manual assessment against intricate sets of rules often leads to significant project delays and expenses. In response to these challenges, ARC offers a promising solution to improve efficiency and compliance in design within the construction sector. However, the main challenge of ARC lies in translating regulatory text into a format suitable for computer processing. Current methods for rule interpretation require extensive manual labor, thereby limiting their practicality. To address this issue, our study introduces a novel approach that decomposes ARC into two distinct tasks: rule information extraction and verification code generation. Leveraging generative pre-trained transformers, our method aims to streamline the interpretation of regulatory texts and simplify the process of generating model compliance checking code. Through empirical evaluation and case studies, we showcase the effectiveness and potential of our approach in automating code compliance checking, enhancing the efficiency and reliability of construction projects.