Algorithms for dynamic scheduling in manufacturing, towards digital factories Improving Deadline Feasibility and Responsiveness via Temporal Networks

Hedea, Ioan

arXiv.org Artificial Intelligence

Modern manufacturing systems must meet hard delivery deadlines while coping with stochastic task durations caused by process noise, equipment variability, and human intervention. Traditional deterministic schedules break down when reality deviates from nominal plans, triggering costly last-minute repairs. This thesis combines offline constraint-programming (CP) optimisation with online temporal-network execution to create schedules that remain feasible under worst-case uncertainty. First, we build a CP model of the flexible job-shop with per-job deadline tasks and insert an optimal buffer $\Delta^*$ to obtain a fully proactive baseline. We then translate the resulting plan into a Simple Temporal Network with Uncertainty (STNU) and verify dynamic controllability, which guarantees that a real-time dispatcher can retime activities for every bounded duration realisation without violating resource or deadline constraints. Extensive Monte Carlo simulations on the open Kacem~1--4 benchmark suite show that our hybrid approach eliminates 100\% of the deadline violations observed in state-of-the-art meta-heuristic schedules, while adding only 3--5\% makespan overhead. Scalability experiments confirm that CP solve times and STNU checks remain sub-second on medium-size instances. The work demonstrates how temporal-network reasoning can bridge the gap between proactive buffering and dynamic robustness, moving industry a step closer to truly digital, self-correcting factories.
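The temporal-network core of this approach can be illustrated with a minimal sketch (not the thesis' implementation): a plan translated into a Simple Temporal Network is temporally consistent iff its distance graph has no negative cycle. Full dynamic-controllability checking for an STNU with contingent links requires a dedicated algorithm such as Morris' procedure; the sketch below shows only the STN consistency check, with an illustrative one-task, one-deadline example.

```python
def stn_consistent(num_events, edges):
    """edges: list of (u, v, w) meaning  time[v] - time[u] <= w.
    Returns True iff the distance graph has no negative cycle (Bellman-Ford)."""
    dist = [0] * num_events          # virtual source at distance 0 to all events
    for _ in range(num_events):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return True
    # one more relaxation round: any further improvement implies a negative cycle
    return not any(dist[u] + w < dist[v] for u, v, w in edges)

# Toy job: task runs 3-5 time units (bounded uncertainty), deadline buffer allows 6.
edges = [
    (0, 1, 5), (1, 0, -3),   # 3 <= end - start <= 5
    (0, 1, 6),               # deadline: end - start <= 6  (feasible)
]
print(stn_consistent(2, edges))  # True; tightening the deadline below 3 breaks it
```

Tightening the deadline edge below the minimum duration (e.g. `(0, 1, 2)`) creates a negative cycle, which is exactly the infeasibility the offline buffer $\Delta^*$ is chosen to avoid.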


SENTINEL: A Multi-Level Formal Framework for Safety Evaluation of LLM-based Embodied Agents

Zhan, Simon Sinong, Liu, Yao, Wang, Philip, Wang, Zinan, Wang, Qineng, Ruan, Zhian, Shi, Xiangyu, Cao, Xinyu, Yang, Frank, Wang, Kangrui, Shao, Huajie, Li, Manling, Zhu, Qi

arXiv.org Artificial Intelligence

We present Sentinel, the first framework for formally evaluating the physical safety of Large Language Model (LLM)-based embodied agents across the semantic, plan, and trajectory levels. Unlike prior methods that rely on heuristic rules or subjective LLM judgments, Sentinel grounds practical safety requirements in formal temporal logic (TL) semantics that can precisely specify state invariants, temporal dependencies, and timing constraints. It then employs a multi-level verification pipeline where (i) at the semantic level, intuitive natural language safety requirements are formalized into TL formulas and the LLM agent's understanding of these requirements is probed for alignment with the TL formulas; (ii) at the plan level, high-level action plans and subgoals generated by the LLM agent are verified against the TL formulas to detect unsafe plans before execution; and (iii) at the trajectory level, multiple execution trajectories are merged into a computation tree and efficiently verified against physically detailed TL specifications for a final safety check. We apply Sentinel in VirtualHome and ALFRED, and formally evaluate multiple LLM-based embodied agents against diverse safety requirements. Our experiments show that by grounding physical safety in temporal logic and applying verification methods across multiple levels, Sentinel provides a rigorous foundation for systematically evaluating LLM-based embodied agents in physical environments, exposing safety violations overlooked by previous methods and offering insights into their failure modes.
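A plan-level check of this kind can be sketched in miniature (the toy domain and names here are illustrative, not Sentinel's actual API): a requirement such as "the stove is never left on while the agent is away" corresponds to the LTL invariant G(stove_on -> agent_near_stove), which on a finite plan trace reduces to evaluating the implication in every state.

```python
def holds_globally(trace, prop):
    """trace: list of state dicts; prop: state -> bool.
    Finite-trace semantics of the LTL invariant G(prop)."""
    return all(prop(s) for s in trace)

# G(stove_on -> agent_near_stove), written as (not p) or q
invariant = lambda s: (not s["stove_on"]) or s["agent_near_stove"]

safe_plan = [
    {"stove_on": False, "agent_near_stove": False},  # walk to kitchen
    {"stove_on": True,  "agent_near_stove": True},   # cook
    {"stove_on": False, "agent_near_stove": True},   # turn stove off
]
# same plan, but the agent leaves before switching the stove off
unsafe_plan = safe_plan[:2] + [{"stove_on": True, "agent_near_stove": False}]

print(holds_globally(safe_plan, invariant))    # True
print(holds_globally(unsafe_plan, invariant))  # False -> reject before execution
```

Rejecting `unsafe_plan` before execution is the point of level (ii): the violation is caught on the symbolic plan, without ever running the trajectory in the simulator.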


Automatic Building Code Review: A Case Study

Wan, Hanlong, Xu, Weili, Rosenberg, Michael, Zhang, Jian, Siddika, Aysha

arXiv.org Artificial Intelligence

Building officials, especially those in resource-constrained or rural jurisdictions, struggle with labor-intensive, error-prone, and costly manual reviews of design documents as projects scale in size and complexity. Widespread adoption of Building Information Modeling (BIM) and Large Language Models (LLMs) has created opportunities for automated code review (ACR) solutions. This study proposes a novel agent-driven framework that integrates BIM-based data extraction with automated verification using both retrieval-augmented generation (RAG) and Model Context Protocol (MCP) agent pipelines. The framework employs LLM-enabled agents to extract geometry, schedules, and system attributes from heterogeneous file types, which are then processed for building code checking via two complementary mechanisms: (i) direct API calls to DOE's COMcheck engine, providing deterministic and audit-ready outputs, and (ii) RAG-based reasoning over rule provisions, allowing flexible interpretation where coverage is incomplete or ambiguous. The framework was evaluated through case demonstrations, including automated extraction of geometric attributes (e.g., surface area, tilt, and insulation values), parsing of operational schedules, and design validation for lighting allowances under ASHRAE Standard 90.1-2022. Comparative performance tests across multiple large language models showed that Generative Pre-trained Transformer 4 Omni (GPT-4o) achieved the best balance of efficiency and stability, while smaller models exhibited inconsistencies or failures. Results confirm that MCP agent pipelines outperform RAG reasoning pipelines in both the rigor and the flexibility of the workflow.


Large Language Model-Driven Code Compliance Checking in Building Information Modeling

Madireddy, Soumya, Gao, Lu, Din, Zia, Kim, Kinam, Senouci, Ahmed, Han, Zhe, Zhang, Yunpeng

arXiv.org Artificial Intelligence

This research addresses the time-consuming and error-prone nature of manual code compliance checking in Building Information Modeling (BIM) by introducing a Large Language Model (LLM)-driven approach to semi-automate this critical process. The developed system integrates LLMs such as GPT, Claude, Gemini, and Llama, with Revit software to interpret building codes, generate Python scripts, and perform semi-automated compliance checks within the BIM environment. Case studies on a single-family residential project and an office building project demonstrated the system's ability to reduce the time and effort required for compliance checks while improving accuracy. It streamlined the identification of violations, such as non-compliant room dimensions, material usage, and object placements, by automatically assessing relationships and generating actionable reports. Compared to manual methods, the system eliminated repetitive tasks, simplified complex regulations, and ensured reliable adherence to standards. By offering a comprehensive, adaptable, and cost-effective solution, this proposed approach offers a promising advancement in BIM-based compliance checking, with potential applications across diverse regulatory documents in construction projects.
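The kind of generated check the paper describes can be illustrated with a stand-alone sketch. The rule and its thresholds below are examples, not any specific code clause, and a real generated script would query these values through the Revit API rather than from plain dictionaries.

```python
def check_rooms(rooms, min_area=7.0, min_height=2.4):
    """Flag habitable rooms whose floor area or ceiling height falls below
    the (illustrative) thresholds; returns (room name, reason) pairs."""
    violations = []
    for r in rooms:
        if r["area_m2"] < min_area:
            violations.append((r["name"], f"area {r['area_m2']} m2 < {min_area} m2"))
        if r["height_m"] < min_height:
            violations.append((r["name"], f"height {r['height_m']} m < {min_height} m"))
    return violations

# Toy model data standing in for values extracted from a BIM model.
rooms = [
    {"name": "Bedroom 1", "area_m2": 11.5, "height_m": 2.6},
    {"name": "Study",     "area_m2": 6.2,  "height_m": 2.5},
]
for name, why in check_rooms(rooms):
    print(f"VIOLATION {name}: {why}")   # flags the undersized Study
```

The semi-automation in the paper lies in having the LLM author such rule functions from the regulatory text; the deterministic part, as here, is the mechanical evaluation against model data.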


FoldA: Computing Partial-Order Alignments Using Directed Net Unfoldings

Geurtjens, Douwe, Lu, Xixi

arXiv.org Artificial Intelligence

Conformance checking is a fundamental task of process mining, which quantifies the extent to which the observed process executions match a normative process model. The state-of-the-art approaches compute alignments by exploring the state space formed by the synchronous product of the process model and the trace. This often leads to state space explosion, particularly when the model exhibits a high degree of choice and concurrency. Moreover, as alignments inherently impose a sequential structure, they fail to fully represent the concurrent behavior present in many real-world processes. To address these limitations, this paper proposes FoldA, a new technique for computing partial-order alignments on the fly using directed Petri net unfoldings. We evaluate our technique on 485 synthetic model-log pairs and compare it against A*- and Dijkstra-based alignments on 13 real-life model-log pairs and 6 benchmark pairs. The results show that our unfolding alignments, although they require more computation time, generally reduce the number of queued states and provide a more accurate representation of concurrency.


Posterior SBC: Simulation-Based Calibration Checking Conditional on Data

Säilynoja, Teemu, Schmitt, Marvin, Bürkner, Paul, Vehtari, Aki

arXiv.org Machine Learning

Simulation-based calibration checking (SBC) refers to the validation of an inference algorithm and model implementation through repeated inference on data simulated from a generative model. In the original and commonly used approach, the generative model uses parameters drawn from the prior, and thus the approach is testing whether the inference works for simulated data generated with parameter values plausible under that prior. This approach is natural and desirable when we want to test whether the inference works for a wide range of datasets we might observe. However, after observing data, we are interested in answering whether the inference works conditional on that particular data. In this paper, we propose posterior SBC and demonstrate how it can be used to validate the inference conditional on the observed data. We illustrate the utility of posterior SBC in three case studies: (1) A simple multilevel model; (2) a model that is governed by differential equations; and (3) a joint integrative neuroscience model which is approximated via amortized Bayesian inference with neural networks.
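The rank-statistic machinery that prior-based and posterior SBC share can be sketched for a conjugate normal model, where the exact posterior is available in closed form (posterior SBC additionally conditions the generative loop on the observed data; that step is omitted here).

```python
import random

def sbc_ranks(n_sims=200, n_draws=99, n_obs=10, prior_sd=1.0, noise_sd=1.0):
    """Classic prior-based SBC for y_i ~ N(theta, noise_sd), theta ~ N(0, prior_sd).
    If the inference is correct, the returned ranks are uniform on {0,...,n_draws}."""
    ranks = []
    for _ in range(n_sims):
        theta = random.gauss(0.0, prior_sd)                  # draw from the prior
        y = [random.gauss(theta, noise_sd) for _ in range(n_obs)]
        # exact conjugate posterior N(mu_post, sd_post) plays the role of the
        # inference algorithm under test
        prec = 1.0 / prior_sd**2 + n_obs / noise_sd**2
        mu_post = (sum(y) / noise_sd**2) / prec
        sd_post = prec ** -0.5
        draws = [random.gauss(mu_post, sd_post) for _ in range(n_draws)]
        ranks.append(sum(d < theta for d in draws))          # rank of truth
    return ranks

ranks = sbc_ranks()
print(min(ranks), max(ranks))   # a histogram of these ranks should look flat
```

Replacing the exact posterior with an approximate sampler (and, for posterior SBC, simulating parameters from the posterior given the observed data instead of from the prior) turns this sketch into the diagnostic the paper studies: systematic departures of the ranks from uniformity expose miscalibrated inference.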


One Stack, Diverse Vehicles: Checking Safe Portability of Automated Driving Software

Nenchev, Vladislav

arXiv.org Artificial Intelligence

Integrating an automated driving software stack into vehicles with variable configuration is challenging, especially due to different hardware characteristics. Further, to provide software updates to a vehicle fleet in the field, the functional safety of every affected configuration has to be ensured. These additional demands for dependability and the increasing hardware diversity in automated driving make rigorous automatic analysis essential. This paper addresses this challenge by using formal portability checking of adaptive cruise controller code for different vehicle configurations. Given a formal specification of the safe behavior, models of target configurations are derived, which capture relevant effects of sensors, actuators and computing platforms. A corresponding safe set is obtained and used to check if the desired behavior is achievable on all targets. In a case study, portability checks of a traditional and a neural network controller are performed automatically within minutes for each vehicle hardware configuration. The checks provide feedback for necessary adaptations of the controllers, thus allowing rapid integration and testing of software or parameter changes.
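The flavour of a per-configuration check can be conveyed with a simulation-based stand-in (the paper's method instead computes a formal safe set; the configurations, numbers, and gap specification below are all illustrative): each hypothetical vehicle configuration is evaluated against a minimum-gap requirement in an emergency-braking scenario, and any configuration that violates it is flagged as non-portable.

```python
def min_gap(config, v0=20.0, gap0=15.0, dt=0.05, horizon=10.0):
    """Lead vehicle performs an emergency stop; the ego vehicle applies this
    configuration's braking after its actuation delay. Returns the smallest
    inter-vehicle gap (in metres) reached over the horizon (Euler simulation)."""
    v_ego, v_lead, gap, t = v0, v0, gap0, 0.0
    lowest = gap
    while t < horizon:
        a_ego = -config["max_brake"] if t >= config["delay"] else 0.0
        v_lead = max(0.0, v_lead - 6.0 * dt)        # lead brakes at 6 m/s^2
        v_ego = max(0.0, v_ego + a_ego * dt)
        gap += (v_lead - v_ego) * dt
        lowest = min(lowest, gap)
        t += dt
    return lowest

# Hypothetical hardware configurations: actuation delay (s), peak braking (m/s^2).
configs = {
    "sedan": {"delay": 0.1, "max_brake": 8.0},
    "van":   {"delay": 0.4, "max_brake": 5.0},
}
for name, cfg in configs.items():
    ok = min_gap(cfg) >= 2.0                        # illustrative 2 m gap spec
    print(name, "portable" if ok else "violates gap spec")
```

The formal approach in the paper replaces this single scenario with a safe set that covers all admissible behaviours, so a passing configuration carries a guarantee rather than evidence from one simulated run.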


Program Correctness through Self-Certification

Communications of the ACM

Programming is both an enjoyable and a difficult task. A seemingly small slip can introduce a serious error or create a security vulnerability. The need for, and importance of, program correctness was recognized early in the modern development of computing. Algorithms, on which programs are built, arose in the ancient world (and are commonly attributed to the Greeks). The word verification dates only to Medieval Latin; however, when Euclid introduced his algorithm for the greatest common divisor centuries earlier, he provided a proof sketch based on what we would today call inductive reasoning [13].


Turn-based Multi-Agent Reinforcement Learning Model Checking

Gross, Dennis

arXiv.org Artificial Intelligence

In this paper, we propose a novel approach for verifying the compliance of turn-based multi-agent reinforcement learning (TMARL) agents with complex requirements in stochastic multiplayer games. Our method overcomes the limitations of existing verification approaches, which are inadequate for dealing with TMARL agents and not scalable to large games with multiple agents. Our approach relies on tight integration of TMARL and a verification technique referred to as model checking. We demonstrate the effectiveness and scalability of our technique through experiments in different types of environments. Our experiments show that our method is suited to verify TMARL agents and scales better than naive monolithic model checking.


ARCEAK: An Automated Rule Checking Framework Enhanced with Architectural Knowledge

Chen, Junyong, Wu, Ling-I, Chen, Minyu, Qian, Xiaoying, Zhu, Haoze, Zhang, Qiongfang, Li, Guoqiang

arXiv.org Artificial Intelligence

Automated Rule Checking (ARC) plays a crucial role in advancing the construction industry by addressing the laborious, inconsistent, and error-prone nature of traditional model review conducted by industry professionals. Manual assessment against intricate sets of rules often leads to significant project delays and expenses. In response to these challenges, ARC offers a promising solution to improve efficiency and compliance in design within the construction sector. However, the main challenge of ARC lies in translating regulatory text into a format suitable for computer processing. Current methods for rule interpretation require extensive manual labor, thereby limiting their practicality. To address this issue, our study introduces a novel approach that decomposes ARC into two distinct tasks: rule information extraction and verification code generation. Leveraging generative pre-trained transformers, our method aims to streamline the interpretation of regulatory texts and simplify the process of generating model compliance checking code. Through empirical evaluation and case studies, we showcase the effectiveness and potential of our approach in automating code compliance checking, enhancing the efficiency and reliability of construction projects.