Can LLMs Converse Formally? Automatically Assessing LLMs in Translating and Interpreting Formal Specifications

Rushang Karia, Daksh Dobhal, Daniel Bramblett, Pulkit Verma, Siddharth Srivastava

arXiv.org Artificial Intelligence 

Automatic system synthesis and verification often require specifications to be provided in a formal language such as propositional logic [Haubelt and Feldmann, 2003, Scholl and Becker, 2001]. Typically, human experts serve as intermediaries who can (a) translate stakeholders' natural language (NL) specifications into formal syntax, or (b) explain or interpret a system's functionality by translating its manual into NL. Given the success of Large Language Models (LLMs) in translation tasks [Xue et al., 2021], using LLMs as such intermediaries could reduce overall system design costs. It is therefore vital to develop an evaluation methodology that can assess the capabilities of LLMs in these settings. However, developing such a methodology is difficult for several reasons. First, it is hard to obtain high-quality datasets, i.e., ones whose ground-truth data the LLMs have not been trained on; as LLMs evolve, any such dataset must evolve with them, since it will likely be absorbed into the next generation's training corpus. Second, scaling up existing datasets is challenging because they rely on human annotators to pair NL text with its formal specification. Finally, the assessment must consider both directions of translation: formal-to-natural and natural-to-formal.
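To make the automatic-assessment idea concrete, the sketch below shows one way ground-truth formulas could be generated programmatically (sidestepping the human-annotation bottleneck) and candidate natural-to-formal translations graded by logical equivalence rather than string matching. This is a minimal illustrative sketch in Python, not the paper's implementation: the tuple-based formula encoding, the random_formula generator, and the stand-in candidate formula are all assumptions, and equivalence is checked by brute-force truth-table enumeration, which is only feasible for formulas over a handful of variables.

    import itertools
    import random

    def random_formula(variables, depth=3):
        # Recursively build a random formula as a nested tuple AST,
        # e.g. ("and", "p", ("not", "q")).
        if depth == 0 or random.random() < 0.3:
            return random.choice(variables)
        op = random.choice(["not", "and", "or"])
        if op == "not":
            return ("not", random_formula(variables, depth - 1))
        return (op, random_formula(variables, depth - 1),
                random_formula(variables, depth - 1))

    def evaluate(formula, assignment):
        # Evaluate an AST under a truth assignment (dict: variable -> bool).
        if isinstance(formula, str):
            return assignment[formula]
        op = formula[0]
        if op == "not":
            return not evaluate(formula[1], assignment)
        if op == "and":
            return evaluate(formula[1], assignment) and evaluate(formula[2], assignment)
        return evaluate(formula[1], assignment) or evaluate(formula[2], assignment)

    def equivalent(f1, f2, variables):
        # Two formulas are equivalent iff they agree on every assignment;
        # brute force suffices for the small variable counts used here.
        for values in itertools.product([False, True], repeat=len(variables)):
            assignment = dict(zip(variables, values))
            if evaluate(f1, assignment) != evaluate(f2, assignment):
                return False
        return True

    if __name__ == "__main__":
        variables = ["p", "q", "r"]
        ground_truth = random_formula(variables)
        # A call to the LLM under evaluation would go here; for illustration
        # we grade a hand-written stand-in against the generated ground truth.
        candidate = ("or", ("not", "p"), "q")
        print("ground truth:", ground_truth)
        print("equivalent to candidate:", equivalent(ground_truth, candidate, variables))

Grading by equivalence rather than exact match matters because an LLM may emit a formula that is syntactically different from, yet logically identical to, the ground truth (e.g., (not p) or q versus p implies q); an exact-match metric would wrongly penalize it.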
