Data Race Detection Using Large Language Models

Chen, Le, Ding, Xianzhong, Emani, Murali, Vanderbruggen, Tristan, Lin, Pei-Hung, Liao, Chunhua

Oct-3-2023–arXiv.org Artificial Intelligence

Large language models (LLMs) show significant promise as an alternative strategy for analyzing and optimizing high-performance computing programs, avoiding the resource-intensive manual creation of tools. In this paper, we explore a novel LLM-based data race detection approach that combines prompt engineering and fine-tuning techniques. We create a dedicated dataset named DRB-ML, derived from DataRaceBench, with fine-grained labels indicating the presence of data race pairs and their associated variables, line numbers, and read/write information. DRB-ML is then used to evaluate representative LLMs and to fine-tune open-source ones. Our experiments show that LLMs can be a viable approach to data race detection. However, they still cannot compete with traditional data race detection tools when detailed information about the variable pairs causing data races is needed.
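To make the labeling idea concrete, the following is a minimal sketch of what a fine-grained label (race presence, variable pair, line numbers, read/write operations) and a detection prompt might look like. It is an illustration only: the OpenMP snippet, the `Access`, `RacePair`, and `RaceLabel` classes, the field names, and the prompt wording are all hypothetical and are not the DRB-ML schema or the prompts used in the paper.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical OpenMP kernel (not taken from DataRaceBench). Iteration i writes
# a[i] while iteration i + 1 may read it concurrently, so the loop has a data race.
SNIPPET = (
    "#pragma omp parallel for\n"
    "for (i = 1; i < n; i++)\n"
    "  a[i] = a[i - 1] + 1;\n"
)


@dataclass
class Access:
    variable: str   # expression as written in the source
    line: int       # 1-based line number within the snippet
    operation: str  # "R" for read, "W" for write


@dataclass
class RacePair:
    first: Access
    second: Access


@dataclass
class RaceLabel:
    has_race: bool
    pairs: list  # list of RacePair; empty when has_race is False


def number_lines(code: str) -> str:
    """Prefix each line with its 1-based number so answers can cite lines."""
    return "\n".join(
        f"{n}: {text}" for n, text in enumerate(code.splitlines(), start=1)
    )


def build_prompt(code: str) -> str:
    """Assemble an illustrative detection prompt (wording is an assumption)."""
    return (
        "Does the following OpenMP code contain a data race? "
        "Answer yes or no, then list each racing pair with variable, "
        "line number, and read/write operation.\n\n" + number_lines(code)
    )


# Hand-written label for SNIPPET: the write and the read both sit on line 3.
label = RaceLabel(
    has_race=True,
    pairs=[RacePair(Access("a[i]", 3, "W"), Access("a[i - 1]", 3, "R"))],
)

if __name__ == "__main__":
    print(build_prompt(SNIPPET))
    print(json.dumps(asdict(label), indent=2))
```

A yes/no answer only needs the `has_race` field, whereas matching the full `pairs` list requires the model to name the exact variables, lines, and operations, which is the harder task the abstract refers to.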

data race detection, dataset, llm



Country:
- North America > United States
  - New York > New York County
    - New York City (0.04)
  - Iowa > Story County
    - Ames (0.04)
  - Illinois > Cook County
    - Lemont (0.04)
    - Chicago (0.04)
  - Colorado > Denver County
    - Denver (0.05)
  - California
    - Merced County > Merced (0.14)
    - Alameda County > Livermore (0.04)
- Asia > Middle East
  - Iran > Tehran Province > Tehran (0.04)

Genre:
- Research Report > New Finding (0.67)

Industry:
- Energy (0.68)
- Government > Regional Government (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)