Automated Repair of AI Code with Large Language Models and Formal Verification
Charalambous, Yiannis, Manino, Edoardo, Cordeiro, Lucas C.
–arXiv.org Artificial Intelligence
In contrast to classic software development, neural networks are crafted via a long process of trial and error that terminates when their predictive performance reaches a satisfactory level [4, 36]. The iterative and performance-driven nature of this process leaves neural networks vulnerable on many fronts [23]: poor performance on out-of-distribution [21] and adversarial inputs [32], misspecification of the neural architecture and training process [24], invocation of broken and deprecated libraries [30], outright software bugs [22]. Unfortunately, many of these vulnerabilities are not easy to catch early in the development process and may remain hidden until after deployment. Although efforts to debug the actual implementation of neural networks exist, they are based on automatic testing and thus cannot prove correctness for all inputs [33, 22, 18]. This lack of guarantees is especially concerning for safety-critical systems since common software vulnerabilities [11] (e.g., arithmetic overflows, invalid memory accesses) can make the networks produce wrong results, expose sensitive data or corrupt the system they are executed on. In this report, we tackle the challenge of producing bug-free implementations of neural networks in the following way. First, we employ software verifiers to ensure full coverage of the state space. In the past, it has been claimed that software verifiers struggle to cope with neural network code due to its size, complexity and the presence of numerous calls to the standard mathematical library [26].
arXiv.org Artificial Intelligence
May-14-2024
- Country:
- South America > Brazil
- North America > United States
- New York > New York County
- New York City (0.04)
- California > Orange County
- Anaheim (0.04)
- New York > New York County
- Europe > United Kingdom
- England > Greater Manchester > Manchester (0.04)
- Genre:
- Research Report > New Finding (1.00)
- Technology: