AssertionBench: A Benchmark to Evaluate Large-Language Models for Assertion Generation

Pulavarthi, Vaishnavi, Nandal, Deeksha, Dan, Soham, Pal, Debjit

arXiv.org Artificial Intelligence

Assertions have been the de facto collateral for simulation-based and formal verification of hardware designs for over a decade. The quality of hardware verification, i.e., detection and diagnosis of corner-case design bugs, is critically dependent on the quality of the assertions. There has been a considerable amount of research leveraging a blend of data-driven statistical analysis and static analysis to generate high-quality assertions from hardware design source code and design execution trace data. Despite such concerted effort, all prior research struggles to scale to industrial-scale large designs, generates too many low-quality assertions, often fails to capture subtle and non-trivial design functionality, and does not produce any easy-to-comprehend explanations of the generated assertions to understand their suitability to different downstream validation tasks. Recently, with the advent of Large-Language Models (LLMs), there has been a widespread effort to leverage prompt engineering to generate assertions. However, there is little effort to quantitatively establish the effectiveness and suitability of various LLMs for assertion generation. In this paper, we present AssertionBench, a novel benchmark to quantitatively evaluate LLMs' effectiveness for assertion generation. AssertionBench contains 100 curated Verilog hardware designs from OpenCores and formally verified assertions for each design generated from GoldMine and HARM. We use AssertionBench to compare state-of-the-art LLMs to assess their effectiveness in inferring functionally correct assertions for hardware designs. Our experiments demonstrate how LLMs perform relative to each other, the benefits of using more in-context exemplars in generating a higher fraction of functionally correct assertions, and the significant room for improvement for LLM-based assertion generators.
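To make the evaluation protocol concrete, here is a hedged Python sketch of the loop the abstract describes: prompt an LLM with a design plus k in-context exemplars, then score the fraction of generated assertions that a formal checker proves. The generate and verify callables are hypothetical stand-ins for the LLM and the formal tool, not AssertionBench's actual API.

```python
def fraction_correct(generate, verify, designs, exemplars, k):
    """Fraction of generated assertions that formally verify.

    generate(design, exemplars) -> candidate assertions (hypothetical LLM wrapper)
    verify(design, assertion)   -> True if the checker proves it (hypothetical)
    """
    correct = total = 0
    for design in designs:
        for assertion in generate(design, exemplars[:k]):
            total += 1
            correct += bool(verify(design, assertion))
    return correct / total if total else 0.0
```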


TD Convergence: An Optimization Perspective

Asadi, Kavosh, Sabach, Shoham, Liu, Yao, Gottesman, Omer, Fakoor, Rasool

arXiv.org Artificial Intelligence

We study the convergence behavior of the celebrated temporal-difference (TD) learning algorithm. By looking at the algorithm through the lens of optimization, we first argue that TD can be viewed as an iterative optimization algorithm where the function to be minimized changes per iteration. By carefully investigating the divergence displayed by TD on a classical counterexample, we identify two forces that determine the convergent or divergent behavior of the algorithm. We next formalize our discovery in the linear TD setting with quadratic loss and prove that convergence of TD hinges on the interplay between these two forces. We extend this optimization perspective to prove convergence of TD in a much broader setting than just linear approximation and squared loss. Our results provide a theoretical explanation for the successful application of TD in reinforcement learning.
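The kind of divergence the authors dissect can be reproduced with a classic two-state construction (whether this is the paper's exact counterexample is an assumption): features 1 and 2 share a single weight w, all rewards are zero, and under a uniform off-policy state weighting the expected TD(0) update scales w by a factor proportional to (6γ − 5), which is positive for γ > 5/6. A minimal Python demo:

```python
import numpy as np

# Two-state off-policy TD(0) divergence demo with linear function
# approximation. Both states transition to state 1 (feature 2); all
# rewards are 0, so the true solution is w = 0.
phi = np.array([1.0, 2.0])   # feature of state 0 and state 1
nxt = np.array([1, 1])       # successor state of each state
gamma, alpha, w = 0.9, 0.05, 1.0

for _ in range(500):
    # expected TD(0) update under a uniform (off-policy) state weighting
    delta = sum(phi[s] * (gamma * phi[nxt[s]] * w - phi[s] * w) for s in range(2))
    w += alpha * delta / 2.0

print(w)  # grows geometrically: each update multiplies w by 1 + alpha*(6*gamma - 5)/2
```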


Refutation of Spectral Graph Theory Conjectures with Monte Carlo Search

Roucairol, Milo, Cazenave, Tristan

arXiv.org Artificial Intelligence

We demonstrate how Monte Carlo Search (MCS) algorithms, namely Nested Monte Carlo Search (NMCS) and Nested Rollout Policy Adaptation (NRPA), can be used to build graphs and find counter-examples to spectral graph theory conjectures in minutes.
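For readers unfamiliar with the searches named above, the following is a minimal Python sketch of Nested Monte Carlo Search. The State interface (legal_moves, play, is_terminal, score) is hypothetical; in the graph setting a move would add or remove an edge, and score would measure how strongly the constructed graph violates a conjectured spectral bound. NRPA additionally adapts a rollout policy via gradient steps, which this sketch omits.

```python
import random

def playout(state):
    # level-0 search: a uniformly random rollout to a terminal state
    while not state.is_terminal():
        state = state.play(random.choice(state.legal_moves()))
    return state

def nmcs(state, level):
    # at each step, try every move with a level-1 lower nested search,
    # then follow the move that led to the best terminal state seen so far
    if level == 0:
        return playout(state)
    best, best_move = None, None
    while not state.is_terminal():
        improved = False
        for move in state.legal_moves():
            result = nmcs(state.play(move), level - 1)
            if best is None or result.score() > best.score():
                best, best_move, improved = result, move, True
        if not improved:  # keep a legal move if this step found nothing better
            best_move = random.choice(state.legal_moves())
        state = state.play(best_move)
    return best
```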


OVERT: An Algorithm for Safety Verification of Neural Network Control Policies for Nonlinear Systems

Sidrane, Chelsea, Maleki, Amir, Irfan, Ahmed, Kochenderfer, Mykel J.

arXiv.org Artificial Intelligence

Deep learning methods can be used to produce control policies, but certifying their safety is challenging. The resulting networks are nonlinear and often very large. In response to this challenge, we present OVERT: a sound algorithm for safety verification of nonlinear discrete-time closed loop dynamical systems with neural network control policies. The novelty of OVERT lies in combining ideas from the classical formal methods literature with ideas from the newer neural network verification literature. The central concept of OVERT is to abstract nonlinear functions with a set of optimally tight piecewise linear bounds. Such piecewise linear bounds are designed for seamless integration into ReLU neural network verification tools. OVERT can be used to prove bounded-time safety properties by either computing reachable sets or solving feasibility queries directly. We demonstrate safety verification on several classical benchmark problems. OVERT compares favorably to existing methods both in computation time and in tightness of the reachable set.
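To illustrate the central abstraction step, here is a hedged Python sketch of piecewise-linear bounds for a 1-D nonlinearity: on each segment, take the chord between the endpoints and shift it by the extreme deviation of the function from that chord. Uniform breakpoints and dense sampling are simplifications; OVERT itself optimizes the bounds for tightness and establishes soundness without relying on sampling.

```python
import numpy as np

def pwl_bounds(f, lo, hi, pieces=4, samples=1000):
    """Piecewise-linear lower/upper bounds of f on [lo, hi].

    Returns (a, b, shift_lo, shift_up) per segment: the chord through
    (a, f(a)) and (b, f(b)), shifted down/up by the extreme deviations,
    brackets f on [a, b] (up to sampling error in this sketch).
    """
    knots = np.linspace(lo, hi, pieces + 1)
    segments = []
    for a, b in zip(knots[:-1], knots[1:]):
        xs = np.linspace(a, b, samples)
        chord = f(a) + (f(b) - f(a)) * (xs - a) / (b - a)
        dev = f(xs) - chord
        segments.append((a, b, dev.min(), dev.max()))
    return segments

# e.g. bounds for sin on [0, pi]; each linear bound is ReLU-representable
print(pwl_bounds(np.sin, 0.0, np.pi))
```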


Exploring Scale-Measures of Data Sets

Hanika, Tom, Hirth, Johannes

arXiv.org Artificial Intelligence

An inevitable step of any data-based knowledge discovery process is measurement [24] and the associated (explicit or implicit) scaling of the data [27]. The latter is particularly constrained by the underlying mathematical formulation of the data representation, e.g., real-valued vector spaces or weighted graphs, the requirements of the data procedures, e.g., the presence of a distance function, and, more recently, the need for human understanding of the results. Considering the scaling of data as part of the analysis itself, in particular formalizing it and thus making it controllable, is a salient feature of formal concept analysis (FCA) [7]. This field of research has spawned a variety of specialized scaling methods, such as logical scaling [25], and in the form of scale-measures links the scaling process with the study of continuous mappings between closure systems. Recent results by the authors [13] revealed that the set of all scale-measures for a given data set constitutes a lattice. Furthermore, it was shown that any scale-measure can be expressed in simple propositional terms using disjunction, conjunction and negation. Among other things, the previous results allow a computational transition between different scale-measures, which we may call scale-measure navigation, as well as their interpretability by humans.
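As a toy illustration of the scaling step discussed above (ordinal conceptual scaling in the FCA sense), the snippet below turns a many-valued attribute into binary scale attributes; the data and thresholds are invented, and actual scale-measures concern continuous maps between the resulting closure systems, which this sketch does not model.

```python
# Ordinal scaling of a many-valued attribute into a formal context:
# each object receives the binary attributes "age <= t" it satisfies.
data = {"anna": 21, "ben": 35, "cora": 35, "dan": 62}  # invented sample
thresholds = [25, 40, 65]

context = {
    obj: {f"age<={t}" for t in thresholds if age <= t}
    for obj, age in data.items()
}
for obj, attrs in sorted(context.items()):
    print(obj, sorted(attrs))
```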


Testing Monotonicity of Machine Learning Models

Sharma, Arnab, Wehrheim, Heike

arXiv.org Artificial Intelligence

Today, machine learning (ML) models are increasingly applied in decision making. This induces an urgent need for quality assurance of ML models with respect to (often domain-dependent) requirements. Monotonicity is one such requirement: it demands that a model 'learned' by an ML algorithm produce increasing predictions as certain attribute values increase. While there exist multiple ML algorithms for ensuring monotonicity of the generated model, approaches for checking monotonicity, in particular of black-box models, are largely lacking. In this work, we propose verification-based testing of monotonicity, i.e., the formal computation of test inputs on a white-box model via verification technology, and the automatic inference of this approximating white-box model from the black-box model under test. On the white-box model, the space of test inputs can be systematically explored by a directed computation of test cases. The empirical evaluation on 90 black-box models shows that verification-based testing can outperform adaptive random testing as well as property-based techniques with respect to effectiveness and efficiency.
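For contrast with the verification-based approach, here is a hedged Python sketch of the kind of property-style baseline the evaluation compares against: perturb one feature upward and flag any input whose prediction decreases. The predict callable and the fixed perturbation are placeholders, not the paper's setup.

```python
import numpy as np

def monotonicity_violations(predict, X, feature, delta=1.0):
    """Indices of rows where increasing `feature` by `delta` lowers the
    prediction; an empty result means no counter-example was found
    (which, unlike verification, is not a proof of monotonicity)."""
    X_up = np.array(X, dtype=float, copy=True)
    X_up[:, feature] += delta
    y, y_up = predict(X), predict(X_up)
    return np.where(y_up < y)[0]
```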


Algorithms for Verifying Deep Neural Networks

Liu, Changliu, Arnon, Tomer, Lazarus, Christopher, Barrett, Clark, Kochenderfer, Mykel J.

arXiv.org Machine Learning

Neural networks [15] have been widely used in many applications, such as image classification and understanding [17], language processing [24], and control of autonomous systems [26]. These networks represent functions that map inputs to outputs through a sequence of layers. At each layer, the input to that layer undergoes an affine transformation followed by a simple nonlinear transformation before being passed to the next layer. These nonlinear transformations are often called activation functions, and a common example is the rectified linear unit (ReLU), which transforms the input by setting any negative values to zero. Although the computation involved in a neural network is quite simple, these networks can represent complex nonlinear functions by appropriately choosing the matrices that define the affine transformations.
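The layer computation described above is short enough to state directly; the following Python sketch applies an affine map followed by a ReLU at every layer except the last (the weights are arbitrary placeholders).

```python
import numpy as np

def relu(x):
    # the nonlinear transformation mentioned above: clamp negatives to zero
    return np.maximum(x, 0.0)

def forward(x, layers):
    """layers: list of (W, b) pairs; affine map + ReLU per hidden layer."""
    for W, b in layers[:-1]:
        x = relu(W @ x + b)
    W, b = layers[-1]
    return W @ x + b

# tiny example with placeholder weights
layers = [(np.ones((3, 2)), np.zeros(3)), (np.ones((1, 3)), np.zeros(1))]
print(forward(np.array([1.0, -2.0]), layers))
```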


CrossNorm: Normalization for Off-Policy TD Reinforcement Learning

Bhatt, Aditya, Argus, Max, Amiranashvili, Artemij, Brox, Thomas

arXiv.org Machine Learning

Off-policy Temporal Difference (TD) learning methods, when combined with function approximators, suffer from the risk of divergence, a phenomenon known as the deadly triad. It has long been noted that some feature representations work better than others. In this paper we investigate how feature normalization can prevent divergence and improve training. Our method, which we call CrossNorm, can be regarded as a new variant of batch normalization that re-centers data for multi-modal distributions, which occur in the off-policy TD updates. We show empirically that CrossNorm improves the stability of the learning process. We apply CrossNorm to DDPG and TD3 and achieve stable training and improved performance across a range of MuJoCo benchmark tasks. Moreover, for the first time, we are able to train DDPG stably without the use of target networks.
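The abstract does not spell out the layer itself, so the following is only a plausible reading, sketched in Python/NumPy: re-center features using statistics pooled across both batches that enter a TD update (current and successor state-action features) rather than normalizing each separately. Mixing weights and any affine terms are left to the paper; this is not the authors' exact layer.

```python
import numpy as np

def cross_center(h, h_next):
    """Re-center two feature batches by their pooled mean (a guess at the
    'cross' in CrossNorm, assumed for illustration)."""
    mu = np.concatenate([h, h_next], axis=0).mean(axis=0, keepdims=True)
    return h - mu, h_next - mu
```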