Shaker Heights
Revisiting Differential Verification: Equivalence Verification with Confidence
Teuber, Samuel, Kern, Philipp, Janzen, Marvin, Beckert, Bernhard
When validated neural networks (NNs) are pruned (and retrained) before deployment, it is desirable to prove that the new NN behaves equivalently to the (original) reference NN. To this end, our paper revisits the idea of differential verification which performs reasoning on differences between NNs: On the one hand, our paper proposes a novel abstract domain for differential verification admitting more efficient reasoning about equivalence. On the other hand, we investigate empirically and theoretically which equivalence properties are (not) efficiently solved using differential reasoning. Based on the gained insights, and following a recent line of work on confidence-based verification, we propose a novel equivalence property that is amenable to Differential Verification while providing guarantees for large parts of the input space instead of small-scale guarantees constructed w.r.t. predetermined input points. We implement our approach in a new tool called VeryDiff and perform an extensive evaluation on numerous old and new benchmark families, including new pruned NNs for particle jet classification in the context of CERN's LHC where we observe median speedups >300x over the State-of-the-Art verifier alpha,beta-CROWN.
Can Transformers Reason Logically? A Study in SAT Solving
Pan, Leyan, Ganesh, Vijay, Abernethy, Jacob, Esposo, Chris, Lee, Wenke
A PARAT "program" is basically a sequence of array operations over SOps. Throughout this section, we refer to the indices along the first dimension of an SOp as "position" and refer to indices along the second dimension as "dimension". The "inputs" to a program are arbitrary positional encoding and token embedding SOps, represented by the base class names PosEncSOp and TokEmbSOp respectively. For example, the OneHotTokEmb class represents the one-hot embedding of tokens and Indices represents the numerical value of the index of each position. The rest of the program performs various operations that compute new SOps based on existing ones. We provide implementations of basic building block operations including (but not limited to) the following: Mean(q, k, v) Represents the "Averaging Hard Attention" operation.
Rule Based Learning with Dynamic (Graph) Neural Networks
A common problem of classical neural network architectures is that additional information or expert knowledge cannot be naturally integrated into the learning process. To overcome this limitation, we propose a two-step approach consisting of (1) generating rule functions from knowledge and (2) using these rules to define rule based layers -- a new type of dynamic neural network layer. The focus of this work is on the second step, i.e., rule based layers that are designed to dynamically arrange learnable parameters in the weight matrices and bias vectors depending on the input samples. Indeed, we prove that our approach generalizes classical feed-forward layers such as fully connected and convolutional layers by choosing appropriate rules. As a concrete application we present rule based graph neural networks (RuleGNNs) that overcome some limitations of ordinary graph neural networks. Our experiments show that the predictive performance of RuleGNNs is comparable to state-of-the-art graph classifiers using simple rules based on Weisfeiler-Leman labeling and pattern counting. Moreover, we introduce new synthetic benchmark graph datasets to show how to integrate expert knowledge into RuleGNNs making them more powerful than ordinary graph neural networks.
Asymptotic Gaussian Fluctuations of Eigenvectors in Spectral Clustering
Lebeau, Hugo, Chatelain, Florent, Couillet, Romain
The performance of spectral clustering relies on the fluctuations of the entries of the eigenvectors of a similarity matrix, which has been left uncharacterized until now. In this letter, it is shown that the signal $+$ noise structure of a general spike random matrix model is transferred to the eigenvectors of the corresponding Gram kernel matrix and the fluctuations of their entries are Gaussian in the large-dimensional regime. This CLT-like result was the last missing piece to precisely predict the classification performance of spectral clustering. The proposed proof is very general and relies solely on the rotational invariance of the noise. Numerical experiments on synthetic and real data illustrate the universality of this phenomenon.
Provable advantages of kernel-based quantum learners and quantum preprocessing based on Grover's algorithm
Muser, Till, Zapusek, Elias, Belis, Vasilis, Reiter, Florentin
There is an ongoing effort to find quantum speedups for learning problems. Recently, [Y. Liu et al., Nat. Phys. $\textbf{17}$, 1013--1017 (2021)] have proven an exponential speedup for quantum support vector machines by leveraging the speedup of Shor's algorithm. We expand upon this result and identify a speedup utilizing Grover's algorithm in the kernel of a support vector machine. To show the practicality of the kernel structure we apply it to a problem related to pattern matching, providing a practical yet provable advantage. Moreover, we show that combining quantum computation in a preprocessing step with classical methods for classification further improves classifier performance.
Resolution for Constrained Pseudo-Propositional Logic
This work, shows how propositional resolution can be generalized to obtain a resolution proof system for constrained pseudo-propositional logic (CPPL), which is an extension resulted from inserting the natural numbers with few constraints symbols into the alphabet of propositional logic and adjusting the underling language accordingly. Unlike the construction of CNF formulas which are restricted to a finite set of clauses, the extended CPPL does not require the corresponding set to be finite. Although this restriction is made dispensable, this work presents a constructive proof showing that the generalized resolution for CPPL is sound and complete. As a marginal result, this implies that propositional resolution is also sound and complete for formulas with even infinite set of clauses.
Semiring Reasoning Frameworks in AI and Their Computational Complexity
Eiter, Thomas (TU Wien) | Kiesel, Rafael (TU Wien)
Many important problems in AI, among them #SAT, parameter learning and probabilistic inference go beyond the classical satisfiability problem. Here, instead of finding a solution we are interested in a quantity associated with the set of solutions, such as the number of solutions, the optimal solution or the probability that a query holds in a solution. To model such quantitative problems in a uniform manner, a number of frameworks, e.g. Algebraic Model Counting and Semiring-based Constraint Satisfaction Problems, employ what we call the semiring paradigm. In the latter the abstract algebraic structure of the semiring serves as a means of parameterizing the problem definition, thus allowing for different modes of quantitative computations by choosing different semirings. While efficiently solvable cases have been widely studied, a systematic study of the computational complexity of such problems depending on the semiring parameter is missing. In this work, we characterize the latter by NP(R), a novel generalization of NP over semiring R, and obtain NP(R)-completeness results for a selection of semiring frameworks. To obtain more tangible insights into the hardness of NP(R), we link it to well-known complexity classes from the literature. Interestingly, we manage to connect the computational hardness to properties of the semiring. Using this insight, we see that, on the one hand, NP(R) is always at least as hard as NP or ModpP depending on the semiring R and in general unlikely to be in FPSPACEpoly. On the other hand, for broad subclasses of semirings relevant in practice we can employ reductions to NP, ModpP and #P. These results show that in many cases solutions are only mildly harder to compute than functions in NP, ModpP and #P, give us new insights into how problems that involve counting on semirings can be approached, and provide a means of assessing whether an algorithm is appropriate for a given class of problems.
Generation and Prediction of Difficult Model Counting Instances
Escamocher, Guillaume, O'Sullivan, Barry
We present a way to create small yet difficult model counting instances. Our generator is highly parameterizable: the number of variables of the instances it produces, as well as their number of clauses and the number of literals in each clause, can all be set to any value. Our instances have been tested on state of the art model counters, against other difficult model counting instances, in the Model Counting Competition. The smallest unsolved instances of the competition, both in terms of number of variables and number of clauses, were ours. We also observe a peak of difficulty when fixing the number of variables and varying the number of clauses, in both random instances and instances built by our generator. Using these results, we predict the parameter values for which the hardest to count instances will occur.