On Classifying Continuous Constraint Satisfaction Problems
Miltzow, Tillmann, Schmiermann, Reinier F.
A continuous constraint satisfaction problem (CCSP) is a constraint satisfaction problem (CSP) with an interval domain $U \subset \mathbb{R}$. We engage in a systematic study to classify CCSPs that are complete for the Existential Theory of the Reals, i.e., ER-complete. To define this class, we first consider the problem ETR, which also stands for Existential Theory of the Reals. In an instance of this problem we are given a sentence of the form $\exists x_1, \ldots, x_n \in \mathbb{R} : \Phi(x_1, \ldots, x_n)$, where $\Phi$ is a well-formed quantifier-free formula consisting of the symbols $\{0, 1, +, \cdot, \geq, >, \wedge, \vee, \neg\}$, and the goal is to check whether this sentence is true. Now the class ER is the family of all problems that admit a polynomial-time many-one reduction to ETR. It is known that NP $\subseteq$ ER $\subseteq$ PSPACE. We restrict our attention to CCSPs with addition constraints ($x + y = z$) and some other mild technical condition. Previously, it was shown that multiplication constraints ($x \cdot y = z$), squaring constraints ($x^2 = y$), or inversion constraints ($x \cdot y = 1$) are sufficient to establish ER-completeness. We extend this in the strongest possible sense for equality constraints as follows. We show that CCSPs (with addition constraints and some other mild technical condition) that have any one well-behaved curved equality constraint ($f(x,y) = 0$) are ER-complete. We further extend our results to inequality constraints. We show that any well-behaved convexly curved and any well-behaved concavely curved inequality constraints ($f(x,y) \geq 0$ and $g(x,y) \geq 0$) together imply ER-completeness on the class of such CCSPs.
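For intuition, consider a small ETR instance of our own making (it does not appear in the paper): the sentence $\exists x, y \in \mathbb{R} : (x \cdot x = y) \wedge (x + x = y) \wedge (x > 0)$ is true, witnessed by $x = 2$ and $y = 4$. Read as a CCSP over a suitable interval domain $U$, the same statement combines a squaring constraint ($x \cdot x = y$) with an addition constraint (here with both summands equal) and a strict sign condition, illustrating the kind of constraint combinations that the classification above addresses.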
Training Fully Connected Neural Networks is $\exists\mathbb{R}$-Complete
Bertschinger, Daniel, Hertrich, Christoph, Jungeblut, Paul, Miltzow, Tillmann, Weber, Simon
We consider the algorithmic problem of finding the optimal weights and biases for a two-layer fully connected neural network to fit a given set of data points. This problem is known as empirical risk minimization in the machine learning community. We show that the problem is $\exists\mathbb{R}$-complete. This complexity class can be defined as the set of algorithmic problems that are polynomial-time equivalent to finding real roots of a polynomial with integer coefficients. Furthermore, we show that arbitrary algebraic numbers are required as weights to be able to train some instances to optimality, even if all data points are rational. Our results hold even if the following restrictions are all added simultaneously.
$\bullet$ There are exactly two output neurons.
$\bullet$ There are exactly two input neurons.
$\bullet$ The data has only 13 different labels.
$\bullet$ The number of hidden neurons is a constant fraction of the number of data points.
$\bullet$ The target training error is zero.
$\bullet$ The ReLU activation function is used.
This shows that even very simple networks are difficult to train. The result explains why typical methods for $\mathsf{NP}$-complete problems, like mixed-integer programming or SAT-solving, cannot train neural networks to global optimality, unless $\mathsf{NP}=\exists\mathbb{R}$. We strengthen a recent result by Abrahamsen, Kleist and Miltzow [NeurIPS 2021].
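To make the algorithmic problem concrete, the following is a minimal Python sketch of the empirical risk minimization objective for a two-layer fully connected ReLU network. It is our own illustration, not code from the paper; the squared-error loss, the layer sizes, and all names are assumptions made purely for the example.

import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def network_output(x, W1, b1, W2, b2):
    # Two-layer fully connected network: 2 inputs -> hidden ReLU layer -> 2 linear outputs.
    hidden = relu(W1 @ x + b1)
    return W2 @ hidden + b2

def empirical_risk(data, W1, b1, W2, b2):
    # Total training error over all data points (squared error is an illustrative choice).
    return sum(float(np.sum((network_output(x, W1, b1, W2, b2) - y) ** 2))
               for x, y in data)

# Toy instance: 2 input neurons, 3 hidden neurons, 2 output neurons.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), rng.normal(size=3)
W2, b2 = rng.normal(size=(2, 3)), rng.normal(size=2)
data = [(np.array([0.0, 1.0]), np.array([1.0, 0.0])),
        (np.array([1.0, 0.0]), np.array([0.0, 1.0]))]
print(empirical_risk(data, W1, b1, W2, b2))

The decision version asks whether weights and biases exist that bring this quantity down to a given target error (zero in the hardness result above); the $\exists\mathbb{R}$-completeness result states that deciding this is polynomial-time equivalent to deciding whether a polynomial with integer coefficients has a real root.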
Training Neural Networks is ER-complete
Abrahamsen, Mikkel, Kleist, Linda, Miltzow, Tillmann
Training neural networks is a fundamental problem in machine learning. An (artificial) neural network is a brain-inspired computing system; for an example, consider Figure 1. Neural networks are modelled by directed acyclic graphs whose vertices are called neurons. The sources are called input neurons, the sinks are called output neurons, and all other neurons are said to be hidden. A network computes in the following way: each input neuron $s$ receives an input signal (a real number), which is sent through all outgoing edges to the neurons that $s$ points to. A non-input neuron $v$ receives signals through its incoming edges, processes them, and transmits a single output signal to all neurons that $v$ points to. The values computed by the output neurons are the result of the computation of the network.
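The computation just described can be sketched in Python as follows. This is our own illustrative rendering, not code from the paper; in particular, the choice of ReLU as the processing rule at every non-input neuron is an assumption made only for the example.

from graphlib import TopologicalSorter

def relu(z):
    # Illustrative processing rule; the model above allows other choices.
    return max(0.0, z)

def evaluate(dag, weights, biases, inputs, outputs, input_values):
    # dag:     maps each neuron to the list of neurons it points to (its out-edges)
    # weights: maps each edge (u, v) to a real weight
    # biases:  maps each non-input neuron to a real bias
    # graphlib expects node -> predecessors, so invert the successor lists.
    preds = {v: set() for v in dag}
    for u, succs in dag.items():
        for v in succs:
            preds[v].add(u)

    value = dict(zip(inputs, input_values))   # input neurons receive their signals
    incoming = {v: 0.0 for v in dag}          # weighted signals arriving at each neuron
    for v in TopologicalSorter(preds).static_order():
        if v not in value:                    # non-input neuron: process incoming signals
            value[v] = relu(incoming[v] + biases[v])
        for w in dag[v]:                      # send the single output along all out-edges
            incoming[w] += weights[(v, w)] * value[v]
    return [value[t] for t in outputs]

# Tiny network: input neurons a, b feed hidden neuron h, which feeds output neuron o.
dag = {"a": ["h"], "b": ["h"], "h": ["o"], "o": []}
weights = {("a", "h"): 1.0, ("b", "h"): -2.0, ("h", "o"): 0.5}
biases = {"h": 0.0, "o": 1.0}
print(evaluate(dag, weights, biases, ["a", "b"], ["o"], [3.0, 1.0]))  # prints [1.5]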