exact learning
Exact Learning of Arithmetic with Differentiable Agents
Papazov, Hristo, D'Angelo, Francesco, Flammarion, Nicolas
We explore the possibility of exact algorithmic learning with gradient-based methods and introduce a differentiable framework capable of strong length generalization on arithmetic tasks. Our approach centers on Differentiable Finite-State Transducers (DFSTs), a Turing-complete model family that avoids the pitfalls of prior architectures by enabling constant-precision, constant-time generation, and end-to-end log-parallel differentiable training. Leveraging policy-trajectory observations from expert agents, we train DFSTs to perform binary and decimal addition and multiplication. Remarkably, models trained on tiny datasets generalize without error to inputs thousands of times longer than the training examples. These results show that training differentiable agents on structured intermediate supervision could pave the way towards exact gradient-based learning of algorithmic skills. Code available at \href{https://github.com/dngfra/differentiable-exact-algorithmic-learner.git}{https://github.com/dngfra/differentiable-exact-algorithmic-learner.git}.
Beyond Statistical Learning: Exact Learning Is Essential for General Intelligence
György, András, Lattimore, Tor, Lazić, Nevena, Szepesvári, Csaba
Sound deductive reasoning -- the ability to derive new knowledge from existing facts and rules -- is an indisputably desirable aspect of general intelligence. Despite the major advances of AI systems in areas such as math and science, especially since the introduction of transformer architectures, it is well-documented that even the most advanced frontier systems regularly and consistently falter on easily-solvable deductive reasoning tasks. Hence, these systems are unfit to fulfill the dream of achieving artificial general intelligence capable of sound deductive reasoning. We argue that their unsound behavior is a consequence of the statistical learning approach powering their development. To overcome this, we contend that to achieve reliable deductive reasoning in learning-based AI systems, researchers must fundamentally shift from optimizing for statistical performance against distributions on reasoning problems and algorithmic tasks to embracing the more ambitious exact learning paradigm, which demands correctness on all inputs. We argue that exact learning is both essential and possible, and that this ambitious objective should guide algorithm design.
Size of Multilayer Networks for Exact Learning: Analytic Approach
This article presents a new result about the size of a multilayer neural network computing real outputs for exact learning of a finite set of real samples. The architecture of the network is feedforward, with one hidden layer and several outputs. Starting from a fixed training set, we consider the network as a function of its weights. We derive, for a wide family of transfer functions, a lower and an upper bound on the number of hidden units for exact learning, given the size of the dataset and the dimensions of the input and output spaces.
Exact learning for infinite families of concepts
In this paper, based on results of exact learning, test theory, and rough set theory, we study arbitrary infinite families of concepts each of which consists of an infinite set of elements and an infinite set of subsets of this set called concepts. We consider the notion of a problem over a family of concepts that is described by a finite number of elements: for a given concept, we should recognize which of the elements under consideration belong to this concept. As algorithms for problem solving, we consider decision trees of five types: (i) using membership queries, (ii) using equivalence queries, (iii) using both membership and equivalence queries, (iv) using proper equivalence queries, and (v) using both membership and proper equivalence queries. As time complexity, we study the depth of decision trees. In the worst case, with the growth of the number of elements in the problem description, the minimum depth of decision trees of the first type either grows as a logarithm or linearly, and the minimum depth of decision trees of each of the other types either is bounded from above by a constant or grows as a logarithm, or linearly. The obtained results allow us to distinguish seven complexity classes of infinite families of concepts.
Exact Learning of Qualitative Constraint Networks from Membership Queries
Mouhoub, Malek, Marri, Hamad Al, Alanazi, Eisa
A Qualitative Constraint Network (QCN) is a constraint graph for representing problems under qualitative temporal and spatial relations, among others. More formally, a QCN includes a set of entities, and a list of qualitative constraints defining the possible scenarios between these entities. These latter constraints are expressed as disjunctions of binary relations capturing the (incomplete) knowledge between the involved entities. QCNs are very effective in representing a wide variety of real-world applications, including scheduling and planning, configuration and Geographic Information Systems (GIS). It is however challenging to elicit, from the user, the QCN representing a given problem. To overcome this difficulty in practice, we propose a new algorithm for learning, through membership queries, a QCN from a non expert. In this paper, membership queries are asked in order to elicit temporal or spatial relationships between pairs of temporal or spatial entities. In order to improve the time performance of our learning algorithm in practice, constraint propagation, through transitive closure, as well as ordering heuristics, are enforced. The goal here is to reduce the number of membership queries needed to reach the target QCN. In order to assess the practical effect of constraint propagation and ordering heuristics, we conducted several experiments on randomly generated temporal and spatial constraint network instances. The results of the experiments are very encouraging and promising.
Towards a Mathematical Understanding of the Difficulty in Learning with Feedforward Neural Networks
Training deep neural networks for solving machine learning problems is one great challenge in the field, mainly due to its associated optimisation problem being highly non-convex. Recent developments have suggested that many training algorithms do not suffer from undesired local minima under certain scenario, and consequently led to great efforts in pursuing mathematical explanations for such observations. This work provides an alternative mathematical understanding of the challenge from a smooth optimisation perspective. By assuming exact learning of finite samples, sufficient conditions are identified via a critical point analysis to ensure any local minimum to be globally minimal as well. Furthermore, a state of the art algorithm, known as the Generalised Gauss-Newton (GGN) algorithm, is rigorously revisited as an approximate Newton's algorithm, which shares the property of being locally quadratically convergent to a global minimum under the condition of exact learning.
Exact Learning of Lightweight Description Logic Ontologies
Konev, Boris (University of Liverpool) | Lutz, Carsten (University of Bremen) | Ozaki, Ana (University of Liverpool) | Wolter, Frank (University of Liverpool)
We study learning of description logic TBoxes in Angluin et al.’s framework of exact learning via queries. We admit entailment queries (“is a given subsumption entailed by the target TBox?”) and equivalence queries (“is a given TBox equivalent to the target TBox?”), assuming that the signature and logic of the target TBox are known. We present three main results: (1) TBoxes formulated in DL-Lite with role inclusions and composite concepts on the right-hand side of concept inclusions can be learned in polynomial time; (2) EL TBoxes with only concept names on the right-hand side of concept inclusions can be learned in polynomial time; and (3) EL TBoxes cannot be learned in polynomial time. It follows that non-polynomial time learnability of EL TBoxes is caused by the interaction between existential restrictions on the right and left-hand sides of concept inclusions. We also show that neither entailment nor equivalence queries alone are sufficient in cases (1) and (2) above.
Size of Multilayer Networks for Exact Learning: Analytic Approach
Elisseeff, André, Paugam-Moisy, Hélène
This article presents a new result about the size of a multilayer neural network computing real outputs for exact learning of a finite set of real samples. The architecture of the network is feedforward, with one hidden layer and several outputs. Starting from a fixed training set, we consider the network as a function of its weights. We derive, for a wide family of transfer functions, a lower and an upper bound on the number of hidden units for exact learning, given the size of the dataset and the dimensions of the input and output spaces. The context of our work is rather similar to the well-known results of Baum et al. [1, 2,3,5, 10], but we consider both real inputs and outputs, instead ofthe dichotomies usually addressed.
Size of Multilayer Networks for Exact Learning: Analytic Approach
Elisseeff, André, Paugam-Moisy, Hélène
This article presents a new result about the size of a multilayer neural network computing real outputs for exact learning of a finite set of real samples. The architecture of the network is feedforward, with one hidden layer and several outputs. Starting from a fixed training set, we consider the network as a function of its weights. We derive, for a wide family of transfer functions, a lower and an upper bound on the number of hidden units for exact learning, given the size of the dataset and the dimensions of the input and output spaces. The context of our work is rather similar to the well-known results of Baum et al. [1, 2,3,5, 10], but we consider both real inputs and outputs, instead ofthe dichotomies usually addressed.
Size of Multilayer Networks for Exact Learning: Analytic Approach
Elisseeff, André, Paugam-Moisy, Hélène
The architecture of the network is feedforward, with one hidden layer and several outputs. Starting from a fixed training set, we consider the network as a function of its weights. We derive, for a wide family of transfer functions, a lower and an upper bound on the number of hidden units for exact learning, given the size of the dataset and the dimensions of the input and output spaces. 1 RELATED WORKS The context of our work is rather similar to the well-known results of Baum et al. [1, 2,3,5, 10], but we consider both real inputs and outputs, instead ofthe dichotomies usually addressed. We are interested in learning exactly all the examples of a fixed database, hence our work is different from stating that multilayer networks are universal approximators [6, 8, 9]. Since we consider real outputs and not only dichotomies, it is not straightforward to compare our results to the recent works about the VC-dimension of multilayer networks [11, 12, 13]. Our study is more closely related to several works of Sontag [14, 15], but with different hypotheses on the transfer functions of the units. Finally, our approach is based on geometrical considerations and is close to the model of Coetzee and Stonick [4]. First we define the model of network and the notations and second we develop our analytic approach and prove the fundamental theorem. In the last section, we discuss our point of view and propose some practical consequences of the result.