symbolic computation
Analysis of kinetic Langevin Monte Carlo under the stochastic exponential Euler discretization from underdamped all the way to overdamped
Kim, Kyurae, Gruffaz, Samuel, Park, Ji Won, Durmus, Alain Oliviero
Simulating the kinetic Langevin dynamics is a popular approach for sampling from distributions, where only their unnormalized densities are available. Various discretizations of the kinetic Langevin dynamics have been considered, where the resulting algorithm is collectively referred to as the kinetic Langevin Monte Carlo (KLMC) or underdamped Langevin Monte Carlo. Specifically, the stochastic exponential Euler discretization, or exponential integrator for short, has previously been studied under strongly log-concave and log-Lipschitz smooth potentials via the synchronous Wasserstein coupling strategy. Existing analyses, however, impose restrictions on the parameters that do not explain the behavior of KLMC under various choices of parameters. In particular, all known results fail to hold in the overdamped regime, suggesting that the exponential integrator degenerates in the overdamped limit. In this work, we revisit the synchronous Wasserstein coupling analysis of KLMC with the exponential integrator. Our refined analysis results in Wasserstein contractions and bounds on the asymptotic bias that hold under weaker restrictions on the parameters, which assert that the exponential integrator is capable of stably simulating the kinetic Langevin dynamics in the overdamped regime, as long as proper time acceleration is applied.
CALT: A Library for Computer Algebra with Transformer
Kera, Hiroshi, Arakawa, Shun, Sato, Yuta
Recent advances in artificial intelligence have demonstrated the learnability of symbolic computation through end-to-end deep learning. Given a sufficient number of examples of symbolic expressions before and after the target computation, Transformer models - highly effective learners of sequence-to-sequence functions - can be trained to emulate the computation. This development opens up several intriguing challenges and new research directions, which require active contributions from the symbolic computation community. In this work, we introduce Computer Algebra with Transformer (CALT), a user-friendly Python library designed to help non-experts in deep learning train models for symbolic computation tasks.
Lessons on Datasets and Paradigms in Machine Learning for Symbolic Computation: A Case Study on CAD
del Río, Tereso, England, Matthew
Symbolic Computation algorithms and their implementation in computer algebra systems often contain choices which do not affect the correctness of the output but can significantly impact the resources required: such choices can benefit from having them made separately for each problem via a machine learning model. This study reports lessons on such use of machine learning in symbolic computation, in particular on the importance of analysing datasets prior to machine learning and on the different machine learning paradigms that may be utilised. We present results for a particular case study, the selection of variable ordering for cylindrical algebraic decomposition, but expect that the lessons learned are applicable to other decisions in symbolic computation. We utilise an existing dataset of examples derived from applications which was found to be imbalanced with respect to the variable ordering decision. We introduce an augmentation technique for polynomial systems problems that allows us to balance and further augment the dataset, improving the machine learning results by 28\% and 38\% on average, respectively. We then demonstrate how the existing machine learning methodology used for the problem $-$ classification $-$ might be recast into the regression paradigm. While this does not have a radical change on the performance, it does widen the scope in which the methodology can be applied to make choices.
Solving Some Geometry Problems of the N\'aboj 2023 Contest with Automated Deduction in GeoGebra Discovery
Hota, Amela, Kovács, Zoltán, Vujic, Alexander
In this article, we solve some of the geometry problems of the N\'aboj 2023 competition with the help of a computer, using examples that the software tool GeoGebra Discovery can calculate. In each case, the calculation requires symbolic computations. We analyze the difficulty of feeding the problem into the machine and set further goals to make the problems of this type of contests even more tractable in the future.
Optimisation in Neurosymbolic Learning Systems
Neurosymbolic AI aims to integrate deep learning with symbolic AI. This integration has many promises, such as decreasing the amount of data required to train a neural network, improving the explainability and interpretability of answers given by models and verifying the correctness of trained systems. We study neurosymbolic learning, where we have both data and background knowledge expressed using symbolic languages. How do we connect the symbolic and neural components to communicate this knowledge? One option is fuzzy reasoning, which studies degrees of truth. For example, being tall is not a binary concept. Instead, probabilistic reasoning studies the probability that something is true or will happen. Our first research question studies how different forms of fuzzy reasoning combine with learning. We find surprising results like a connection to the Raven paradox stating we confirm "ravens are black" when we observe a green apple. In this study, we did not use the background knowledge when we deployed our models after training. In our second research question, we studied how to use background knowledge in deployed models. We developed a new neural network layer based on fuzzy reasoning. Probabilistic reasoning is a natural fit for neural networks, which we usually train to be probabilistic. However, they are expensive to compute and do not scale well to large tasks. In our third research question, we study how to connect probabilistic reasoning with neural networks by sampling to estimate averages, while in the final research question, we study scaling probabilistic neurosymbolic learning to much larger problems than before. Our insight is to train a neural network with synthetic data to predict the result of probabilistic reasoning.
Explainable AI Insights for Symbolic Computation: A case study on selecting the variable ordering for cylindrical algebraic decomposition
Pickering, Lynn, Almajano, Tereso Del Rio, England, Matthew, Cohen, Kelly
In recent years there has been increased use of machine learning (ML) techniques within mathematics, including symbolic computation where it may be applied safely to optimise or select algorithms. This paper explores whether using explainable AI (XAI) techniques on such ML models can offer new insight for symbolic computation, inspiring new implementations within computer algebra systems that do not directly call upon AI tools. We present a case study on the use of ML to select the variable ordering for cylindrical algebraic decomposition. It has already been demonstrated that ML can make the choice well, but here we show how the SHAP tool for explainability can be used to inform new heuristics of a size and complexity similar to those human-designed heuristics currently commonly used in symbolic computation.
Data Augmentation for Mathematical Objects
del Rio, Tereso, England, Matthew
This paper discusses and evaluates ideas of data balancing and data augmentation in the context of mathematical objects: an important topic for both the symbolic computation and satisfiability checking communities, when they are making use of machine learning techniques to optimise their tools. We consider a dataset of non-linear polynomial problems and the problem of selecting a variable ordering for cylindrical algebraic decomposition to tackle these with. By swapping the variable names in already labelled problems, we generate new problem instances that do not require any further labelling when viewing the selection as a classification problem. We find this augmentation increases the accuracy of ML models by 63% on average. We study what part of this improvement is due to the balancing of the dataset and what is achieved thanks to further increasing the size of the dataset, concluding that both have a very significant effect. We finish the paper by reflecting on how this idea could be applied in other uses of machine learning in mathematics.
SYMBA: Symbolic Computation of Squared Amplitudes in High Energy Physics with Machine Learning
Alnuqaydan, Abdulhakim, Gleyzer, Sergei, Prosper, Harrison
The cross section is one of the most important physical quantities in high-energy physics and the most time consuming to compute. While machine learning has proven to be highly successful in numerical calculations in high-energy physics, analytical calculations using machine learning are still in their infancy. In this work, we use a sequence-to-sequence model, specifically, a transformer, to compute a key element of the cross section calculation, namely, the squared amplitude of an interaction. We show that a transformer model is able to predict correctly 97.6% and 99% of squared amplitudes of QCD and QED processes, respectively, at a speed that is up to orders of magnitude faster than current symbolic computation frameworks. We discuss the performance of the current model, its limitations and possible future directions for this work.
Why does not Matlab use the full capacity of my computer while training a neural network?
My goal is to train a neural network to classify objects form pictures of my webcam. I use transfer learning with Alexnet and I have a labeled training data set with 25,000 images. My training script works perfectly, but the progress of the iterations during training is very slow. I have the Parallel Computing Toolbox installed and the training runs on the Single GPU. But when looking at the task manager, Matlab only uses 13 % of the CPU and just 2 % of the GPU.
Symbolic Computation in Software Science: My Personal View
In this note, I develop my personal view on the scope and relevance of symbolic computation in software science. For this, I discuss the interaction and differences between symbolic computation, software science, automatic programming, mathematical knowledge management, artificial intelligence, algorithmic intelligence, numerical computation, and machine learning. In the discussion of these notions, I allow myself to refer also to papers (1982, 1985, 2001, 2003, 2013) of mine in which I expressed my views on these areas at early stages of some of these fields. It is a great joy to see that the SCSS (Symbolic Computation in Software Science) conference series, this year, experiences its 9th edition. A big Thank You to the organizers, referees, and contributors who kept the series going over the years! The series emerged from a couple of meetings of research groups in Austria, Japan, and Tunisia, including my Theorema Group at RISC, see the home pages of the SCSS series since 2006. In 2012, we decided to define "Symbolic Computation in Software Science" as the scope for our meetings and to establish them as an open conference series with this title. As always, when one puts two terms like "symbolic computation" and "software science" together, one is tempted to read the preposition in between - in our case "in" - as just a set-theoretic union. Pragmatically, this is reasonable if one does not want to embark on scrutinizing discussions. However, since I was one of the initiators of the SCSS series, let me take the opportunity to explain the intention behind SC in SS in this note. Also, this note, for me, is a kind of revision and summary of thoughts I had over the years on the subject of SCSS and related subjects.