 mathematical notation


MAMUT: A Novel Framework for Modifying Mathematical Formulas for the Generation of Specialized Datasets for Language Model Training

arXiv.org Artificial Intelligence

Mathematical formulas are a fundamental and widely used component in various scientific fields, serving as a universal language for expressing complex concepts and relationships. While state-of-the-art transformer models excel in processing and understanding natural language, they encounter challenges with mathematical notation, which involves a complex structure and diverse representations. This study focuses on the development of specialized training datasets to enhance the encoding of mathematical content. We introduce Math Mutator (MAMUT), a framework capable of generating equivalent and falsified versions of a given mathematical formula in LaTeX notation, effectively capturing the mathematical variety in notation of the same concept. Based on MAMUT, we have generated four large mathematical datasets containing diverse notation, which can be used to train language models with enhanced mathematical embeddings.


Greek2MathTex: A Greek Speech-to-Text Framework for LaTeX Equations Generation

arXiv.org Artificial Intelligence

In the vast majority of academic and scientific domains, LaTeX has established itself as the de facto standard for typesetting complex mathematical equations and formulae. However, LaTeX's complex syntax and code-like appearance present accessibility barriers for individuals with disabilities, as well as those unfamiliar with coding conventions. In this paper, we address this challenge by developing a novel speech-to-LaTeX equation system designed specifically for the Greek language. We propose an end-to-end system that harnesses the power of Automatic Speech Recognition (ASR) and Natural Language Processing (NLP) techniques to enable users to verbally dictate mathematical expressions and equations in natural language, which are subsequently converted into LaTeX format. We present the architecture and design principles of our system, highlighting key components such as the ASR engine, the LLM-based prompt-driven equations generation mechanism, as well as the application of a custom evaluation metric employed throughout the development process. We have made our system open source and available at https://github.com/magcil/greek-speech-to-math.


SPINEX_ Symbolic Regression: Similarity-based Symbolic Regression with Explainable Neighbors Exploration

arXiv.org Artificial Intelligence

This article introduces a new symbolic regression algorithm based on the SPINEX (Similarity-based Predictions with Explainable Neighbors Exploration) family. This new algorithm (SPINEX_SymbolicRegression) adopts a similarity-based approach to identifying high-merit expressions that satisfy accuracy and structural similarity metrics. We conducted extensive benchmarking tests comparing SPINEX_SymbolicRegression to over 180 mathematical benchmarking functions from international problem sets that span randomly generated expressions and those based on real physical phenomena. Then, we evaluated the performance of the proposed algorithm in terms of accuracy, expression similarity in terms of the presence of operators and variables (as compared to the actual expressions), population size, and number of generations at convergence. The results indicate that SPINEX_SymbolicRegression consistently performs well and can, in some instances, outperform leading algorithms. In addition, the algorithm's explainability capabilities are highlighted through in-depth experiments.


Mirror Matrix on the Wall: coding and vector notation as tools for introspection

arXiv.org Artificial Intelligence

The vector notation adopted by GNU Octave plays a significant role as a tool for introspection, aligning itself with the vision of Kenneth E. Iverson. He believed that, just like mathematics, a programming language should be an effective thinking tool for representing and reasoning about problems we wish to address. This work aims to explore the use of vector notation in GNU Octave through the analysis of operators and functions, providing a closer alignment with mathematical notation and enhancing code efficiency. We will delve into fundamental concepts such as indexing, broadcasting, and function handles, and present case studies for a deeper understanding of these concepts. By adopting vector notation, GNU Octave becomes a powerful tool for mathematicians, scientists and engineers, enabling them to express and solve complex problems more effectively and intuitively.


English to Arabic machine translation of mathematical documents

arXiv.org Artificial Intelligence

This paper is about the development of a machine translation system tailored specifically for LaTeX mathematical documents. The system focuses on translating English LaTeX mathematical documents into Arabic LaTeX, catering to the growing demand for multilingual accessibility in scientific and mathematical literature. With the vast proliferation of LaTeX mathematical documents, the need for an efficient and accurate translation system has become increasingly essential. This paper addresses the necessity for a robust translation tool that enables seamless communication and comprehension of complex mathematical content across language barriers. The proposed system leverages a Transformer model as the core of the translation system, ensuring enhanced accuracy and fluency in the translated Arabic LaTeX documents. Furthermore, the integration of RyDArab, an Arabic mathematical TeX extension, along with a rule-based translator for Arabic mathematical expressions, contributes to the precise rendering of complex mathematical symbols and equations in the translated output. The paper discusses the architecture and methodology of the developed system, highlighting its efficacy in bridging the language gap in the domain of mathematical documentation.


Probability for machine learning

#artificialintelligence

In this post, we will walk through the building blocks of probability theory and use these learnings to motivate fundamental ideas in machine learning. In the first section, we will talk about random variables and how they help quantify real-world experiments. The final section will talk about how these mathematical concepts are used together to solve machine learning problems. Let's begin our journey with a fun experiment. Take a pen and paper; go outside to the main street in front of your house. Look at every person that walks past you and take note of their hair color, some approximation of their height in centimeters, and any other detail you find interesting. Do this for about 10 minutes. You conducted your first experiment! With this experiment, you can now answer some questions: How many people walked past you?
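
The street experiment described in this snippet can be sketched in a few lines of Python. This is only an illustration, not code from the post: the observations are made-up values, and the names `observations` and `is_tall` as well as the 175 cm threshold are assumptions introduced here. It shows how a random variable turns each recorded outcome into a number that can be counted and averaged.

```python
from collections import Counter

# Hypothetical notes from the 10-minute experiment (invented for illustration):
# each outcome is (hair color, approximate height in cm).
observations = [
    ("brown", 172),
    ("black", 165),
    ("blond", 180),
    ("brown", 158),
    ("black", 177),
]


def is_tall(outcome, threshold_cm=175):
    """A random variable: maps an outcome to 1 if height exceeds the threshold, else 0."""
    _, height_cm = outcome
    return 1 if height_cm > threshold_cm else 0


# "How many people walked past you?"
num_people = len(observations)

# Relative frequency of each hair color is an empirical estimate of its probability.
hair_counts = Counter(color for color, _ in observations)
p_brown = hair_counts["brown"] / num_people

# The sample mean of the indicator variable estimates P(height > 175 cm).
p_tall = sum(is_tall(o) for o in observations) / num_people

# The sample mean of the heights estimates the expected height.
mean_height = sum(height for _, height in observations) / num_people

print(num_people, p_brown, p_tall, mean_height)
```

With only five observations these estimates are rough; a longer observation period would give relative frequencies closer to the underlying probabilities.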


Learning Resources for Machine Learning - Programmathically

#artificialintelligence

Familiarity with basic statistics and mathematical notation is helpful. An Introduction to Statistical Learning is one of the best introductory textbooks on classical machine learning techniques such as linear regression. It was the first machine learning book I bought, and it has given me a great foundation. The explanations are kept at a high level, so you don't need advanced math skills. Every chapter comes with code examples and labs in R. It is a great book to work through cover-to-cover.


Why Was Julia Created?

#artificialintelligence

We want the speed of C with the dynamism of Ruby. We want a language that's homoiconic, with true macros like Lisp, but with obvious, familiar mathematical notation like Matlab. We want something as usable for general programming as Python, as easy for statistics as R, as natural for string processing as Perl, as powerful for linear algebra as Matlab, as good at gluing programs together as the shell. Something that is dirt simple to learn, yet keeps the most serious hackers happy. We want it interactive and we want it compiled. When Julia was conceived in 2009 at MIT, the goal was to solve a problem that still exists: the need to use two (or more) languages, one for high performance (C or C++) and another that makes programming complex systems a more pleasant experience (Python, for example).


A checklist to track your Machine Learning progress

#artificialintelligence

Have you ever asked yourself where you currently are on your Machine Learning journey? And what’s there that you can still learn about? This checklist helps you answer such questions. It provides an…


Mathematical Notation for Recommender Systems

@machinelearnbot

Over the years of teaching and research, I have gradually standardized the notation that I use for describing the math of recommender systems. This is the notation that I use in my classes, that Joe Konstan and I have adopted for our MOOC, and that I use in most of my research papers. If you haven't already settled on a notation, perhaps you would consider adopting this one. I have tried to strike a balance between clarity and clutter. I slightly overload the meaning of some symbols; in particular, I am loose with distinctions between sets and matrices, because it is generally clear from context which is being invoked; I do not overload external referents, however.