 Mathematical & Statistical Methods


Mathematics for Machine Learning: Linear Algebra

#artificialintelligence

For many higher-level courses in Machine Learning and Data Science, you find you need to brush up on the basics of mathematics: material you may have studied before in school or university, but which was taught in another context, or not very intuitively, so that you struggle to relate it to how it is used in Computer Science. This specialization aims to bridge that gap, getting you up to speed in the underlying mathematics, building an intuitive understanding, and relating it to Machine Learning and Data Science. In the first course, on Linear Algebra, we look at what linear algebra is and how it relates to data. Then we look at what vectors and matrices are and how to work with them. The second course, Multivariate Calculus, builds on this to look at how to optimize the parameters of functions so that they fit data well.
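As a small illustration of how the two courses connect, the sketch below fits a line y ≈ w·x + b by gradient descent. The residuals form a vector r = X·[w, b] − y, and the gradient of the mean squared error is (2/n)·Xᵀr, which is the linear-algebra view feeding the calculus-based optimization. This is not material from the specialization itself; the function name `fit_line`, the learning rate, and the step count are illustrative assumptions chosen for this toy data.

```python
def fit_line(xs, ys, lr=0.05, steps=5000):
    """Return (w, b) minimizing (1/n) * sum((w*x + b - y)**2) by gradient descent.

    lr and steps are illustrative assumptions tuned for the toy data below.
    """
    n = len(xs)
    w, b = 0.0, 0.0
    for _ in range(steps):
        # Residual vector r = X @ [w, b] - y, where row i of X is [x_i, 1].
        r = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradient of the MSE is (2/n) * X^T r, split into its two components.
        grad_w = 2.0 / n * sum(ri * x for ri, x in zip(r, xs))
        grad_b = 2.0 / n * sum(r)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated exactly by y = 2x + 1
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))
```

For this data the iterates approach w = 2 and b = 1; a closed-form solve of the normal equations XᵀX·[w, b] = Xᵀy would give the same answer without iteration.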


Towards an Understanding of Long-Tailed Runtimes of SLS Algorithms

arXiv.org Artificial Intelligence

The satisfiability problem (SAT) is one of the most famous problems in computer science. Its NP-completeness has been used to argue that SAT is intractable. However, tremendous advances now allow SAT solvers to solve instances with millions of variables. A particularly successful paradigm is stochastic local search (SLS). In most cases, there are different ways of formulating the underlying problem. While it is known that the formulation affects the runtime of solvers, finding a helpful formulation is generally non-trivial. The recently introduced GapSAT solver [Lorenz and Wörz 2020] demonstrated a successful way to improve the average performance of an SLS solver by learning additional information that logically follows from the original problem. Still, in some cases the performance slightly deteriorated. This justifies an in-depth investigation of how learning logical implications affects SLS runtimes. In this work, we propose a method for generating logically equivalent problem formulations, generalizing the ideas of GapSAT. This allows a rigorous mathematical study of the effect on the runtime of SLS solvers. If the modification process is treated as random, Johnson SB distributions provide a perfect characterization of the hardness. Since the observed Johnson SB distributions approach lognormal distributions, our analysis also suggests that the hardness is long-tailed. As a second contribution, we theoretically prove that restarts are useful for long-tailed distributions. This implies that additional restarts can further improve all algorithms employing the above-mentioned modification technique. Since the empirical studies compellingly suggest that the runtime distributions follow Johnson SB distributions, we investigate this property theoretically. We succeed in proving that the runtimes of Schöning's random walk algorithm are approximately Johnson SB.
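For readers unfamiliar with the algorithm whose runtime is analyzed, here is a minimal sketch of Schöning's random walk with restarts, assuming clauses in DIMACS-style integer literals (v for x_v, −v for ¬x_v). Each restart draws a uniformly random assignment and performs up to 3n flips, each flipping a randomly chosen variable of an unsatisfied clause. Schöning's algorithm allows any unsatisfied clause to be chosen; picking one uniformly at random, and the `max_restarts` budget, are assumptions of this sketch, not details from the paper.

```python
import random


def unsatisfied(clauses, assign):
    """Return the clauses with no true literal under `assign`."""
    return [c for c in clauses
            if not any(assign[abs(lit)] == (lit > 0) for lit in c)]


def schoening(clauses, n_vars, max_restarts=1000, rng=None):
    """Schöning's random walk for CNF-SAT, restarted after every 3n flips.

    Returns a satisfying assignment {var: bool}, or None if the restart
    budget (an assumption of this sketch) is exhausted.
    """
    if rng is None:
        rng = random.Random()
    for _ in range(max_restarts):
        # Restart: draw a fresh, uniformly random assignment.
        assign = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}
        for _ in range(3 * n_vars):
            unsat = unsatisfied(clauses, assign)
            if not unsat:
                return assign
            # Random walk step: flip a random variable of an unsatisfied clause.
            clause = rng.choice(unsat)
            var = abs(rng.choice(clause))
            assign[var] = not assign[var]
        # The last flip of this round may have satisfied the formula.
        if not unsatisfied(clauses, assign):
            return assign
    return None


clauses = [[1, 2, -3], [-1, 3, 2], [-2, 3, 1], [1, -2, -3]]
assignment = schoening(clauses, 3, rng=random.Random(0))
print(assignment)
```

The fixed restart interval of 3n flips is what cuts off the long right tail of a single walk's runtime, which is the effect the abstract's restart result concerns.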


Course: Intuitive Machine Learning - Machine Learning Techniques

#artificialintelligence

Experience with manipulating some datasets, even if only in Excel, will help. The course is suited to busy professionals and students who want to learn quickly and get to the important points without wasting time on long, boring videos. It is also ideal for self-learners who need a solid "jump-start" for career acceleration and are interested in quickly working on real-life problems. You will be able to complete machine learning projects from beginning to end, just like a professional working in the industry, for projects ranging from NLP, clustering, and regression to computer vision. You will also learn how to learn, becoming independent enough to solve any future problems.


eBook: Intuitive Machine Learning and Explainable AI - Machine Learning Techniques

#artificialintelligence

By Vincent Granville, Ph.D. Published in September 2022. This book covers the foundations of machine learning, with modern approaches to solving complex problems. Emphasis is on scalability, automation, testing, optimizing, and interpretability (explainable AI). For instance, regression techniques (including logistic and Lasso regression) are presented as a single method, without using advanced linear algebra. There is no need to learn 50 versions when one does it all and more.


A sharp uniform-in-time error estimate for Stochastic Gradient Langevin Dynamics

arXiv.org Artificial Intelligence

Stochastic Gradient Langevin Dynamics (SGLD) [49], first proposed by Welling and Teh, has drawn considerable attention from researchers working on optimization and sampling tasks [2, 33, 40]. As a sampling algorithm, SGLD can be viewed as a "random batch" version of the Unadjusted Langevin Algorithm (ULA), which is the Euler-Maruyama discretization of the Langevin diffusion, a stochastic process that converges to a target Gibbs distribution under suitable conditions. As an optimization algorithm, SGLD can be viewed as a variant of classical Stochastic Gradient Descent (SGD) [44] that adds independent Gaussian noise at each iteration. In recent decades, SGD and its variants [44, 25, 11, 37] have received a great deal of attention for solving high-dimensional tasks ranging from computer vision and natural language processing to high-dimensional sampling and statistical optimization. SGD has also been analyzed extensively in prior theoretical work, including the loss landscape of SGD iterations [46, 47], its dynamical stability [50], and diffusion approximations [32, 21, 17]. Combining the SGD algorithm with the Langevin diffusion can improve the behavior of both methods. For SGD, adding an independent diffusion term means the iterates no longer converge to a fixed point, but the algorithm may gain better ergodic properties and perform better near saddle points [26, 52]. For Langevin diffusion, applying random mini-batches could yield efficient methods that reduce computational cost while preserving the dynamical and statistical properties. Examples include the SGLD algorithm studied in this paper and the random batch methods for interacting particle systems [22, 23].
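The "random batch ULA" view above can be made concrete with a short sketch. With potential U(θ) = Σᵢ u(θ, xᵢ) and step size h, one SGLD step is θ ← θ − h·ĝ(θ) + √(2h)·ξ, where ĝ = (N/n)·Σ_{i∈B} ∇u(θ, xᵢ) is an unbiased mini-batch estimate of ∇U and ξ ~ N(0, 1). This ULA-style parameterization is equivalent to Welling and Teh's form after renaming the step size. The toy model u(θ, x) = (θ − x)²/2, whose Gibbs target is N(mean(x), 1/N), and all numeric settings are hypothetical choices for illustration, not the paper's setup.

```python
import math
import random


def sgld(grad_batch, data, theta0, step, batch_size, n_iters, rng):
    """Run SGLD targeting the Gibbs density exp(-U(theta)), U = sum_i u(theta, x_i).

    grad_batch(theta, batch) returns sum of d/dtheta u(theta, x) over the batch;
    it is rescaled by N / batch_size to estimate grad U without bias.
    """
    n_data = len(data)
    theta = theta0
    samples = []
    for _ in range(n_iters):
        batch = rng.sample(data, batch_size)                   # random mini-batch
        g = (n_data / batch_size) * grad_batch(theta, batch)   # estimate of grad U
        # ULA step with the stochastic gradient, plus Gaussian noise of variance 2*step.
        theta = theta - step * g + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        samples.append(theta)
    return samples


def grad_quadratic(theta, batch):
    """Hypothetical model u(theta, x) = (theta - x)**2 / 2."""
    return sum(theta - x for x in batch)


rng = random.Random(0)
data = [rng.gauss(3.0, 1.0) for _ in range(100)]
samples = sgld(grad_quadratic, data, theta0=0.0, step=1e-3,
               batch_size=10, n_iters=20000, rng=rng)
burned = samples[5000:]  # discard burn-in
print(sum(burned) / len(burned), sum(data) / len(data))
```

Because the mini-batch gradient is unbiased and linear in θ here, the chain's long-run mean matches mean(data); its spread is only approximately 1/N, since both the discretization and the mini-batch noise inflate the stationary variance. That gap is the kind of error that uniform-in-time estimates such as the paper's aim to quantify.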


A note on diffusion limits for stochastic gradient descent

arXiv.org Artificial Intelligence

In the machine learning literature, stochastic gradient descent has recently been widely discussed for its purported implicit regularization properties. Much of the theory that attempts to clarify the role of noise in stochastic gradient algorithms has approximated stochastic gradient descent by a stochastic differential equation with Gaussian noise. We provide a novel, rigorous theoretical justification for this practice that shows how the Gaussianity of the noise arises naturally.
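For reference, the approximation being justified is usually written as follows. This is the standard first-order SDE approximation from that literature, shown as a sketch rather than the note's own theorem. Here f is the objective, η the learning rate, g_k the stochastic gradient with conditional mean ∇f and covariance Σ, and W_t a Brownian motion; an Euler-Maruyama step of size η recovers an SGD step in which the gradient noise is replaced by Gaussian noise of the same covariance.

```latex
\begin{aligned}
&\text{SGD:} && \theta_{k+1} = \theta_k - \eta\, g_k, \qquad
  \mathbb{E}[g_k \mid \theta_k] = \nabla f(\theta_k), \qquad
  \operatorname{Cov}[g_k \mid \theta_k] = \Sigma(\theta_k), \\
&\text{SDE:} && d\Theta_t = -\nabla f(\Theta_t)\, dt
  + \sqrt{\eta}\, \Sigma(\Theta_t)^{1/2}\, dW_t, \qquad
  \Theta_{k\eta} \approx \theta_k, \\
&\text{Euler-Maruyama, } \Delta t = \eta: &&
  \Theta_{(k+1)\eta} \approx \Theta_{k\eta} - \eta\, \nabla f(\Theta_{k\eta})
  + \eta\, \Sigma(\Theta_{k\eta})^{1/2} \xi_k, \qquad \xi_k \sim \mathcal{N}(0, I).
\end{aligned}
```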


Dealing with the Routing Problem, Part 1 (Computer Science)

#artificialintelligence

Abstract: This paper attempts to solve the famous Vehicle Routing Problem under multiple constraints (capacitated vehicles, a single depot, and distance) using two approaches: a cluster-first, route-second algorithm and integer linear programming. A set of nodes is provided as input to the system, and a feasible route is generated as output, giving clusters of nodes and the route to be traveled within each cluster. To cluster the nodes, we adopted the DBSCAN algorithm, and routing is done with Christofides' approximation algorithm.

Abstract: Recently, applying Reinforcement Learning (RL) methodologies to NP-hard combinatorial optimization problems has become a popular topic. This is essentially due to the nature of traditional combinatorial algorithms, which are often based on a trial-and-error process.
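The cluster-first, route-second approach from the first abstract can be sketched as follows. Clustering uses DBSCAN as the abstract states, but routing within each cluster uses a simple nearest-neighbor heuristic as a plainly swapped-in stand-in for Christofides' algorithm, which needs a minimum-weight perfect matching and carries a 3/2 approximation guarantee on metric instances that nearest neighbor lacks. Vehicle capacity is ignored, and serving each DBSCAN noise point with its own out-and-back trip is an assumption. All function names and the `eps`/`min_pts` values are hypothetical.

```python
import math


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # Neighborhood includes the point itself, as in standard DBSCAN.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # reachable noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_nbrs)
    return labels


def nearest_neighbor_route(depot, pts):
    """Greedy tour depot -> nearest unvisited point -> ... -> depot."""
    route, current, remaining = [depot], depot, list(pts)
    while remaining:
        nxt = min(remaining, key=lambda p: math.dist(current, p))
        remaining.remove(nxt)
        route.append(nxt)
        current = nxt
    route.append(depot)
    return route


def cluster_first_route_second(depot, customers, eps, min_pts):
    labels = dbscan(customers, eps, min_pts)
    routes = []
    for label in sorted(set(labels)):
        members = [p for p, lab in zip(customers, labels) if lab == label]
        if label == -1:
            # Assumption: each noise point gets its own out-and-back trip.
            routes.extend([depot, p, depot] for p in members)
        else:
            routes.append(nearest_neighbor_route(depot, members))
    return routes


depot = (0.0, 0.0)
customers = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
             (-5.0, -5.0), (-5.0, -6.0), (-6.0, -5.0), (10.0, 10.0)]
routes = cluster_first_route_second(depot, customers, eps=1.5, min_pts=2)
for r in routes:
    print(r)
```

Swapping in Christofides' algorithm would change only `nearest_neighbor_route`; the clustering stage stays the same, which is the main appeal of the two-phase decomposition.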


k-Sliced Mutual Information: A Quantitative Study of Scalability with Dimension

arXiv.org Machine Learning

Sliced mutual information (SMI) is defined as an average of mutual information (MI) terms between one-dimensional random projections of the random variables. It serves as a surrogate measure of dependence to classic MI that preserves many of its properties but is more scalable to high dimensions. However, a quantitative characterization of how SMI itself, and the rates for estimating it, depend on the ambient dimension, which is crucial to understanding scalability, remains obscure. This work provides a multifaceted account of the dependence of SMI on dimension, under a broader framework termed $k$-SMI, which considers projections to $k$-dimensional subspaces. Using a new result on the continuity of differential entropy in the 2-Wasserstein metric, we derive sharp bounds on the error of Monte Carlo (MC)-based estimates of $k$-SMI, with explicit dependence on $k$ and the ambient dimension, revealing their interplay with the number of samples. We then combine the MC integrator with the neural estimation framework to provide an end-to-end $k$-SMI estimator, for which optimal convergence rates are established. We also explore the asymptotics of the population $k$-SMI as dimension grows, providing Gaussian approximation results with a residual that decays under appropriate moment bounds. All our results trivially apply to SMI by setting $k=1$. Our theory is validated with numerical experiments and is applied to sliced InfoGAN; together these provide a comprehensive quantitative account of the scalability of $k$-SMI.
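To make "an average of MI terms between one-dimensional random projections" concrete, the sketch below computes a Monte Carlo estimate of SMI ($k=1$) in the one case where each slice's MI has a closed form: jointly Gaussian X and Y. For unit directions θ and φ, the projections θᵀX and φᵀY are jointly Gaussian with correlation ρ, so their MI is −½·log(1 − ρ²). The Gaussian assumption, the covariance inputs, and the helper names are illustrative; the paper's end-to-end estimator instead uses neural MI estimates for general distributions.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Uniform direction on the sphere via a normalized standard Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def quad(a, m, b):
    """Compute a^T M b for vectors a, b (lists) and matrix M (nested lists)."""
    return sum(a[i] * m[i][j] * b[j] for i in range(len(a)) for j in range(len(b)))


def gaussian_smi_mc(cov_xx, cov_yy, cov_xy, n_proj, rng):
    """Monte Carlo estimate of SMI(X; Y), assuming X and Y are jointly Gaussian."""
    dx, dy = len(cov_xx), len(cov_yy)
    total = 0.0
    for _ in range(n_proj):
        theta = random_unit_vector(dx, rng)
        phi = random_unit_vector(dy, rng)
        # Correlation of the projected scalars theta^T X and phi^T Y.
        rho = quad(theta, cov_xy, phi) / math.sqrt(
            quad(theta, cov_xx, theta) * quad(phi, cov_yy, phi))
        # Closed-form MI of a bivariate Gaussian pair.
        total += -0.5 * math.log(1.0 - rho * rho)
    return total / n_proj


d = 3
eye = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
zero = [[0.0] * d for _ in range(d)]
cross = [[0.8 if i == j else 0.0 for j in range(d)] for i in range(d)]
rng = random.Random(1)
smi_indep = gaussian_smi_mc(eye, eye, zero, 2000, rng)   # independent X, Y
smi_corr = gaussian_smi_mc(eye, eye, cross, 2000, rng)   # Cov(X, Y) = 0.8 I
print(smi_indep, smi_corr)
```

Independent variables give exactly zero, and by the data processing inequality each slice's MI cannot exceed the full MI, here −(3/2)·log(1 − 0.64).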


Bayesian Spline Learning for Equation Discovery of Nonlinear Dynamics with Quantified Uncertainty

arXiv.org Artificial Intelligence

Nonlinear dynamics are ubiquitous in science and engineering applications, but the physics of most complex systems is far from fully understood. Discovering interpretable governing equations from measurement data can help us understand and predict the behavior of complex dynamic systems. Although extensive work has recently been done in this field, robustly distilling explicit model forms from very sparse data with considerable noise remains intractable. Moreover, quantifying and propagating the uncertainty of the identified system from noisy data is challenging, and the relevant literature is still limited. To bridge this gap, we develop a novel Bayesian spline learning framework to identify parsimonious governing equations of nonlinear (spatio)temporal dynamics from sparse, noisy data with quantified uncertainty. The proposed method uses a spline basis to handle data scarcity and measurement noise, from which a group of derivatives can be accurately computed to form a library of candidate model terms. The equation residuals inform the spline learning in a Bayesian manner, where approximate Bayesian uncertainty calibration techniques are employed to approximate the posterior distributions of the trainable parameters. To promote sparsity, an iterative sequential-threshold Bayesian learning approach is developed, using the alternating direction optimization strategy to systematically approximate L0 sparsity constraints. The proposed algorithm is evaluated on multiple nonlinear dynamical systems governed by canonical ordinary and partial differential equations, and the merits of the proposed method are demonstrated by comparison with state-of-the-art methods.
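The "library of candidate model terms" plus sequential thresholding idea can be illustrated with its simplest deterministic form, sequentially thresholded least squares (the sparse regression used in SINDy). This is a swapped-in, non-Bayesian analog: it produces no posterior or uncertainty, and derivatives here come analytically from a toy system dx/dt = −2x rather than from a fitted spline. The threshold value and the helper name `stlsq` are assumptions for illustration.

```python
import numpy as np


def stlsq(theta, dxdt, threshold, n_iter=10):
    """Sequentially thresholded least squares for sparse equation discovery.

    theta: (m, p) library of candidate terms evaluated at m samples.
    dxdt:  (m,) time derivative to be explained.
    Coefficients with magnitude below `threshold` are zeroed (a crude
    surrogate for an L0 penalty) and the remaining terms are refit.
    """
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        big = ~small
        if not big.any():
            break
        # Refit using only the surviving library terms.
        xi[big] = np.linalg.lstsq(theta[:, big], dxdt, rcond=None)[0]
    return xi


# Toy system dx/dt = -2 x, sampled without noise.
t = np.linspace(0.0, 2.0, 200)
x = np.exp(-2.0 * t)
dxdt = -2.0 * x
library = np.column_stack([np.ones_like(x), x, x**2, x**3])
names = ["1", "x", "x^2", "x^3"]
xi = stlsq(library, dxdt, threshold=0.1)
print({name: float(c) for name, c in zip(names, xi) if c != 0.0})
```

With noisy, sparse measurements, the plain least-squares step becomes fragile, which is the gap the spline smoothing and Bayesian treatment in the abstract are meant to address.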


5 Free Courses to Master Linear Algebra - KDnuggets

#artificialintelligence

Data Science is the buzzword, and many enthusiasts want to learn its fundamentals to build a lucrative career in the field. Linear Algebra is one of the key subjects to learn in order to apply data transformation techniques such as pre-processing and dimensionality reduction. Many courses are available at your fingertips, but it is difficult to choose the one that suits your requirements. That is precisely the intent of this post: it makes your course search easy by listing five free courses for learning the linear algebra foundations of data science. Before listing the courses, let me first answer a commonly asked question: why do we need to learn linear algebra in the first place?