AITopics

2105.13099

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Machine LearningMay-25-2021

Hashing embeddings of optimal dimension, with applications to linear least squares

Cartis, Coralia, Fiala, Jan, Shao, Zhen

The aim of this paper is two-fold: firstly, to present subspace embedding properties for $s$-hashing sketching matrices, with $s\geq 1$, that are optimal in the projection dimension $m$ of the sketch, namely, $m=\mathcal{O}(d)$, where $d$ is the dimension of the subspace. A diverse set of results are presented that address the case when the input matrix has sufficiently low coherence (thus removing the $\log^2 d$ factor dependence in $m$, in the low-coherence result of Bourgain et al (2015) at the expense of a smaller coherence requirement); how this coherence changes with the number $s$ of column nonzeros (allowing a scaling of $\sqrt{s}$ of the coherence bound), or is reduced through suitable transformations (when considering hashed -- instead of subsampled -- coherence reducing transformations such as randomised Hadamard). Secondly, we apply these general hashing sketching results to the special case of Linear Least Squares (LLS), and develop Ski-LLS, a generic software package for these problems, that builds upon and improves the Blendenpik solver on dense input and the (sequential) LSRN performance on sparse problems. In addition to the hashing sketching improvements, we add suitable linear algebra tools for rank-deficient and for sparse problems that lead Ski-LLS to outperform not only sketching-based routines on randomly generated input, but also state of the art direct solver SPQR and iterative code HSL on certain subsets of the sparse Florida matrix collection; namely, on least squares problems that are significantly overdetermined, or moderately sparse, or difficult.

artificial intelligence, machine learning, matrix, (19 more...)

2105.11815

Country:

North America > United States > Texas > Brazos County > College Station (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
North America > United States > Rhode Island > Providence County > Providence (0.04)
(6 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Mathematics of Computing (0.93)
Information Technology > Data Science (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.48)

#artificialintelligenceMay-23-2021, 17:10:41 GMT

A Short Machine Learning Explanation -- in terms of Linear Algebra, Probability and Calculus

In some cases we will need an array with more than two axes. In the general case, an array of numbers arranged on a regular grid with a variable number of axes is known as a tensor. Tensors and Multidimensional arrays are different types of object, the first is a type of function, the second is a data structure suitable for representing a tensor in a coordinate system. A scalar is just a single number, in contrast to most of the other objects studied in linear algebra, which are usually arrays of multiple numbers. In terms of tensor -- A tensor that contains only one number is called a Scalar(or scalar tensor, or 0-dimensional tensor, or 0D tensor).

short machine learning explanation, tensor, transformation, (9 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.61)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.35)

#artificialintelligenceMay-22-2021, 12:22:39 GMT

Gaussian processes (1/3) - From scratch

This post explores some concepts behind Gaussian processes, such as stochastic processes and the kernel function. We will build up deeper understanding of Gaussian process regression by implementing them from scratch using Python and NumPy. This post is followed by a second post demonstrating how to fit a Gaussian process kernel with TensorFlow probability . In what follows we assume familiarity with basic probability and linear algebra especially in the context of multivariate Gaussian distributions. Have a look at this post if you need a refresher on the Gaussian distribution.

gaussian distribution, gaussian process

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.73)
Information Technology > Artificial Intelligence > Machine Learning (0.73)

Avdiukhin, Dmitrii, Yaroslavtsev, Grigory

Escaping Saddle Points with Compressed SGD

arXiv.org Machine LearningMay-20-2021

Stochastic Gradient Descent (SGD) and its variants are the main workhorses of modern machine learning. Distributed implementations of SGD on a cluster of machines with a central server and a large number of workers are frequently used in practice due to the massive size of the data. In distributed SGD each machine holds a copy of the model and the computation proceeds in rounds. In every round, each worker finds a stochastic gradient based on its batch of examples, the server averages these stochastic gradients to obtain the gradient of the entire batch, makes an SGD step, and broadcasts the updated model parameters to the workers. With a large number of workers, computation parallelizes efficiently while communication becomes the main bottleneck [Chilimbi et al., 2014, Strom, 2015], since each worker needs to send its gradients to the server and receive the updated model parameters. Common solutions for this problem include: local SGD and its variants, when each machine performs multiple local steps before communication [Stich, 2018]; decentralized architectures which allow pairwise communication between the workers [McMahan et al., 2017] and gradient compression, when a compressed version of the gradient is communicated instead of the full gradient [Bernstein et al., 2018, Stich et al., 2018, Karimireddy et al., 2019]. In this work, we consider the latter approach, which we refer to as compressed SGD. Most machine learning models can be described by a d-dimensional vector of parameters x and the model quality can be estimated as a function f(x).

compressor, gradient, iteration, (15 more...)

2105.1009

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Indiana (0.04)
(2 more...)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.96)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

#artificialintelligenceMay-17-2021, 01:31:00 GMT

Linear Algebra and Geometry 1

How to solve problems in linear algebra and geometry (illustrated with 175 solved problems) and why ... Chapter 1 Systems of linear equations.

chapter 1, linear algebra and geometry 1, linear equation

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.79)

Bakshi, Ainesh, Bhattacharyya, Chiranjib, Kannan, Ravi, Woodruff, David P., Zhou, Samson

Learning a Latent Simplex in Input-Sparsity Time

arXiv.org Machine LearningMay-17-2021

We consider the problem of learning a latent $k$-vertex simplex $K\subset\mathbb{R}^d$, given access to $A\in\mathbb{R}^{d\times n}$, which can be viewed as a data matrix with $n$ points that are obtained by randomly perturbing latent points in the simplex $K$ (potentially beyond $K$). A large class of latent variable models, such as adversarial clustering, mixed membership stochastic block models, and topic models can be cast as learning a latent simplex. Bhattacharyya and Kannan (SODA, 2020) give an algorithm for learning such a latent simplex in time roughly $O(k\cdot\textrm{nnz}(A))$, where $\textrm{nnz}(A)$ is the number of non-zeros in $A$. We show that the dependence on $k$ in the running time is unnecessary given a natural assumption about the mass of the top $k$ singular values of $A$, which holds in many of these applications. Further, we show this assumption is necessary, as otherwise an algorithm for learning a latent simplex would imply an algorithmic breakthrough for spectral low rank approximation. At a high level, Bhattacharyya and Kannan provide an adaptive algorithm that makes $k$ matrix-vector product queries to $A$ and each query is a function of all queries preceding it. Since each matrix-vector product requires $\textrm{nnz}(A)$ time, their overall running time appears unavoidable. Instead, we obtain a low-rank approximation to $A$ in input-sparsity time and show that the column space thus obtained has small $\sin\Theta$ (angular) distance to the right top-$k$ singular space of $A$. Our algorithm then selects $k$ points in the low-rank subspace with the largest inner product with $k$ carefully chosen random vectors. By working in the low-rank subspace, we avoid reading the entire matrix in each iteration and thus circumvent the $\Theta(k\cdot\textrm{nnz}(A))$ running time.

algorithm, approximation, low-rank approximation, (15 more...)

2105.08005

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Asia > Middle East > Jordan (0.04)
(5 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.68)

#artificialintelligenceMay-14-2021, 10:25:15 GMT

Automatic Sudoku (Number Place) Solver with Digit Recognition and Integer Linear Programming

Sudoku is a logic-based number placement puzzle that consists of 81 cells which are divided into 9 columns, rows and blocks. The goal of this game is to fill out each cells with numbers 1–9 so that there are no repeating numbers in each row, column and blocks. In this post, I aim to introduce a digit recognition and integer linear programming based automatic sudoku solver that uses the following: Keras (based on the MNIST database [1]) and OpenCV for digit recognition and PuLP for integer linear programming. The database is also widely used for training and testing in the field of machine learning. In this section, I explain the overview of image processing for digit recognition.

digit recognition, recognition and integer linear programming, sudoku puzzle, (7 more...)

Industry: Leisure & Entertainment > Games > Sudoku (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.91)

#artificialintelligenceMay-13-2021, 17:20:23 GMT

Hello World of Programming with Linear Algebra

Imagine this simplified code for inventory modeling. We're using floats for prices (bad), we store the data in global state, the architecture is far from even a simple web application. But the model is familiar enough to a typical software developer.) This (imaginary) application exists to track sales. A customer puts the desired products into a cart, we calculate the total price, and, later, perform the delivery.

linear algebra, quantity, total price

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.45)