compute


Stanford University finds that AI is outpacing Moore's Law

#artificialintelligence

Stanford University's AI Index 2019 annual report has found that the growth in computing power used by artificial intelligence (AI) is outpacing Moore's Law. Moore's Law describes how processor speeds double every 18 months to two years, which means application developers can expect a doubling in application performance for the same hardware cost. But the Stanford report, produced in partnership with McKinsey & Company, Google, PwC, OpenAI, Genpact and AI21Labs, found that AI computational power is accelerating faster than traditional processor development. "Prior to 2012, AI results closely tracked Moore's Law, with compute doubling every two years," the report states. "Post-2012, compute has been doubling every 3.4 months."
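
To put those two doubling times side by side, a quick back-of-the-envelope calculation (a minimal Python sketch; the two-year comparison window is an arbitrary illustration) shows the gap:

    # Growth implied by a fixed doubling time, compared over the same window.
    def growth_factor(months, doubling_time_months):
        return 2.0 ** (months / doubling_time_months)

    window = 24  # months; illustrative two-year window
    moore = growth_factor(window, 24.0)   # Moore's Law pace: ~2x
    ai = growth_factor(window, 3.4)       # post-2012 AI compute pace: ~133x
    print(f"Over {window} months: Moore's Law ~{moore:.0f}x, AI compute ~{ai:.0f}x")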


Huawei, Intel, Bosch et al take on open source edge computing - Data Economy - UrIoTNews

#artificialintelligence

Last week, Amazon's AWS re:Invent 2019 conference welcomed more than 60,000 attendees, spread out across six venues on the Las Vegas Strip, making re:Invent 2019 the biggest re:Invent yet. AWS announced the opening of an AWS Local Zone in Los Angeles (LA). AWS Local Zones are a new type of AWS infrastructure deployment that places compute, storage, database, and other select services close to customers, giving developers in LA the ability to deploy applications that require single-digit-millisecond latencies to end users in LA. The cloud giant unveiled nine new Amazon Elastic Compute Cloud (EC2) innovations. AWS added to its industry-leading compute and networking offerings with new Arm-based instances (M6g, C6g, R6g) powered by the AWS-designed Graviton2 processor, and machine learning inference instances (Inf1) powered by AWS-designed Inferentia chips.


Core Scientific Brings High-End AI Compute to Equinix Customers

#artificialintelligence

Built on verified architectures that combine NVIDIA DGX systems for compute and flash storage technology, the Cloud for Data Scientists provides accelerated GPU compute tuned to the demands of deep learning. Core Scientific is a leader in artificial intelligence and blockchain technologies, delivering best-in-class infrastructure and software solutions. In an increasingly distributed and connected world, Core Scientific believes that AI and blockchain are changing the way information is processed, shared and stored across a range of industries. Announced earlier this year, the Core Scientific AiLab is an artificial intelligence and machine learning cloud built from the ground up for the needs of data scientists. "Providing leading edge AI infrastructure and AI applications and tools to Equinix customers is an integral part of our strategy," said Kevin Turner, CEO of Core Scientific.


Value Function in Frequency Domain and the Characteristic Value Iteration Algorithm

Neural Information Processing Systems

This paper considers the problem of estimating the distribution of returns in reinforcement learning (i.e., the distributional RL problem). It presents a new representational framework to maintain the uncertainty of returns and provides mathematical tools to compute it. We show that instead of representing the probability distribution function of returns, one can represent their characteristic function, i.e., the Fourier transform of their distribution. We call the new representation the Characteristic Value Function (CVF), which can be interpreted as the frequency-domain representation of the probability distribution of returns. We show that the CVF satisfies a Bellman-like equation, and its corresponding Bellman operator is a contraction with respect to certain metrics.
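
As a rough illustration of the idea (not the paper's algorithm): if the return satisfies Z = r + γZ', its characteristic function obeys φ_Z(ω) = e^{iωr} φ_{Z'}(γω), so a Bellman-style backup can be evaluated directly on a grid of frequencies. The discount, reward, frequency grid, and sampled next-state returns below are illustrative assumptions.

    import numpy as np

    # Sketch of a frequency-domain Bellman backup for the return Z = r + gamma * Z':
    #   phi_Z(w) = exp(i * w * r) * phi_Z'(gamma * w)
    # gamma, r, the frequency grid, and the samples of Z' are illustrative.
    gamma, r = 0.9, 1.0
    omegas = np.linspace(-5.0, 5.0, 201)                      # frequency grid

    def empirical_cf(samples, w):
        """Empirical characteristic function E[exp(i * w * Z)]."""
        return np.exp(1j * np.outer(w, samples)).mean(axis=1)

    next_returns = np.random.normal(2.0, 0.5, size=10_000)    # samples of Z'
    phi_next = empirical_cf(next_returns, gamma * omegas)     # phi_Z'(gamma * w)
    phi_z = np.exp(1j * omegas * r) * phi_next                # backed-up phi_Z(w)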


Sliced Gromov-Wasserstein

Neural Information Processing Systems

Recently used in various machine learning contexts, the Gromov-Wasserstein distance (GW) allows for comparing distributions whose supports do not necessarily lie in the same metric space. However, this Optimal Transport (OT) distance requires solving a complex non-convex quadratic program, which is usually very costly in both time and memory. Contrary to GW, the Wasserstein distance (W) enjoys several properties (e.g., duality) that permit large-scale optimization. Among those, the solution of W on the real line, which only requires sorting discrete samples in 1D, allows defining the Sliced Wasserstein (SW) distance. This paper proposes a new divergence based on GW akin to SW.
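
As a rough sketch of that 1D property (not the paper's Sliced Gromov-Wasserstein construction): the Sliced Wasserstein distance between two empirical distributions with equally many samples can be estimated by projecting onto random directions and sorting. The sample sizes, number of projections, and choice of p = 2 below are illustrative.

    import numpy as np

    # Sliced Wasserstein (SW) sketch: project both point clouds onto random
    # unit directions; 1D optimal transport between equal-size samples
    # reduces to sorting the projections.
    def sliced_wasserstein(X, Y, n_projections=50, seed=0):
        rng = np.random.default_rng(seed)
        total = 0.0
        for _ in range(n_projections):
            theta = rng.normal(size=X.shape[1])
            theta /= np.linalg.norm(theta)      # random unit direction
            x_proj = np.sort(X @ theta)         # 1D OT = sorting
            y_proj = np.sort(Y @ theta)
            total += np.mean((x_proj - y_proj) ** 2)
        return np.sqrt(total / n_projections)

    X = np.random.randn(500, 3)
    Y = np.random.randn(500, 3) + 1.0
    print(sliced_wasserstein(X, Y))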


Computing Linear Restrictions of Neural Networks

Neural Information Processing Systems

A linear restriction of a function is the same function with its domain restricted to points on a given line. This paper addresses the problem of computing a succinct representation for a linear restriction of a piecewise-linear neural network. This primitive, which we call ExactLine, allows us to exactly characterize the result of applying the network to all of the infinitely many points on a line. In particular, ExactLine computes a partitioning of the given input line segment such that the network is affine on each partition. We present an efficient algorithm for computing ExactLine for networks that use ReLU, MaxPool, batch normalization, fully-connected, convolutional, and other layers, along with several applications.
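
As a rough illustration of the partitioning idea for a single ReLU layer (not the paper's ExactLine algorithm, which composes this across full networks): along the segment x(t) = (1 - t)a + t b, each pre-activation is linear in t, so its zero crossings give the breakpoints between affine pieces. The layer sizes and endpoints below are illustrative.

    import numpy as np

    # Breakpoints of a single ReLU layer along the segment x(t) = (1-t)*a + t*b.
    # Each pre-activation W x(t) + c is linear in t, so the layer is affine
    # between consecutive zero crossings of the pre-activations.
    def relu_breakpoints(W, c, a, b):
        za, zb = W @ a + c, W @ b + c            # pre-activations at the endpoints
        ts = [0.0, 1.0]
        for i in range(len(c)):
            denom = zb[i] - za[i]
            if abs(denom) > 1e-12:
                t = -za[i] / denom               # where unit i crosses zero
                if 0.0 < t < 1.0:
                    ts.append(t)
        return np.array(sorted(set(ts)))

    rng = np.random.default_rng(0)
    W, c = rng.normal(size=(8, 4)), rng.normal(size=8)
    a, b = rng.normal(size=4), rng.normal(size=4)
    print(relu_breakpoints(W, c, a, b))          # partition endpoints in t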


Distributed estimation of the inverse Hessian by determinantal averaging

Neural Information Processing Systems

In distributed optimization and distributed numerical linear algebra, we often encounter an inversion bias: if we want to compute a quantity that depends on the inverse of a sum of distributed matrices, then the sum of the inverses does not equal the inverse of the sum. An example of this occurs in distributed Newton's method, where we wish to compute (or implicitly work with) the inverse Hessian multiplied by the gradient. In this case, locally computed estimates are biased, and so taking a uniform average will not recover the correct solution. To address this, we propose determinantal averaging, a new approach for correcting the inversion bias. This approach involves reweighting the local estimates of the Newton step proportionally to the determinant of the local Hessian estimate, and then averaging them together to obtain an improved global estimate.
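
A rough numerical sketch of the averaging rule described above (the local Hessian estimates and gradient here are synthetic stand-ins, not the paper's setup): each local Newton step inv(H_i) g is weighted by det(H_i) before averaging.

    import numpy as np

    # Determinantal averaging sketch: weight each locally computed Newton step
    # inv(H_i) @ g by det(H_i), then normalize by the sum of the determinants.
    rng = np.random.default_rng(0)
    d, m = 5, 10                                 # dimension, number of workers
    A = rng.normal(size=(d, d))
    H_true = A @ A.T + d * np.eye(d)             # synthetic "global" Hessian
    g = rng.normal(size=d)

    # Each worker holds a noisy symmetric estimate of the Hessian.
    H_local = []
    for _ in range(m):
        E = rng.normal(size=(d, d))
        H_local.append(H_true + 0.3 * (E + E.T) / 2)

    uniform_avg = np.mean([np.linalg.solve(H, g) for H in H_local], axis=0)

    dets = np.array([np.linalg.det(H) for H in H_local])
    det_avg = sum(w * np.linalg.solve(H, g) for w, H in zip(dets, H_local)) / dets.sum()

    print("uniform average:      ", uniform_avg)
    print("determinantal average:", det_avg)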


Compete to Compute

Neural Information Processing Systems

Local competition among neighboring neurons is common in biological neural networks (NNs). We apply the concept to gradient-based, backprop-trained artificial multilayer NNs. NNs with competing linear units tend to outperform those with non-competing nonlinear units, and avoid catastrophic forgetting when training sets change over time.
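
A rough PyTorch sketch of such local competition (a winner-take-all block over small groups of linear units; the group size and layer widths are illustrative assumptions, not the paper's exact architecture):

    import torch
    import torch.nn as nn

    # Local winner-take-all (LWTA) layer sketch: linear units are split into
    # small groups, and within each group only the largest activation passes;
    # the other units in the group output zero.
    class LWTA(nn.Module):
        def __init__(self, in_features, out_features, group_size=2):
            super().__init__()
            assert out_features % group_size == 0
            self.linear = nn.Linear(in_features, out_features)
            self.group_size = group_size

        def forward(self, x):
            z = self.linear(x)                                  # competing linear units
            groups = z.view(*z.shape[:-1], -1, self.group_size)
            winners = groups.max(dim=-1, keepdim=True).values
            return (groups * (groups == winners)).view_as(z)    # only winners pass

    layer = LWTA(784, 256, group_size=2)
    out = layer(torch.randn(32, 784))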


PyTorch for Deep Learning: A Quick Guide for Starters

#artificialintelligence

Since there are multiple optimization schemes to choose from, we just need to pick one for our problem, and the underlying PyTorch library does the rest of the magic for us.
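
A minimal example of what that looks like in practice (the model, learning rate, and data below are placeholders; Adam is just one of the available optimizers):

    import torch
    import torch.nn as nn

    # Pick an optimizer (here Adam; SGD, RMSprop, etc. are chosen the same way)
    # and let PyTorch apply the update rule.
    model = nn.Linear(10, 1)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    x, y = torch.randn(64, 10), torch.randn(64, 1)   # placeholder data

    optimizer.zero_grad()            # clear old gradients
    loss = loss_fn(model(x), y)      # forward pass
    loss.backward()                  # backpropagation
    optimizer.step()                 # apply the chosen optimization scheme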


Fast and Accurate Least-Mean-Squares Solvers

Neural Information Processing Systems

Least-mean-squares (LMS) solvers such as Linear / Ridge / Lasso Regression, SVD and Elastic-Net not only solve fundamental machine learning problems, but are also the building blocks in a variety of other methods, such as decision trees and matrix factorizations. We suggest an algorithm that gets a finite set of $n$ $d$-dimensional real vectors and returns a weighted subset of $d+1$ vectors whose sum is \emph{exactly} the same. The proof of Caratheodory's Theorem (1907) computes such a subset in $O(n^2 d^2)$ time and is thus not used in practice. Our algorithm computes this subset in $O(nd)$ time, using $O(\log n)$ calls to Caratheodory's construction on small but "smart" subsets. This is based on a novel paradigm of fusion between different data summarization techniques, known as sketches and coresets.
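
A rough numpy sketch of the classical Caratheodory construction mentioned above (the $O(n^2 d^2)$ version, not the paper's faster $O(nd)$ algorithm; the input points and weights are illustrative):

    import numpy as np

    # Classical Caratheodory reduction: given n weighted points in R^d, return
    # at most d + 1 of them with new nonnegative weights that preserve the
    # weighted sum and the total weight.
    def caratheodory(X, w):
        w = w.astype(float).copy()
        idx = np.flatnonzero(w > 0)
        while len(idx) > X.shape[1] + 1:
            P = X[idx]
            # Find v != 0 with sum_i v_i * P_i = 0 and sum_i v_i = 0.
            A = np.vstack([P.T, np.ones(len(idx))])
            v = np.linalg.svd(A)[2][-1]           # null-space direction of A
            if v.max() <= 1e-12:
                v = -v                            # ensure some positive entries
            pos = v > 1e-12
            alpha = np.min(w[idx][pos] / v[pos])  # largest step keeping weights >= 0
            w[idx] -= alpha * v
            w[w < 1e-12] = 0.0                    # at least one weight is now zero
            idx = np.flatnonzero(w > 0)
        return idx, w[idx]

    X = np.random.randn(100, 3)
    w = np.full(100, 1.0 / 100)
    idx, w_new = caratheodory(X, w)
    print(len(idx), np.allclose(X[idx].T @ w_new, X.T @ w))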