AITopics | Mathematical & Statistical Methods

Mathematicians who revealed the power of random walks win Abel prize

New Scientist | Mar-18-2020, 16:53:15 GMT

Mathematicians Hillel Furstenberg and Gregory Margulis have jointly won the 2020 Abel prize for their pioneering use of methods from probability and dynamics in other mathematical fields such as group theory, number theory and combinatorics. Furstenberg, at Hebrew University of Jerusalem in Israel, and Margulis, at Yale University, have never formally collaborated, but both invented similar "random walk" techniques to study various mathematical objects. They share a 7.5 million Norwegian kroner (£595,000) prize. "I received this notice with total disbelief," said …

mathematician, random walk win abel prize, university, (2 more...)

New Scientist

Country: Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.46)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.75)

State-of-the-Art Statistical Science to Tackle Famous Number Theory Conjectures

#artificialintelligence | Mar-15-2020, 14:15:50 GMT

The methodology described here has broad applications: new statistical tests, a new type of ANOVA (analysis of variance), improved design of experiments, interesting fractional factorial designs, a better understanding of irrational numbers with applications to cryptography, gaming and fintech, and high-quality random number generators (along with guidance on when you really need them). It also features exact arithmetic, high-performance computing and distributed algorithms to compute millions of binary digits for an infinite family of real numbers, including the detection of auto- and cross-correlations (or the lack thereof) in the digit distributions. The data processed in my experiment, consisting of raw irrational numbers described by a new class of elementary recurrences, led to the discovery of unexpected apparent patterns in their digit distribution: in particular, a few of these numbers, contrary to popular belief, do not have 50% of their binary digits equal to 1. It turned out that perfectly random digits, simulated in large numbers with a good enough pseudo-random generator, exhibit the same strange behavior, which suggests that pure randomness may not be as random as we imagine. Ironically, failure to exhibit these patterns would indicate a departure from pure randomness in the digits in question. In addition to new statistical and mathematical methods, discoveries and applications, you will learn in my article how to avoid the kind of statistical traps that lead to erroneous conclusions when performing a large number of statistical tests, and how not to be misled by false appearances. I call them statistical hallucinations and false outliers. This article has two main sections: section 1, with deep research in number theory, and section 2, with deep research in statistics, with applications. You may skip one of the two sections depending on your interests and how much time you have. Both sections, despite being state-of-the-art in their respective fields, are written in simple English. It is my wish that this article gets data scientists interested in math, and the other way around: the topics in both cases have been chosen to be exciting and modern.
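The multiple-testing trap described above can be illustrated with a minimal sketch. It is not the article's methodology: the function names, the choice of 2000 sequences of 1000 bits, and the naive 5% two-sided z-test are all assumptions made for illustration. It draws many sequences of fair pseudo-random bits and counts how many look "biased" when each is tested on its own.

```python
import math
import random


def ones_z_score(n_bits: int, rng: random.Random) -> float:
    """z-score of the number of 1s among n_bits fair pseudo-random bits."""
    ones = bin(rng.getrandbits(n_bits)).count("1")
    # Under fairness the count has mean n/2 and variance n/4.
    return (ones - n_bits / 2) / math.sqrt(n_bits / 4)


def count_false_outliers(n_sequences: int, n_bits: int, seed: int = 0,
                         z_crit: float = 1.96) -> int:
    """Count sequences flagged by a naive two-sided test at roughly the 5% level."""
    rng = random.Random(seed)
    return sum(abs(ones_z_score(n_bits, rng)) > z_crit
               for _ in range(n_sequences))


n_sequences, n_bits = 2000, 1000
flagged = count_false_outliers(n_sequences, n_bits)
print(f"{flagged} of {n_sequences} truly random sequences look 'biased'")
```

Roughly one sequence in twenty is flagged even though every sequence is random by construction, so a handful of "non-50%" digit sequences among many tested is what pure randomness is expected to produce.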

binary digit, digit, sequence, (15 more...)

#artificialintelligence

Genre:

Instructional Material (0.54)
Research Report > Experimental Study (0.48)

Industry: Banking & Finance (0.48)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.70)

The Elliptical Processes: a New Family of Flexible Stochastic Processes

Bånkestad, Maria, Sjölund, Jens, Taghia, Jalil, Schön, Thomas

arXiv.org Machine Learning | Mar-13-2020

We present the elliptical processes, a new family of stochastic processes that subsumes the Gaussian process and the Student-t process. This generalization retains computational tractability while substantially increasing the range of tail behaviors that can be modeled. We base the elliptical processes on a representation of elliptical distributions as mixtures of Gaussian distributions and derive closed-form expressions for the marginal and conditional distributions. We perform an in-depth study of a particular elliptical process, where the mixture distribution is piecewise constant, and show some of its advantages over the Gaussian process through a number of experiments on robust regression. Looking forward, we believe there are several settings, e.g. when the likelihood is not Gaussian or when accurate tail modeling is critical, where the elliptical processes could become the stochastic processes of choice.
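The idea of an elliptical distribution as a mixture of Gaussians can be sketched in one dimension with the classical Student-t construction: draw a random precision from a gamma distribution, then draw a Gaussian with that precision. This is a textbook special case, not the paper's piecewise-constant mixing distribution, and the function names and the choice of 3 degrees of freedom are assumptions.

```python
import math
import random


def sample_student_t(nu: float, rng: random.Random) -> float:
    """Draw from Student-t(nu) as a Gaussian with a random variance."""
    # Precision g ~ Gamma(shape=nu/2, scale=2/nu), so that E[g] = 1.
    g = rng.gammavariate(nu / 2.0, 2.0 / nu)
    # Conditional on g, the draw is N(0, 1/g).
    return rng.gauss(0.0, 1.0) / math.sqrt(g)


def tail_fraction(samples, threshold=3.0):
    """Fraction of samples whose absolute value exceeds the threshold."""
    return sum(abs(x) > threshold for x in samples) / len(samples)


rng = random.Random(0)
n = 20000
t_samples = [sample_student_t(3.0, rng) for _ in range(n)]
gauss_samples = [rng.gauss(0.0, 1.0) for _ in range(n)]
print("Student-t tail mass:", tail_fraction(t_samples))
print("Gaussian tail mass: ", tail_fraction(gauss_samples))
```

Changing only the mixing distribution of the variance changes the tail behavior while each conditional draw stays Gaussian, which is the property the abstract relies on for tractability.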

elliptical distribution, elliptical process, gaussian process, (17 more...)

arXiv.org Machine Learning

2003.07201

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Sweden > Uppsala County > Uppsala (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.91)

Option Discovery in the Absence of Rewards with Manifold Analysis

Bar, Amitay, Talmon, Ronen, Meir, Ron

arXiv.org Artificial Intelligence | Mar-12-2020

Options have been shown to be an effective tool in reinforcement learning, facilitating improved exploration and learning. In this paper, we present an approach based on spectral graph theory and derive an algorithm that systematically discovers options without access to a specific reward or task assignment. As opposed to the common practice used in previous methods, our algorithm makes full use of the spectrum of the graph Laplacian. Incorporating modes associated with higher graph frequencies unravels domain subtleties, which are shown to be useful for option discovery. Using geometric and manifold-based analysis, we present a theoretical justification for the algorithm. In addition, we showcase its performance in several domains, demonstrating clear improvements compared to competing methods.
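To make "graph frequencies" concrete, here is a minimal sketch on a ring graph, chosen only because its Laplacian spectrum is known in closed form. It is not the paper's algorithm or one of its domains; the helper names are hypothetical. The k-th cosine mode is an eigenvector of the Laplacian with eigenvalue 2 - 2cos(2πk/n), so low k gives smooth signals and k near n/2 gives rapidly oscillating ones.

```python
import math


def ring_laplacian(n):
    """Combinatorial Laplacian L = D - A of an n-node ring graph (n >= 3)."""
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        lap[i][i] = 2.0
        lap[i][(i + 1) % n] = -1.0
        lap[i][(i - 1) % n] = -1.0
    return lap


def mat_vec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(row[j] * vector[j] for j in range(len(vector))) for row in matrix]


def ring_mode(n, k):
    """k-th cosine eigenvector of the ring Laplacian."""
    return [math.cos(2 * math.pi * k * i / n) for i in range(n)]


def ring_eigenvalue(n, k):
    """Graph frequency (Laplacian eigenvalue) of the k-th mode."""
    return 2.0 - 2.0 * math.cos(2 * math.pi * k / n)


n = 12
lap = ring_laplacian(n)
for k in (0, 1, 5):
    v = ring_mode(n, k)
    lam = ring_eigenvalue(n, k)
    residual = max(abs(a - lam * b) for a, b in zip(mat_vec(lap, v), v))
    print(f"k={k}: eigenvalue={lam:.4f}, max residual={residual:.2e}")
```

Methods that keep only the first few modes see the coarse structure of the graph; the abstract's point is that the higher modes carry additional detail worth using.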

diffusion distance, diffusion option, eigenvector, (13 more...)

arXiv.org Artificial Intelligence

2003.05878

Country:

North America > United States > California > San Diego County > San Diego (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.50)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.34)

Wasserstein-based Graph Alignment

Maretic, Hermina Petric, Gheche, Mireille El, Minder, Matthias, Chierchia, Giovanni, Frossard, Pascal

arXiv.org Machine Learning | Mar-12-2020

We propose a novel method for comparing non-aligned graphs of different sizes, based on the Wasserstein distance between graph signal distributions induced by the respective graph Laplacian matrices. Specifically, we cast a new formulation for the one-to-many graph alignment problem, which aims at matching a node in the smaller graph with one or more nodes in the larger graph. By integrating optimal transport in our graph comparison framework, we generate both a structurally-meaningful graph distance, and a signal transportation plan that models the structure of graph data. The resulting alignment problem is solved with stochastic gradient descent, where we use a novel Dykstra operator to ensure that the solution is a one-to-many (soft) assignment matrix. We demonstrate the performance of our novel framework on graph alignment and graph classification, and we show that our method leads to significant improvements with respect to the state-of-the-art algorithms for each of these tasks.

algorithm, graph, matrix, (15 more...)

arXiv.org Machine Learning

2003.06048

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)
North America > Canada > Quebec > Montreal (0.04)
(3 more...)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.56)

Methods of Adaptive Signal Processing on Graphs Using Vertex-Time Autoregressive Models

Variddhisai, Thiernithi, Mandic, Danilo

arXiv.org Machine Learning | Mar-10-2020

The concept of a random process has recently been extended to graph signals, whereby random graph processes are a class of multivariate stochastic processes whose coefficients are matrices with a graph-topological structure. The system identification problem of a random graph process therefore revolves around determining its underlying topology or, mathematically, the graph shift operator (GSO), i.e., an adjacency matrix or a Laplacian matrix. The same work that introduced random graph processes also proposed a batch optimization method to solve for the GSO, based on a causal vertex-time autoregressive model. To this end, the online version of this optimization problem is proposed via the framework of adaptive filtering. A modified stochastic gradient projection method is applied to the regularized least squares objective to create the filter. The recursion is divided into three regularized sub-problems to address issues such as multi-convexity, sparsity, commutativity and bias. A discussion of convergence analysis is also included. Finally, experiments illustrate the performance of the proposed algorithm, from the traditional MSE measure to the successful recovery rate regardless of correct values, shedding light on the potential, the limits and possible research directions of this work.
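The move from batch estimation to adaptive filtering can be sketched in the simplest possible setting: a scalar AR(1) process whose single coefficient is tracked online with a least-mean-squares (LMS) update. This is a stand-in, not the paper's method, which estimates matrix-valued graph shift operators with projection and regularization; the function name and all parameter values are assumptions.

```python
import random


def lms_ar1(a_true=0.6, step=0.001, n_steps=20000, seed=0):
    """Track the coefficient of x_t = a * x_{t-1} + noise with an LMS filter."""
    rng = random.Random(seed)
    a_hat, x_prev = 0.0, 0.0
    for _ in range(n_steps):
        x = a_true * x_prev + rng.gauss(0.0, 1.0)  # new streaming sample
        error = x - a_hat * x_prev                 # one-step prediction error
        a_hat += step * error * x_prev             # stochastic gradient step
        x_prev = x
    return a_hat


estimate = lms_ar1()
print(f"true coefficient 0.6, online estimate {estimate:.3f}")
```

Each update uses only the newest sample, so the estimate can be refreshed as data arrive instead of re-solving a batch problem.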

algorithm, algorithm 1, topology, (14 more...)

arXiv.org Machine Learning

2003.05729

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Czechia > Prague (0.04)
Asia > Taiwan > Taiwan Province > Taipei (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

High-dimensional, multiscale online changepoint detection

Chen, Yudong, Wang, Tengyao, Samworth, Richard J.

arXiv.org Machine Learning | Mar-7-2020

Modern technology has not only allowed the collection of data sets of unprecedented size, but has also facilitated the real-time monitoring of many types of evolving processes of interest. Wearable health devices, astronomical survey telescopes, self-driving cars and transport network load-tracking systems are just a few examples of new technologies that collect large quantities of streaming data, and that provide new challenges and opportunities for statisticians. Very often, a key feature of interest in the monitoring of a data stream is a changepoint; that is, a moment in time at which the data generating mechanism undergoes a change. Such times often represent events of interest, e.g. a change in heart function, and moreover, the accurate identification of changepoints often facilitates the decomposition of a data stream into stationary segments. Historically, it has tended to be univariate time series that have been monitored and studied, within the well-established field of statistical process control (e.g.
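For orientation, the classical univariate monitoring setting mentioned in the abstract can be sketched with a one-sided CUSUM detector on a simulated stream whose mean shifts from 0 to 2. This is a textbook procedure, not the paper's high-dimensional multiscale method; the reference value, threshold and stream are assumptions chosen for illustration.

```python
import random


def cusum_alarm(stream, reference=1.0, threshold=8.0):
    """Return the first index where a one-sided CUSUM statistic exceeds
    the threshold, or None if it never does."""
    s = 0.0
    for t, x in enumerate(stream):
        # Accumulate evidence of an upward mean shift; reset at zero.
        s = max(0.0, s + x - reference)
        if s > threshold:
            return t
    return None


rng = random.Random(0)
change_at = 200
stream = [rng.gauss(0.0, 1.0) for _ in range(change_at)]   # pre-change: mean 0
stream += [rng.gauss(2.0, 1.0) for _ in range(100)]        # post-change: mean 2
alarm = cusum_alarm(stream)
print(f"change at index {change_at}, alarm raised at index {alarm}")
```

The gap between the true changepoint and the alarm is the response delay; the threshold trades that delay against the false alarm rate.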

algorithm, log 2, response delay, (14 more...)

arXiv.org Machine Learning

2003.03668

Country:

North America > United States > New York (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Netherlands > South Holland > Leiden (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.54)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.46)

Closing the convergence gap of SGD without replacement

Rajput, Shashank, Gupta, Anant, Papailiopoulos, Dimitris

arXiv.org Machine Learning | Mar-5-2020

With- and without-replacement sampling of the individual component functions are regarded as some of the most popular variants of SGD. During SGD with-replacement sampling, the stochastic gradient is equal to $g(x, \xi_i) = \nabla f_{\xi_i}(x)$, where $\xi_i$ is a uniform random number in $\{1, \dots, n\}$, i.e., a with-replacement sample from the set of gradients $\nabla f_1, \dots, \nabla f_n$. In the case of without-replacement sampling, the stochastic gradient is equal to $g(x, \xi_i) = \nabla f_{\xi_i}(x)$, where $\xi_i$ is the $i$-th ordered element in a random permutation of the numbers in $\{1, \dots, n\}$, i.e., a without-replacement sample. In practice, SGD without replacement is much more widely used than its with-replacement counterpart, as it can empirically converge significantly faster [1, 2, 3]. However, in the land of theoretical guarantees, with-replacement SGD has been the focal point of convergence analyses. Analyzing stochastic gradients sampled with replacement is significantly more tractable for a simple reason: in expectation, the stochastic gradient is equal to the "true" gradient of $F$, i.e., $\mathbb{E}_{\xi_i}[\nabla f_{\xi_i}(x)] = \nabla F(x)$. This makes SGD amenable to analyses very similar to that of vanilla gradient descent (GD), which has been extensively studied under a large variety of function classes and geometric assumptions, e.g., see [4]. Unfortunately, the same cannot be said for SGD without replacement, which has long resisted nonvacuous convergence guarantees.
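The two sampling schemes differ only in how the index sequence is generated, which a minimal sketch makes explicit. The toy objective (an average of one-dimensional quadratics), the constant step size, and reshuffling the permutation at every epoch are assumptions for illustration, not the paper's setup.

```python
import random


def sgd(targets, epochs, step, with_replacement, seed=0):
    """Minimize F(x) = (1/n) * sum_i 0.5 * (x - a_i)^2 with single-sample SGD."""
    rng = random.Random(seed)
    n = len(targets)
    x = 5.0
    for _ in range(epochs):
        if with_replacement:
            # i.i.d. uniform indices: some components repeat, some are skipped.
            order = [rng.randrange(n) for _ in range(n)]
        else:
            # Fresh random permutation: every component is used exactly once.
            order = list(range(n))
            rng.shuffle(order)
        for i in order:
            x -= step * (x - targets[i])  # gradient of f_i at x is (x - a_i)
    return x


data_rng = random.Random(1)
targets = [data_rng.uniform(-1.0, 1.0) for _ in range(50)]
x_star = sum(targets) / len(targets)  # minimizer of F
x_with = sgd(targets, epochs=200, step=0.01, with_replacement=True)
x_without = sgd(targets, epochs=200, step=0.01, with_replacement=False)
print("error with replacement:   ", abs(x_with - x_star))
print("error without replacement:", abs(x_without - x_star))
```

In the without-replacement branch the gradients within an epoch are no longer independent draws, which is exactly what breaks the unbiasedness argument used for with-replacement SGD.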

epoch, gradient, permutation, (16 more...)

arXiv.org Machine Learning

2002.104

Country: North America > United States > Wisconsin > Dane County > Madison (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.94)

Probability Distributions in Data Science - KDnuggets

#artificialintelligence | Feb-28-2020, 04:08:11 GMT

Bio: Pier Paolo Ippolito is a final year MSc Artificial Intelligence student at The University of Southampton. He is an AI Enthusiast, Data Scientist and RPA Developer.

normal distribution, poisson distribution, probability, (13 more...)

#artificialintelligence

Country: North America > United States > California > San Francisco County > San Francisco (0.05)

Industry: Education > Educational Setting (0.35)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.53)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.32)

LASG: Lazily Aggregated Stochastic Gradients for Communication-Efficient Distributed Learning

Chen, Tianyi, Sun, Yuejiao, Yin, Wotao

arXiv.org Machine Learning | Feb-26-2020

This paper targets solving distributed machine learning problems such as federated learning in a communication-efficient fashion. A class of new stochastic gradient descent (SGD) approaches has been developed, which can be viewed as the stochastic generalization of the recently developed lazily aggregated gradient (LAG) method, justifying the name LASG. LAG adaptively predicts the contribution of each round of communication and chooses only the significant ones to perform. It saves communication while also maintaining the rate of convergence. However, LAG only works with deterministic gradients, and applying it to stochastic gradients yields poor performance. The key components of LASG are a set of new rules tailored for stochastic gradients that can be implemented to save download, upload, or both. The new algorithms adaptively choose between fresh and stale stochastic gradients and have convergence rates comparable to the original SGD. LASG achieves impressive empirical performance: it typically saves total communication by an order of magnitude.
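The general idea of skipping insignificant communication rounds can be sketched with deterministic gradients, closer in spirit to LAG than to LASG. The upload rule below (send only when the local gradient has moved by more than a fixed tolerance since the last upload) is a hypothetical simplification, not the rule from either paper, and the quadratic worker losses are assumptions.

```python
def lazy_gradient_descent(curvatures, centers, step=0.05, rounds=100, tol=1e-3):
    """Gradient descent on sum_m 0.5 * c_m * (x - a_m)^2 with lazy uploads."""
    x = 0.0
    last_sent = [None] * len(curvatures)  # server's possibly stale copy per worker
    uploads = 0
    for _ in range(rounds):
        for m, (c, a) in enumerate(zip(curvatures, centers)):
            fresh = c * (x - a)  # worker m's current local gradient
            # Hypothetical rule: upload only if the gradient moved by more than tol.
            if last_sent[m] is None or abs(fresh - last_sent[m]) > tol:
                last_sent[m] = fresh
                uploads += 1
        # Server step mixes fresh and stale gradients.
        x -= step * sum(last_sent)
    return x, uploads


curvatures = [1.0, 2.0, 3.0, 4.0]
centers = [1.0, -1.0, 2.0, 0.0]
x_final, uploads = lazy_gradient_descent(curvatures, centers)
# The exact minimizer is sum(c * a) / sum(c) = 0.5 for these values.
print(f"x = {x_final:.4f}, uploads used: {uploads} of {100 * len(curvatures)}")
```

With stochastic gradients the difference between consecutive gradients is dominated by sampling noise, so a rule of this kind would trigger uploads almost every round; that is the failure mode the abstract says the LASG rules are designed to avoid.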

communication, gradient, lazily aggregated stochastic gradient, (13 more...)

arXiv.org Machine Learning

2002.1136

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > Canada > Quebec > Montreal (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(11 more...)

Genre: Research Report (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
