Mathematical & Statistical Methods
Estimating the Spectral Density of Large Implicit Matrices
Adams, Ryan P., Pennington, Jeffrey, Johnson, Matthew J., Smith, Jamie, Ovadia, Yaniv, Patton, Brian, Saunderson, James
Many important problems are characterized by the eigenvalues of a large matrix. For example, the difficulty of many optimization problems, such as those arising from the fitting of large models in statistics and machine learning, can be investigated via the spectrum of the Hessian of the empirical loss function. Network data can be understood via the eigenstructure of a graph Laplacian matrix using spectral graph theory. Quantum simulations and other many-body problems are often characterized via the eigenvalues of the solution space, as are various dynamic systems. However, naive eigenvalue estimation is computationally expensive even when the matrix can be represented; in many of these situations the matrix is so large as to only be available implicitly via products with vectors. Even worse, one may only have noisy estimates of such matrix vector products. In this work, we combine several different techniques for randomized estimation and show that it is possible to construct unbiased estimators to answer a broad class of questions about the spectra of such implicit matrices, even in the presence of noise. We validate these methods on large-scale problems in which graph theory and random matrix theory provide ground truth.
Concept Drift and Anomaly Detection in Graph Streams
Zambon, Daniele, Alippi, Cesare, Livi, Lorenzo
Graph representations offer powerful and intuitive ways to describe data in a multitude of application domains. Here, we consider stochastic processes generating graphs and propose a methodology for detecting changes in stationarity of such processes. The methodology is general and considers a process generating attributed graphs with a variable number of vertices/edges, without the need to assume one-to-one correspondence between vertices at different time steps. The methodology acts by embedding every graph of the stream into a vector domain, where a conventional multivariate change detection procedure can be easily applied. We ground the soundness of our proposal by proving several theoretical results. In addition, we provide a specific implementation of the methodology and evaluate its effectiveness on several detection problems involving attributed graphs representing biological molecules and drawings. Experimental results are contrasted with respect to suitable baseline methods, demonstrating the effectiveness of our approach.
Go With the Flow, on Jupiter and Snow. Coherence From Model-Free Video Data without Trajectories
AlMomani, Abd AlRahman, Bollt, Erik M.
Viewing a data set such as the clouds of Jupiter, coherence is readily apparent to human observers, especially the Great Red Spot, but also other great storms and persistent structures. There are now many different definitions and perspectives mathematically describing coherent structures, but we will take an image processing perspective here. We describe an image processing perspective inference of coherent sets from a fluidic system directly from image data, without attempting to first model underlying flow fields, related to a concept in image processing called motion tracking. In contrast to standard spectral methods for image processing which are generally related to a symmetric affinity matrix, leading to standard spectral graph theory, we need a not symmetric affinity which arises naturally from the underlying arrow of time. We develop an anisotropic, directed diffusion operator corresponding to flow on a directed graph, from a directed affinity matrix developed with coherence in mind, and corresponding spectral graph theory from the graph Laplacian. Our methodology is not offered as more accurate than other traditional methods of finding coherent sets, but rather our approach works with alternative kinds of data sets, in the absence of vector field. Our examples will include partitioning the weather and cloud structures of Jupiter, and a local to Potsdam, N.Y. lake-effect snow event on Earth, as well as the benchmark test double-gyre system.
A Parallelizable Acceleration Framework for Packing Linear Programs
London, Palma (California Institute of Technology) | Vardi, Shai (California Institute of Technology) | Wierman, Adam (California Institute of Technology) | Yi, Hanling (The Chinese University of Hong Kong)
This paper presents an acceleration framework for packing linear programming problems where the amount of data available is limited, i.e., where the number of constraints m is small compared to the variable dimension n. The framework can be used as a black box to speed up linear programming solvers dramatically, by two orders of magnitude in our experiments. We present worst-case guarantees on the quality of the solution and the speedup provided by the algorithm, showing that the framework provides an approximately optimal solution while running the original solver on a much smaller problem. The framework can be used to accelerate exact solvers, approximate solvers, and parallel/distributed solvers. Further, it can be used for both linear programs and integer linear programs.
The Geometric Block Model
Galhotra, Sainyam (University of Massachusetts Amherst) | Mazumdar, Arya (University of Massachusetts Amherst) | Pal, Soumyabrata (University of Massachusetts Amherst) | Saha, Barna (University of Massachusetts Amherst)
To capture the inherent geometric features of many community detection problems, we propose to use a new random graph model of communities that we call a Geometric Block Model. The geometric block model generalizes the random geometric graphs in the same way that the well-studied stochastic block model generalizes the Erdรถs-Renyi random graphs. It is also a natural extension of random community models inspired by the recent theoretical and practical advancement in community detection. While being a topic of fundamental theoretical interest, our main contribution is to show that many practical community structures are better explained by the geometric block model. We also show that a simple triangle-counting algorithm to detect communities in the geometric block model is near-optimal. Indeed, even in the regime where the average degree of the graph grows only logarithmically with the number of vertices (sparse-graph), we show that this algorithm performs extremely well, both theoretically and practically. In contrast, the triangle-counting algorithm is far from being optimum for the stochastic block model. We simulate our results on both real and synthetic datasets to show superior performance of both the new model as well as our algorithm.
5 Reasons to Learn Linear Algebra for Machine Learning - Machine Learning Mastery
Linear algebra is a field of mathematics that could be called the mathematics of data. It is undeniably a pillar of the field of machine learning, and many recommend it as a prerequisite subject to study prior to getting started in machine learning. This is misleading advice, as linear algebra makes more sense to a practitioner once they have a context of the applied machine learning process in which to interpret it. In this post, you will discover why machine learning practitioners should study linear algebra to improve their skills and capabilities as practitioners. Before we go through the reasons that you should learn linear algebra, let's start off by taking a small look at the reason why you should not.
Introduction to Matrices and Matrix Arithmetic for Machine Learning - Machine Learning Mastery
Matrices are a foundational element of linear algebra. Matrices are used throughout the field of machine learning in the description of algorithms and processes such as the input data variable (X) when training an algorithm. In this tutorial, you will discover matrices in linear algebra and how to manipulate them in Python. A Gentle Introduction to Matrices for Machine Learning Photo by Maximiliano Kolus, some rights reserved. Take my free 7-day email crash course now (with sample code).
Modelling Preference Data with the Wallenius Distribution
Grazian, Clara, Leisen, Fabrizio, Liseo, Brunero
The Wallenius distribution is a generalisation of the Hypergeometric distribution where weights are assigned to balls of different colours. This naturally defines a model for ranking categories which can be used for classification purposes. Since, in general, the resulting likelihood is not analytically available, we adopt an approximate Bayesian computational (ABC) approach for estimating the importance of the categories. We illustrate the performance of the estimation procedure on simulated datasets. Finally, we use the new model for analysing two datasets about movies ratings and Italian academic statisticians' journal preferences. The latter is a novel dataset collected by the authors.
Mini-Batch Stochastic ADMMs for Nonconvex Nonsmooth Optimization
In the paper, we study the mini-batch stochastic ADMMs (alternating direction method of multipliers) for the nonconvex nonsmooth optimization. We prove that, given an appropriate mini-batch size, the mini-batch stochastic ADMM without variance reduction (VR) technique is convergent and reaches the convergence rate of $O(1/T)$ to obtain a stationary point of the nonconvex optimization, where $T$ denotes the number of iterations. Moreover, we extend the mini-batch stochastic gradient method to both the nonconvex SVRG-ADMM and SAGA-ADMM in our initial paper \citep{huang2016stochastic}, and also prove that these mini-batch stochastic ADMMs reach the convergence rate of $O(1/T)$ without the condition on the mini-batch size. In particular, we provide a specific parameter selection for step size $\eta$ of stochastic gradients and penalization parameter $\rho$ of the augmented Lagrangian function. Finally, some experimental results demonstrate the effectiveness of our algorithms.
Are the Digits of Pi Truly Random? - Must Read for Math and Data Geeks
This article covers far more than the title suggests. It is written in simple English and accessible to quantitative professionals from a variety of backgrounds. Deep mathematical and data science research (including a result about the randomness of Pi, which is just a particular case) are presented here, without using arcane terminology or complicated equations. The topic discussed here, under a unified framework, is at the intersection of mathematics, probability theory, chaotic systems, stochastic processes, data and computer science. Many exotic objects are investigated, such as an unusual version of the logistic map, nested square roots, and representation of a number in a fractional or irrational base system. The article is also useful to anyone interested in learning these topics, whether they have any interest in the randomness or Pi or not, because of the numerous potential applications. I hope the style is refreshing, and I believe that you will find plenty of material rarely if ever discussed in textbooks or in the classroom.