
 Mathematical & Statistical Methods


Hypergraph Partitioning using Tensor Eigenvalue Decomposition

arXiv.org Machine Learning

Hypergraphs have gained increasing attention in the machine learning community lately due to their superiority over graphs in capturing super-dyadic interactions among entities. In this work, we propose a novel approach for the partitioning of k-uniform hypergraphs. Most of the existing methods work by reducing the hypergraph to a graph and then applying standard graph partitioning algorithms. The reduction step restricts the algorithms to capturing only some weighted pairwise interactions and hence loses essential information about the original hypergraph. We overcome this issue by utilizing the tensor-based representation of hypergraphs, which enables us to capture actual super-dyadic interactions. We prove that the hypergraph-to-graph reduction is a special case of tensor contraction. We extend the notions of minimum ratio-cut and normalized-cut from graphs to hypergraphs and show that the relaxed optimization problem is equivalent to tensor eigenvalue decomposition. This novel formulation also enables us to capture different ways of cutting a hyperedge, unlike the existing reduction approaches. We propose a hypergraph partitioning algorithm inspired by spectral graph theory that can accommodate this notion of hyperedge cuts. We also derive a tighter upper bound on the minimum positive eigenvalue of the even-order hypergraph Laplacian tensor in terms of its conductance, which is utilized in the partitioning algorithm to approximate the normalized cut. The efficacy of the proposed method is demonstrated numerically on simple hypergraphs. We also show improvement for the min-cut solution on 2-uniform hypergraphs (graphs) over the standard spectral partitioning algorithm.
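The statement that a hypergraph-to-graph reduction can be seen as a tensor contraction can be illustrated on a toy case. The sketch below is not the paper's construction: it assumes a 3-uniform hypergraph, a plain 0/1 adjacency tensor (the paper may normalize entries differently), and the unweighted clique expansion as the reduction. The names `adjacency_tensor` and `clique_expansion` are hypothetical helpers introduced only for this example.

```python
from itertools import permutations

import numpy as np


def adjacency_tensor(n, hyperedges):
    """0/1 adjacency tensor of a 3-uniform hypergraph on n nodes (assumed encoding)."""
    A = np.zeros((n, n, n))
    for edge in hyperedges:
        # Set the entry for every ordering of the three nodes in the hyperedge.
        for i, j, k in permutations(edge):
            A[i, j, k] = 1.0
    return A


def clique_expansion(n, hyperedges):
    """Graph reduction: every hyperedge becomes a clique with unit edge weights."""
    W = np.zeros((n, n))
    for edge in hyperedges:
        for i, j in permutations(edge, 2):
            W[i, j] += 1.0
    return W


hyperedges = [(0, 1, 2), (1, 2, 3), (2, 3, 4)]
n = 5

A = adjacency_tensor(n, hyperedges)
# Contract the third mode of the tensor with the all-ones vector.
W_contract = np.einsum("ijk,k->ij", A, np.ones(n))
W_clique = clique_expansion(n, hyperedges)

print(np.array_equal(W_contract, W_clique))
```

Under these assumptions the contraction counts, for each pair of nodes, the hyperedges containing both, which is exactly the clique-expansion weight. The pairwise matrix no longer records which third node completed each hyperedge, which is the information loss the abstract refers to.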


Introduction to Algorithms: Amazon.co.uk: Thomas H. Cormen, Charles E. Leiserson, Ronald L Rivest, Clifford Stein: 8601419521876: Books

#artificialintelligence

"In light of the explosive growth in the amount of data and the diversity of computing applications, efficient algorithms are needed now more than ever. This beautifully written, thoughtfully organized book is the definitive introductory book on the design and analysis of algorithms. The first half offers an effective method to teach and study algorithms; the second half then engages more advanced readers and curious students with compelling material on both the possibilities and the challenges in this fascinating field." --Shang-Hua Teng, University of Southern California "Introduction to Algorithms, the 'bible' of the field, is a comprehensive textbook covering the full spectrum of modern algorithms: from the fastest algorithms and data structures to polynomial-time algorithms for seemingly intractable problems, from classical algorithms in graph theory to special algorithms for string matching, computational geometry, and number theory. The revised third edition notably adds a chapter on van Emde Boas trees, one of the most useful data structures, and on multithreaded algorithms, a topic of increasing importance." --Daniel Spielman, Department of Computer Science, Yale University "As an educator and researcher in the field of algorithms for over two decades, I can unequivocally say that the Cormen book is the best textbook that I have ever seen on this subject.


Using the Chi-Squared test for feature selection with implementation

#artificialintelligence

Let's approach the problem of feature selection using Chi-Square in a question-and-answer style. If you prefer video, you can check out our YouTube lecture on the same topic. Question 1: What is a feature? For any ML or DL problem, the data is arranged in rows and columns. Let's take the example of the Titanic shipwreck problem. Question 2: What are the different types of features?


Quantum algorithms for spectral sums

arXiv.org Artificial Intelligence

The trace of a matrix function, far from being only of theoretical interest, appears in many practical applications of linear algebra. To name a few, it has applications in machine learning, computational chemistry, biology, statistics, finance, and many others [1, 6, 13, 14, 24, 26, 31, 49, 52, 53]. While the problem of estimating some spectral quantities dates back decades, many fast classical algorithms have been developed recently [7, 28, 29, 38, 47, 58, 61], highlighting the importance of spectral sums in many numerical problems. The spectral sum is defined as the sum of the eigenvalues of a matrix after a given function is applied to them. Oftentimes, the matrix will be symmetric positive definite (SPD), but there are cases where this assumption is relaxed. The logarithm of the determinant is perhaps the most common example of a spectral sum, as the determinant is one of the most important properties associated with a matrix. However, the standard definition does not offer an efficient way of computing it. Remarkably, it is often the case that the logarithm of the determinant is the quantity that is effectively needed in the applications, which is much more amenable to estimation.
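The definition of a spectral sum given above can be made concrete with a small classical computation. The following is a minimal sketch, assuming a dense SPD matrix small enough for a full eigendecomposition; it is not one of the fast classical or quantum algorithms the abstract refers to. The helper name `spectral_sum` and the random test matrix are illustrative assumptions.

```python
import numpy as np


def spectral_sum(A, f):
    """Sum of f applied to the eigenvalues of a symmetric matrix A."""
    eigenvalues = np.linalg.eigvalsh(A)  # eigenvalues of a symmetric matrix
    return float(np.sum(f(eigenvalues)))


# Build a random SPD matrix (assumed example input).
rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = B @ B.T + 5.0 * np.eye(5)

# The log-determinant is the spectral sum with f = log.
logdet_spectral = spectral_sum(A, np.log)

# Reference value from NumPy's numerically stable log-determinant.
sign, logdet_direct = np.linalg.slogdet(A)

# The trace is the spectral sum with f = identity.
trace_spectral = spectral_sum(A, lambda x: x)

print(logdet_spectral, logdet_direct)
```

Summing the logarithms of the eigenvalues avoids forming the determinant itself, which can overflow or underflow for large matrices; this is one reason the log-determinant is the more convenient quantity in practice.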


A contribution to Optimal Transport on incomparable spaces

arXiv.org Machine Learning

Optimal Transport is a theory that makes it possible to define geometrical notions of distance between probability distributions and to find correspondences, or relationships, between sets of points. Many machine learning applications are derived from this theory, at the frontier between mathematics and optimization. This thesis proposes to study the complex scenario in which the different data belong to incomparable spaces. In particular, we address the following questions: how can Optimal Transport be defined and applied between graphs and between structured data? How can it be adapted when the data are varied and not embedded in the same metric space? This thesis proposes a set of Optimal Transport tools for these different cases. An important part is notably devoted to the study of the Gromov-Wasserstein distance, whose properties make it possible to define interesting transport problems on incomparable spaces. More broadly, we analyze the mathematical properties of the various proposed tools, we establish algorithmic solutions to compute them, and we study their applicability in numerous machine learning scenarios which cover, in particular, classification, simplification, partitioning of structured data, as well as heterogeneous domain adaptation.


Characterizations of non-normalized discrete probability distributions and their application in statistics

arXiv.org Machine Learning

From the distributional characterizations that lie at the heart of Stein's method we derive explicit formulae for the mass functions of discrete probability laws that identify those distributions. These identities are applied to develop tools for the solution of statistical problems. Our characterizations, and hence the applications built on them, do not require any knowledge about normalization constants of the probability laws. We discuss several examples where this lack of feasibility of the normalization constant is a built-in feature. To demonstrate that our statistical methods are sound, we provide comparative simulation studies for the testing of fit to the Poisson distribution and for parameter estimation of the negative binomial family when both parameters are unknown. We also consider the problem of parameter estimation for discrete exponential-polynomial models which generally are non-normalized.


Stochastic Approximation for High-frequency Observations in Data Assimilation

arXiv.org Machine Learning

With the increasing penetration of high-frequency sensors across a number of biological and physical systems, the abundance of the resulting observations offers opportunities for higher statistical accuracy of downstream estimates, but their frequency results in a plethora of computational problems in data assimilation tasks. The high frequency of these observations has traditionally been dealt with by using data modification strategies such as accumulation, averaging, and sampling. However, these data modification strategies will reduce the quality of the estimates, which may be untenable for many systems. Therefore, to ensure high-quality estimates, we adapt stochastic approximation methods to address the unique challenges of high-frequency observations in data assimilation. As a result, we are able to produce estimates that leverage all of the observations in a manner that avoids the aforementioned computational problems and preserves the statistical accuracy of the estimates.



Algorithms and Hardness for Linear Algebra on Geometric Graphs

arXiv.org Machine Learning

For a function $\mathsf{K} : \mathbb{R}^{d} \times \mathbb{R}^{d} \to \mathbb{R}_{\geq 0}$, and a set $P = \{ x_1, \ldots, x_n\} \subset \mathbb{R}^d$ of $n$ points, the $\mathsf{K}$-graph $G_P$ of $P$ is the complete graph on $n$ nodes where the weight between nodes $i$ and $j$ is given by $\mathsf{K}(x_i, x_j)$. In this paper, we initiate the study of when efficient spectral graph theory is possible on these graphs. We investigate whether or not it is possible to solve the following problems in $n^{1+o(1)}$ time for a $\mathsf{K}$-graph $G_P$ when $d < n^{o(1)}$:

$\bullet$ Multiply a given vector by the adjacency matrix or Laplacian matrix of $G_P$
$\bullet$ Find a spectral sparsifier of $G_P$
$\bullet$ Solve a Laplacian system in $G_P$'s Laplacian matrix

For each of these problems, we consider all functions of the form $\mathsf{K}(u,v) = f(\|u-v\|_2^2)$ for a function $f:\mathbb{R} \rightarrow \mathbb{R}$. We provide algorithms and comparable hardness results for many such $\mathsf{K}$, including the Gaussian kernel, neural tangent kernels, and more. For example, in dimension $d = \Omega(\log n)$, we show that there is a parameter associated with the function $f$ for which low parameter values imply $n^{1+o(1)}$ time algorithms for all three of these problems and high parameter values imply the nonexistence of subquadratic time algorithms assuming the Strong Exponential Time Hypothesis ($\mathsf{SETH}$), given natural assumptions on $f$. As part of our results, we also show that the exponential dependence on the dimension $d$ in the celebrated fast multipole method of Greengard and Rokhlin cannot be improved, assuming $\mathsf{SETH}$, for a broad class of functions $f$. To the best of our knowledge, this is the first formal limitation proven about fast multipole methods.
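To make the objects in this abstract concrete, the sketch below builds the $\mathsf{K}$-graph for the Gaussian kernel and performs the first task, multiplying a vector by the adjacency and Laplacian matrices. It is the naive dense baseline that takes quadratic time in $n$, not the paper's $n^{1+o(1)}$ algorithm. The choice $f(z) = e^{-z}$, the random points, the zeroed diagonal (no self-loops), and the helper names `gaussian_kernel_graph` and `laplacian` are assumptions made for illustration.

```python
import numpy as np


def gaussian_kernel_graph(points):
    """Dense weight matrix W[i, j] = exp(-||x_i - x_j||_2^2), with no self-loops."""
    sq_norms = np.sum(points**2, axis=1)
    # Squared pairwise distances via ||u||^2 + ||v||^2 - 2 u.v
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * points @ points.T
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives from round-off
    W = np.exp(-sq_dists)
    np.fill_diagonal(W, 0.0)  # assumption: the complete graph has no self-loops
    return W


def laplacian(W):
    """Graph Laplacian L = D - W, where D is the diagonal matrix of weighted degrees."""
    return np.diag(W.sum(axis=1)) - W


rng = np.random.default_rng(0)
P = rng.standard_normal((6, 3))  # n = 6 points in dimension d = 3
W = gaussian_kernel_graph(P)
L = laplacian(W)

v = rng.standard_normal(6)
Wv = W @ v  # adjacency matrix-vector product, O(n^2) here
Lv = L @ v  # Laplacian matrix-vector product, O(n^2) here

print(Wv, Lv)
```

Forming `W` explicitly already costs quadratic time and memory, which is why the question of whether these products can be computed in near-linear time without materializing the matrix is nontrivial.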


Machine Learning & Linear Algebra

#artificialintelligence

Linear algebra is essential in Machine Learning (ML) and Deep Learning (DL). It is not hard. You just need to bring yourself up to speed. I will skip fundamentals like what is a vector, and matrix…