 Mathematical & Statistical Methods


On the Estimation of Network Complexity: Dimension of Graphons

arXiv.org Machine Learning

Network complexity has been studied for over half a century and has found a wide range of applications. Many methods have been developed to characterize and estimate the complexity of networks. However, there has been little research with statistical guarantees. In this paper, we develop a statistical theory of graph complexity in a general model of random graphs, the so-called graphon model. Given a graphon, we endow the latent space of the nodes with the so-called neighborhood distance that measures the propensity of two nodes to be connected with similar nodes. Our complexity index is then based on the covering number and the Minkowski dimension of (a purified version of) this metric space. Although the latent space is not identifiable, these indices turn out to be identifiable. This notion of complexity has simple interpretations on popular examples of random graphs: it matches the number of communities in stochastic block models; the dimension of the Euclidean space in random geometric graphs; the regularity of the link function in Hölder graphon models. From a single observation of the graph, we construct an estimator of the neighborhood distance and show universal non-asymptotic bounds for its risk, matching minimax lower bounds. Based on this estimated distance, we compute the corresponding covering number and Minkowski dimension and we provide optimal non-asymptotic error bounds for these two plug-in estimators.


Linear Convergence of Adaptive Stochastic Gradient Descent

arXiv.org Machine Learning

We prove that the norm version of the adaptive stochastic gradient method (AdaGrad-Norm) achieves a linear convergence rate for a subset of either strongly convex functions or non-convex functions that satisfy the Polyak-Lojasiewicz (PL) inequality. The paper introduces the notion of Restricted Uniform Inequality of Gradients (RUIG), which describes a uniform lower bound for the norm of the stochastic gradients with respect to the distance to the optimal solution. RUIG plays a key role in proving the robustness of AdaGrad-Norm to its hyper-parameter tuning. On top of RUIG, we develop a novel two-stage framework to prove linear convergence of AdaGrad-Norm without knowing the parameters of the objective functions. Stage I: the step-size decreases quickly until it reaches Stage II; Stage II: the step-size decreases slowly and converges. This framework can likely be extended to other adaptive step-size algorithms. The numerical experiments show desirable agreement with our theories.
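The update described above can be sketched in a few lines. This is a minimal illustration that assumes the commonly stated form of AdaGrad-Norm (accumulate squared gradient norms in a scalar b, then step with eta / b); the abstract does not spell out the update, and the toy least-squares problem, the function names, and the values of eta, b0, and steps are hypothetical choices made only for this example.

```python
import math
import random


def adagrad_norm(stochastic_grad, x0, eta=1.0, b0=0.1, steps=2000):
    """AdaGrad-Norm sketch: one scalar step-size eta / b shared by all coordinates."""
    x = list(x0)
    b_sq = b0 ** 2
    for _ in range(steps):
        g = stochastic_grad(x)
        # Accumulate the squared norm of every stochastic gradient seen so far.
        b_sq += sum(gi * gi for gi in g)
        step = eta / math.sqrt(b_sq)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x, math.sqrt(b_sq)


# Toy consistent least-squares problem (illustrative data, not from the paper).
rng = random.Random(0)
rows = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, -1.0)]
x_star = (1.0, -2.0)
targets = [r[0] * x_star[0] + r[1] * x_star[1] for r in rows]


def sample_grad(x):
    """Gradient of 0.5 * (a_i . x - y_i)^2 for one uniformly sampled row i."""
    i = rng.randrange(len(rows))
    residual = rows[i][0] * x[0] + rows[i][1] * x[1] - targets[i]
    return [residual * rows[i][0], residual * rows[i][1]]


x_hat, b_final = adagrad_norm(sample_grad, [0.0, 0.0])
dist = math.dist(x_hat, x_star)
print(x_hat, b_final, dist)
```

Because every stochastic gradient vanishes at the solution of this toy problem, the accumulator b stops growing once the iterates are close, which loosely mirrors the two-stage picture: the step-size shrinks quickly at first and then settles.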


Reasoning-Driven Question-Answering for Natural Language Understanding

arXiv.org Artificial Intelligence

Natural language understanding (NLU) of text is a fundamental challenge in AI, and it has received significant attention throughout the history of NLP research. This primary goal has been studied under different tasks, such as Question Answering (QA) and Textual Entailment (TE). In this thesis, we investigate the NLU problem through the QA task and focus on the aspects that make it a challenge for the current state-of-the-art technology. This thesis is organized into three main parts: In the first part, we explore multiple formalisms to improve existing machine comprehension systems. We propose a formulation for abductive reasoning in natural language and show its effectiveness, especially in domains with limited training data. Additionally, to help reasoning systems cope with irrelevant or redundant information, we create a supervised approach to learn and detect the essential terms in questions. In the second part, we propose two new challenge datasets. In particular, we create two datasets of natural language questions where (i) the first one requires reasoning over multiple sentences; (ii) the second one requires temporal common sense reasoning. We hope that the two proposed datasets will motivate the field to address more complex problems. In the final part, we present the first formal framework for multi-step reasoning algorithms, in the presence of a few important properties of language use, such as incompleteness, ambiguity, etc. We apply this framework to prove fundamental limitations for reasoning algorithms. These theoretical results provide extra intuition into the existing empirical evidence in the field.


On Convergence of Distributed Approximate Newton Methods: Globalization, Sharper Bounds and Beyond

arXiv.org Machine Learning

The DANE algorithm is an approximate Newton method popularly used for communication-efficient distributed machine learning. Reasons for the interest in DANE include scalability and versatility. Convergence of DANE, however, can be tricky; its appealing convergence rate is only rigorous for quadratic objectives, and for more general convex functions the known results are no stronger than those of the classic first-order methods. To remedy these drawbacks, we propose in this paper new variants of DANE that are more suitable for analysis. We first introduce a simple variant of DANE equipped with backtracking line search, for which global asymptotic convergence and sharper local non-asymptotic convergence rate guarantees can be proved for both quadratic and non-quadratic strongly convex functions. Then we propose a heavy-ball method to accelerate the convergence of DANE, showing that a nearly tight local rate of convergence can be established for strongly convex functions, and that with a proper modification of the algorithm the same result applies globally to linear prediction models. Numerical evidence is provided to confirm the theoretical and practical advantages of our methods.


Some Developments in Clustering Analysis on Stochastic Processes

arXiv.org Machine Learning

Qidi Peng, Nan Rao, Ran Zhao
Abstract: We review some developments on clustering stochastic processes and come to the conclusion that asymptotically consistent clustering algorithms can be obtained when the processes are ergodic and the dissimilarity measure satisfies the triangle inequality. Examples are provided when the processes are distribution ergodic, covariance ergodic and locally asymptotically self-similar, respectively.
Keywords: stochastic process, unsupervised clustering, stationary ergodic processes, local asymptotic self-similarity
1 Introduction
A stochastic process is an infinite sequence of random variables indexed by "time". The time indexes can be either discrete or continuous. Stochastic process type data have been broadly explored in biological and medical research (Damian et al., 2007; Zhao et al., 2014; Jääskinen et al., 2014; et al., 2018).


Can You Learn Machine Learning Without Linear Algebra?

#artificialintelligence

Machine learning is a field that has emerged out of numerous innovations in computational sciences, spanning centuries. So, can a machine learning enthusiast skip linear algebra and flourish? The short answer is -- NO. Linear Algebra is a branch of mathematics that is widely used throughout science and engineering. Good understanding of linear algebra is essential for understanding and working with many ML algorithms, especially deep learning algorithms.


A Matrix-free Likelihood Method for Exploratory Factor Analysis of High-dimensional Gaussian Data

arXiv.org Machine Learning

This paper proposes a novel profile likelihood method for estimating the covariance parameters in exploratory factor analysis of high-dimensional Gaussian datasets with fewer observations than number of variables. An implicitly restarted Lanczos algorithm and a limited-memory quasi-Newton method are implemented to develop a matrix-free framework for likelihood maximization. Simulation results show that our method is substantially faster than the expectation-maximization solution without sacrificing accuracy. Our method is applied to fit factor models on data from suicide attempters, suicide ideators and a control group.


The Ramanujan Machine: Automatically Generated Conjectures on Fundamental Constants

arXiv.org Artificial Intelligence

Fundamental mathematical constants like $e$ and $\pi$ are ubiquitous in diverse fields of science, from abstract mathematics and geometry to physics, biology and chemistry. Nevertheless, for centuries new mathematical formulas relating fundamental constants have been scarce and usually discovered sporadically. In this paper we propose a novel and systematic approach that leverages algorithms for deriving mathematical formulas for fundamental constants and helps reveal their underlying structure. Our algorithms find dozens of well-known as well as previously unknown continued fraction representations of $\pi$, $e$, and the Riemann zeta function values. Two conjectures produced by our algorithm, along with many others, are: \begin{equation*} \frac{e}{e-2} = 4 - \frac{1}{5-\frac{2}{6-\frac{3}{7-\frac{4}{8-\ldots}}}} \quad\quad,\quad\quad \frac{4}{3\pi-8} = 3-\frac{1\cdot1}{6-\frac{2\cdot3}{9-\frac{3\cdot5}{12-\frac{4\cdot 7}{15-\ldots}}}} \end{equation*} We present two algorithms that proved useful in finding conjectures: a variant of the Meet-In-The-Middle (MITM) algorithm and a Gradient Descent (GD) tailored to the recurrent structure of continued fractions. Both algorithms are based on matching numerical values and thus they conjecture formulas without providing proofs and without requiring any prior knowledge of any underlying mathematical structure. This approach is especially attractive for fundamental constants for which no mathematical structure is known, as it reverses the conventional approach of sequential logic in formal proofs. Instead, our work presents a new conceptual approach for research: computer algorithms utilizing numerical data to unveil mathematical structures, thus trying to play the role of intuition of great mathematicians of the past, providing leads to new mathematical research.
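Since both algorithms rest on matching numerical values, the first conjecture above can be checked numerically with a short script. The following is only a sketch: the function name, the bottom-up evaluation, and the truncation depth of 40 are illustrative choices rather than anything taken from the paper, and a numerical match is evidence for a conjecture, not a proof.

```python
import math


def eval_continued_fraction(a, b, depth):
    """Evaluate a(0) - b(1)/(a(1) - b(2)/(a(2) - ...)), truncated at `depth` levels."""
    value = a(depth)
    # Work from the innermost retained level back out to the top.
    for n in range(depth, 0, -1):
        value = a(n - 1) - b(n) / value
    return value


# Terms read off the first conjecture:
# 4 - 1/(5 - 2/(6 - 3/(7 - ...)))  ->  a(n) = n + 4, b(n) = n.
approx = eval_continued_fraction(lambda n: n + 4, lambda n: n, depth=40)
target = math.e / (math.e - 2)
error = abs(approx - target)
print(approx, target, error)
```

The same helper could be pointed at the second conjecture by swapping in its terms, although that fraction may need a larger depth to reach comparable accuracy.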


Mathematics for Artificial Intelligence – Linear Algebra

#artificialintelligence

Machine Learning, Neural Networks and Artificial Intelligence are big buzzwords of the decade. It is not surprising that today these fields are expanding quickly and are used to solve a vast number of problems. We are witnesses of the new golden period of these technologies. However, today we are merely innovating. The majority of the concepts used in these fields were invented 50 or more years ago.


Can Machine Learning Identify Governing Laws For Dynamics in Complex Engineered Systems?: A Study in Chemical Engineering

arXiv.org Machine Learning

Machine learning has recently been used to identify the governing equations for dynamics in physical systems. The promising results from applications on systems such as fluid dynamics and chemical kinetics inspire further investigation of these methods on complex engineered systems. Dynamics of these systems play a crucial role in design and operations. Hence, it would be advantageous to learn about the mechanisms that may be driving the complex dynamics of systems. In this work, we address the open question of the applicability and usefulness of a novel machine learning approach in identifying the governing dynamical equations for engineered systems. We focused on the distillation column, which is a ubiquitous unit operation in chemical engineering and demonstrates complex dynamics, i.e., its dynamics are a combination of heuristics and fundamental physical laws. We tested the method of Sparse Identification of Non-Linear Dynamics (SINDy) because of its ability to produce white-box models with terms that can be used for physical interpretation of dynamics. Time series data for the dynamics were generated from simulation of a distillation column using ASPEN Dynamics. One promising result was the reduction of the number of equations for dynamic simulation from 1000s in ASPEN to only 13, one for each state variable. Prediction accuracy was high on the test data from the system within the perturbation range; however, outside the perturbation range the equations did not perform well. In terms of physical law extraction, some terms were interpretable as related to Fick's law of diffusion (with concentration terms) and Henry's law (with ratio of concentration and pressure terms). While some terms were interpretable, we conclude that more research is needed on combining engineering systems with machine learning approaches to improve understanding of governing laws for unknown dynamics.
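The sparse-regression idea behind SINDy can be illustrated on a toy problem. The sketch below assumes sequentially thresholded least squares, the regression commonly associated with SINDy; the abstract does not say which variant the authors used. The one-variable system dx/dt = -2x, the candidate library {1, x, x^2}, the exact derivatives, and all function names and parameter values are hypothetical and unrelated to the distillation-column study.

```python
import math


def solve_least_squares(columns, target):
    """Least-squares fit via the normal equations (adequate for tiny libraries)."""
    k = len(columns)
    # Augmented system [Theta^T Theta | Theta^T target].
    aug = [[sum(p * q for p, q in zip(columns[i], columns[j])) for j in range(k)]
           + [sum(p * y for p, y in zip(columns[i], target))] for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, k):
            factor = aug[row][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[row][c] -= factor * aug[col][c]
    # Back substitution.
    w = [0.0] * k
    for i in reversed(range(k)):
        rest = sum(aug[i][j] * w[j] for j in range(i + 1, k))
        w[i] = (aug[i][k] - rest) / aug[i][i]
    return w


def stlsq(library, target, threshold=0.1, iterations=10):
    """Sequentially thresholded least squares: fit, drop small terms, refit."""
    active = list(library)
    coeffs = {}
    for _ in range(iterations):
        if not active:
            break
        w = solve_least_squares([library[name] for name in active], target)
        coeffs = dict(zip(active, w))
        kept = [name for name in active if abs(coeffs[name]) >= threshold]
        if kept == active:
            break
        active = kept
    # Terms that were dropped get an exact zero coefficient.
    return {name: (coeffs[name] if name in active else 0.0) for name in library}


# Toy data from dx/dt = -2x with exact derivatives (no noise).
times = [0.05 * i for i in range(40)]
x = [math.exp(-2.0 * t) for t in times]
dx = [-2.0 * v for v in x]
library = {"1": [1.0] * len(x), "x": x, "x^2": [v * v for v in x]}

model = stlsq(library, dx)
print(model)
```

With real measurements the derivatives would have to be estimated from the time series, and the threshold would need tuning; both steps are left out here to keep the sketch short.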