Mathematical & Statistical Methods
Two-sample Hypothesis Testing for Inhomogeneous Random Graphs
Ghoshdastidar, Debarghya, Gutzeit, Maurilio, Carpentier, Alexandra, von Luxburg, Ulrike
The study of networks leads to a wide range of high dimensional inference problems. In most practical scenarios, one needs to draw inference from a small population of large networks. The present paper studies hypothesis testing of graphs in this high-dimensional regime. We consider the problem of testing between two populations of inhomogeneous random graphs defined on the same set of vertices. We propose tests based on estimates of the Frobenius and operator norms of the difference between the population adjacency matrices. We show that the tests are uniformly consistent in both the "large graph, small sample" and "small graph, large sample" regimes. We further derive lower bounds on the minimax separation rate for the associated testing problems, and show that the constructed tests are near optimal.
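To make the Frobenius-norm idea concrete, here is a minimal sketch in plain Python. It is not the test proposed in the paper: it uses a naive plug-in statistic (the squared Frobenius norm of the difference between the two sample-mean adjacency matrices) and calibrates it with a permutation test, which is a swapped-in, generic calibration. The function names `sample_graph`, `frobenius_stat`, and `permutation_pvalue` are hypothetical.

```python
import random


def sample_graph(P, rng):
    """Draw one undirected graph with independent edges; edge (i, j) appears w.p. P[i][j]."""
    n = len(P)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < P[i][j]:
                A[i][j] = A[j][i] = 1
    return A


def frobenius_stat(graphs_a, graphs_b):
    """Squared Frobenius norm of the difference of the two sample-mean adjacency matrices."""
    n = len(graphs_a[0])
    total = 0.0
    for i in range(n):
        for j in range(n):
            mean_a = sum(g[i][j] for g in graphs_a) / len(graphs_a)
            mean_b = sum(g[i][j] for g in graphs_b) / len(graphs_b)
            total += (mean_a - mean_b) ** 2
    return total


def permutation_pvalue(graphs_a, graphs_b, n_perm=200, seed=0):
    """Permutation p-value: reshuffle group labels and compare against the observed statistic.

    Assumption of this sketch: a generic permutation calibration, not the
    thresholds derived in the paper.
    """
    rng = random.Random(seed)
    observed = frobenius_stat(graphs_a, graphs_b)
    pooled = list(graphs_a) + list(graphs_b)
    m = len(graphs_a)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if frobenius_stat(pooled[:m], pooled[m:]) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)
```

A small p-value from `permutation_pvalue` suggests the two populations differ. Note that a permutation test needs more than a couple of graphs per population to have any power, so this sketch says nothing about the "large graph, small sample" regime the paper targets.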
Data Structures Related to Machine Learning Algorithms - DZone AI
In either case, the better your knowledge of data structures and algorithms, the easier a time you'll have when it comes time to code. I don't think the data structures used in machine learning are significantly different from those used in other areas of software development. Because of the size and difficulty of many of the problems, however, having a really solid handle on the basics is essential. Also, because machine learning is a very mathematical field, one should keep in mind how data structures can be used to solve mathematical problems and how they are mathematical objects in their own right. There are two ways to classify data structures: by their implementation and by their operation.
Centrality measures for graphons
Avella-Medina, Marco, Parise, Francesca, Schaub, Michael T., Segarra, Santiago
Graphs provide a natural mathematical abstraction for systems with pairwise interactions, and thus have become a prevalent tool for the representation of systems across various scientific domains. However, as the size of relational datasets continues to grow, traditional graph-based approaches are increasingly replaced by other modeling paradigms, which enable a more flexible treatment of such datasets. A promising framework in this context is provided by graphons, which have been formally introduced as the natural limiting objects for graphs of increasing sizes. However, while the theory of graphons is already well developed, some prominent tools in network analysis still have no counterpart within the realm of graphons. In particular, node centrality measures, which have been successfully employed in various applications to reveal important nodes in a network, have so far not been defined for graphons. In this work we introduce formal definitions of centrality measures for graphons and establish their connections to centrality measures defined on finite graphs. In particular, we build on the theory of linear integral operators to define degree, eigenvector, and Katz centrality functions for graphons. We further establish concentration inequalities showing that these centrality functions are natural limits of their analogous counterparts defined on sequences of random graphs of increasing size. We discuss several strategies for computing these centrality measures, and illustrate them through a set of numerical examples.
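The operator view of graphon centrality can be illustrated with a simple discretization. The sketch below assumes a graphon given as a Python function `W(x, y)` on the unit square and approximates the integral operator $(\mathbb{W}f)(x) = \int_0^1 W(x, y) f(y)\,dy$ with a midpoint rule. The grid-based approach, the truncated Neumann series for Katz centrality, and all function names are assumptions of this example, not necessarily the computational strategies discussed in the paper.

```python
def grid(n):
    """Midpoints of n equal subintervals of [0, 1]."""
    return [(i + 0.5) / n for i in range(n)]


def apply_operator(W, xs, f):
    """Midpoint-rule approximation of (W f)(x) = integral of W(x, y) f(y) dy on the grid."""
    n = len(xs)
    return [sum(W(x, y) * fy for y, fy in zip(xs, f)) / n for x in xs]


def degree_centrality(W, xs):
    """Degree function: the operator applied to the constant function 1."""
    return apply_operator(W, xs, [1.0] * len(xs))


def katz_centrality(W, xs, alpha, n_terms=50):
    """Truncated Neumann series sum_k alpha^k W^k 1, approximating (I - alpha W)^{-1} 1.

    Requires alpha times the operator norm of W to be below 1 for convergence.
    """
    term = [1.0] * len(xs)
    total = term[:]
    for _ in range(n_terms):
        term = [alpha * v for v in apply_operator(W, xs, term)]
        total = [t + v for t, v in zip(total, term)]
    return total


def eigenvector_centrality(W, xs, n_iter=100):
    """Power iteration for the principal eigenfunction, scaled to unit discretized L2 norm."""
    f = [1.0] * len(xs)
    for _ in range(n_iter):
        f = apply_operator(W, xs, f)
        norm = (sum(v * v for v in f) / len(xs)) ** 0.5
        f = [v / norm for v in f]
    return f
```

For the rank-one graphon `W = lambda x, y: x * y`, the degree function is $x/2$ and the principal eigenfunction is proportional to $x$, which gives an easy sanity check for the discretization.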
Stochastic Alternating Direction Method of Multipliers with Variance Reduction for Nonconvex Optimization
Huang, Feihu, Chen, Songcan, Lu, Zhaosong
We study the stochastic alternating direction method of multipliers (ADMM) for nonconvex optimization and propose three classes of nonconvex stochastic ADMM with variance reduction, based on different variance-reduced stochastic gradients. Specifically, the first class, called the nonconvex stochastic variance reduced gradient ADMM (SVRG-ADMM), uses a multi-stage scheme to progressively reduce the variance of stochastic gradients. The second is the nonconvex stochastic average gradient ADMM (SAG-ADMM), which additionally uses the old gradients estimated in the previous iteration. The third, called SAGA-ADMM, is an extension of the SAG-ADMM method. Moreover, under some mild conditions, we establish an iteration complexity bound of $O(1/\epsilon)$ for the proposed methods to obtain an $\epsilon$-stationary solution of the nonconvex problem. In particular, we provide a general framework to analyze the iteration complexity of these nonconvex stochastic ADMM methods with variance reduction. Finally, some numerical experiments demonstrate the effectiveness of our methods.
Submodular Variational Inference for Network Reconstruction
Chen, Lin, Crawford, Forrest W, Karbasi, Amin
In real-world and online social networks, individuals receive and transmit information in real time. Cascading information transmissions (e.g., phone calls, text messages, social media posts) may be understood as a realization of a diffusion process operating on the network, and its branching path can be represented by a directed tree. The process only traverses, and thus reveals, a limited portion of the edges. The network reconstruction/inference problem is to infer the unrevealed connections. Most existing approaches derive a likelihood and attempt to find the network topology maximizing the likelihood, a problem that is highly intractable. In this paper, we focus on the network reconstruction problem for a broad class of real-world diffusion processes, exemplified by a network diffusion scheme called respondent-driven sampling (RDS). We prove that under realistic and general models of network diffusion, the posterior distribution of an observed RDS realization is a Bayesian log-submodular model. We then propose VINE (Variational Inference for Network rEconstruction), a novel, accurate, and computationally efficient variational inference algorithm, for the network reconstruction problem under this model. Crucially, we do not assume any particular probabilistic model for the underlying network. VINE recovers any connected graph with high accuracy as shown by our experimental results on real-life networks.
Weighted SGD for $\ell_p$ Regression with Randomized Preconditioning
Yang, Jiyan, Chow, Yin-Lam, Ré, Christopher, Mahoney, Michael W.
In recent years, stochastic gradient descent (SGD) methods and randomized linear algebra (RLA) algorithms have been applied to many large-scale problems in machine learning and data analysis. We aim to bridge the gap between these two methods in solving constrained overdetermined linear regression problems, e.g., $\ell_2$ and $\ell_1$ regression problems. We propose a hybrid algorithm named pwSGD that uses RLA techniques for preconditioning and constructing an importance sampling distribution, and then performs an SGD-like iterative process with weighted sampling on the preconditioned system. We prove that pwSGD inherits faster convergence rates that only depend on the lower dimension of the linear system, while maintaining low computational complexity. In particular, when solving $\ell_1$ regression with size $n$ by $d$, pwSGD returns an approximate solution with $\epsilon$ relative error in the objective value in $\mathcal{O}(\log n \cdot \text{nnz}(A) + \text{poly}(d)/\epsilon^2)$ time. This complexity is uniformly better than that of RLA methods in terms of both $\epsilon$ and $d$ when the problem is unconstrained. For $\ell_2$ regression, pwSGD returns an approximate solution with $\epsilon$ relative error in the objective value and the solution vector measured in prediction norm in $\mathcal{O}(\log n \cdot \text{nnz}(A) + \text{poly}(d) \log(1/\epsilon) /\epsilon)$ time. We also provide lower bounds on the coreset complexity for more general regression problems, indicating that new ideas will still be needed to extend similar RLA preconditioning ideas to weighted SGD algorithms for more general regression problems. Finally, the effectiveness of such algorithms is illustrated numerically on both synthetic and real datasets.
Facebook's advice to students interested in artificial intelligence
That's the gist of the advice to students interested in AI from Facebook's Yann LeCun and Joaquin Quiñonero Candela, who run the company's Artificial Intelligence Lab and Applied Machine Learning group, respectively. Tech companies often advocate STEM (science, technology, engineering and math), but today's tips are particularly pointed. The pair specifically note that students should eat their vegetables: take Calc I, Calc II, Calc III, Linear Algebra, Probability and Statistics as early as possible. From this list, probability and statistics are perhaps the most interesting. From what I remember about high school, those two subjects are regularly dismissed as too-obvious strategies for skirting the informal AP Calculus preference of top colleges and universities (AP Statistics is often thought of as a cop-out by students).
Graph Learning from Data under Structural and Laplacian Constraints
Egilmez, Hilmi E., Pavez, Eduardo, Ortega, Antonio
Graphs are generic mathematical structures consisting of sets of vertices and edges, which are used for modeling pairwise relations (edges) between a number of objects (vertices). In practice, this representation is often extended to weighted graphs, for which a set of scalar values (weights) are assigned to edges and potentially to vertices. Thus, weighted graphs offer general and flexible representations for modeling affinity relations between the objects of interest. Many practical problems can be represented using weighted graphs. For example, a broad class of combinatorial problems such as weighted matching, shortest-path and network-flow [2] are defined using weighted graphs. In signal/data-oriented problems, weighted graphs provide concise (sparse) representations for robust modeling of signals/data [3]. Such graph-based models are also useful for analyzing and visualizing the relations between their samples/features. Moreover, weighted graphs naturally emerge in networked data applications, such as learning, signal processing and analysis on computer, social, sensor, energy, transportation and biological networks [4], where the signals/data are inherently related to a graph associated with the underlying network.
Less than a Single Pass: Stochastically Controlled Stochastic Gradient Method
Lei, Lihua, Jordan, Michael I.
We develop and analyze a procedure for gradient-based optimization that we refer to as stochastically controlled stochastic gradient (SCSG). As a member of the SVRG family of algorithms, SCSG makes use of gradient estimates at two scales, with the number of updates at the faster scale being governed by a geometric random variable. Unlike most existing algorithms in this family, both the computation cost and the communication cost of SCSG do not necessarily scale linearly with the sample size $n$; indeed, these costs are independent of $n$ when the target accuracy is low. An experimental evaluation on real datasets confirms the effectiveness of SCSG.
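The two-scale structure described above can be sketched as follows. This is a simplified, hedged reading of the abstract, not the authors' exact algorithm: the batch gradient is the slow-scale estimate, the inner loop applies SVRG-style corrected updates, and the number of inner updates is drawn from a geometric distribution. The choice of mean (equal to the batch size), sampling inner indices from the current batch, and the names `geometric` and `scsg` are all assumptions of this sketch.

```python
import random


def geometric(rng, p):
    """Number of Bernoulli(p) trials up to and including the first success (mean 1/p)."""
    n = 1
    while rng.random() >= p:
        n += 1
    return n


def scsg(grad_i, n, x0, batch_size, step, n_outer, seed=0):
    """Sketch of a stochastically controlled stochastic gradient loop for a scalar parameter.

    grad_i(i, x) returns the gradient of the i-th component function at x.
    """
    rng = random.Random(seed)
    x_tilde = x0
    for _ in range(n_outer):
        # Slow scale: gradient estimate on a random batch, never on all n samples.
        batch = rng.sample(range(n), batch_size)
        g = sum(grad_i(i, x_tilde) for i in batch) / batch_size
        x = x_tilde
        # Fast scale: a geometrically distributed number of corrected updates
        # (assumed mean: batch_size).
        for _ in range(geometric(rng, 1.0 / batch_size)):
            i = rng.choice(batch)
            x -= step * (grad_i(i, x) - grad_i(i, x_tilde) + g)
        x_tilde = x
    return x_tilde
```

For example, with `data` a list of numbers and `grad_i = lambda i, x: x - data[i]` (the gradient of $\frac{1}{2}(x - a_i)^2$), calling `scsg(grad_i, len(data), 10.0, 10, 0.1, 20)` moves the iterate from 10.0 toward the data mean while touching only `batch_size` samples per outer iteration.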
Poisson intensity estimation with reproducing kernels
Flaxman, Seth, Teh, Yee Whye, Sejdinovic, Dino
Despite the fundamental nature of the inhomogeneous Poisson process in the theory and application of stochastic processes, and its attractive generalizations (e.g. Cox process), few tractable nonparametric modeling approaches of intensity functions exist, especially when observed points lie in a high-dimensional space. In this paper we develop a new, computationally tractable Reproducing Kernel Hilbert Space (RKHS) formulation for the inhomogeneous Poisson process. We model the square root of the intensity as an RKHS function. Whereas RKHS models used in supervised learning rely on the so-called representer theorem, the form of the inhomogeneous Poisson process likelihood means that the representer theorem does not apply. However, we prove that the representer theorem does hold in an appropriately transformed RKHS, guaranteeing that the optimization of the penalized likelihood can be cast as a tractable finite-dimensional problem. The resulting approach is simple to implement, and readily scales to high dimensions and large-scale datasets.
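As a rough illustration of the modeling step described above (the notation here is assumed, not taken from the abstract): write the intensity on a domain $S$ as $\lambda(x) = f(x)^2$ with $f$ in an RKHS $\mathcal{H}$. The standard log-likelihood of an inhomogeneous Poisson process with observed points $x_1, \dots, x_N$ is $\sum_{i=1}^{N} \log \lambda(x_i) - \int_S \lambda(x)\,dx$, which leads to a penalized objective of the form

```latex
\min_{f \in \mathcal{H}} \;
  -\sum_{i=1}^{N} \log f(x_i)^2
  + \int_S f(x)^2 \, dx
  + \gamma \, \|f\|_{\mathcal{H}}^2
```

The integral term depends on $f$ over all of $S$, not only at the observed points, which is why the usual representer theorem argument does not apply directly. The regularization weight $\gamma$ and the exact scaling of the intensity are assumptions of this sketch.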