
Three myths about data scientists and big data

@machinelearnbot

What I found useful during my PhD (and this could apply to a master's program too) is that I immediately started working for a company on GIS, digital cartography, and water management: predicting extreme floods locally, that is, how much the water could rise, at worst, in 100 years at any (x, y) coordinate on a digital map, and modeling how a drop of water falling anywhere runs downhill, goes underground, and eventually reaches low elevation, merging with other drops along the way. The digital maps provided elevation and land-use data for each pixel (by land use I mean crop, forest, water, rock, and so on, which matters for modeling how water moves). Very applied and interesting stuff. My first paper (after an article about flood predictions in a local specialized journal) was in the Journal of Number Theory, though I never attended classes on number theory. I then started to publish in computational statistics journals, but also in IEEE Transactions on Pattern Analysis and Machine Intelligence and the Journal of the Royal Statistical Society, Series B. I'm currently finishing a book on data science (Wiley). The takeaway is that it helps you become versatile if the PhD or master's student can do applied work for a real company, hired and paid as a real employee (through a partnership between the university and the private sector), at the beginning of the program.
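The flow-routing idea described above can be sketched in a few lines: a toy "steepest descent" router on a small elevation grid, where a drop at any cell repeatedly moves to its lowest neighbour until it reaches a local minimum. Real models, as the text notes, also use land-use and infiltration data; this is only the elevation part.

```python
import numpy as np

# Toy elevation grid (one value per pixel, as on the digital maps described).
elev = np.array([
    [9, 8, 7, 6],
    [8, 5, 4, 5],
    [7, 4, 2, 3],
    [6, 5, 1, 2],
])

def route_drop(elev, r, c):
    """Follow a drop from cell (r, c) downhill to a local minimum."""
    path = [(r, c)]
    while True:
        nbrs = [(r + dr, c + dc)
                for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
                and 0 <= r + dr < elev.shape[0]
                and 0 <= c + dc < elev.shape[1]]
        nr, nc = min(nbrs, key=lambda p: elev[p])  # steepest-descent neighbour
        if elev[nr, nc] >= elev[r, c]:             # local minimum: drop stops
            return path
        r, c = nr, nc
        path.append((r, c))

print(route_drop(elev, 0, 0))  # drop released at the top-left corner
```

Accumulating such paths over every pixel gives a crude picture of where water concentrates on the map.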


Fast Eigenspace Approximation using Random Signals

arXiv.org Machine Learning

We focus in this work on the estimation of the first $k$ eigenvectors of any graph Laplacian using filtering of Gaussian random signals. We prove that we only need $k$ such signals to exactly recover as many of the smallest eigenvectors, regardless of the number of nodes in the graph. In addition, we address key issues in implementing the theoretical concepts in practice using accurate approximate methods. We also propose fast algorithms both for eigenspace approximation and for the determination of the $k$th smallest eigenvalue $\lambda_k$. The latter proves to be extremely efficient under the assumption of a locally uniform distribution of the eigenvalues over the spectrum. Finally, we present experiments which show the validity of our method in practice and compare it to state-of-the-art methods for clustering and visualization, both on synthetic small-scale datasets and on larger real-world problems with millions of nodes. We show that our method scales better with the number of nodes than all previous methods, while achieving an almost perfect reconstruction of the eigenspace formed by the first $k$ eigenvectors.
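The core idea can be sketched in a few lines of numpy. This is not the paper's algorithm (which uses carefully tuned filters and never forms an eigendecomposition); it only illustrates that low-pass filtering $k$ Gaussian random signals and orthogonalizing the result recovers the first-$k$ eigenspace. The inverse-power filter $(I + L)^{-10}$ below is an illustrative stand-in for a low-pass graph filter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy graph: two 10-node cliques joined by a single edge, so the first
# k = 2 Laplacian eigenvectors capture the cluster structure.
n, k = 20, 2
A = np.zeros((n, n))
A[:10, :10] = 1.0
A[10:, 10:] = 1.0
np.fill_diagonal(A, 0.0)
A[9, 10] = A[10, 9] = 1.0
L = np.diag(A.sum(axis=1)) - A   # combinatorial Laplacian

# Filter k Gaussian random signals with the low-pass filter (I + L)^{-10}.
R = rng.standard_normal((n, k))
M = np.eye(n) + L
X = R
for _ in range(10):              # apply (I + L)^{-1} ten times
    X = np.linalg.solve(M, X)
Q, _ = np.linalg.qr(X)           # orthonormal basis of the filtered signals

# Compare with the exact first-k eigenspace (eigh sorts ascending).
w, U = np.linalg.eigh(L)
cosines = np.linalg.svd(U[:, :k].T @ Q, compute_uv=False)
print(cosines)                   # principal-angle cosines, close to 1
```

The filter damps high-frequency components so strongly that the $k$ filtered signals end up spanning almost exactly the span of the $k$ smallest eigenvectors.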


Estimating the Size of a Large Network and its Communities from a Random Sample

arXiv.org Machine Learning

Most real-world networks are too large to be measured or studied directly, and there is substantial interest in estimating global network properties from smaller sub-samples. One of the most important global properties is the number of vertices/nodes in the network. Estimating the number of vertices in a large network is a major challenge in computer science, epidemiology, demography, and intelligence analysis. In this paper we consider a population random graph G = (V, E) from the stochastic block model (SBM) with K communities/blocks. A sample is obtained by randomly choosing a subset W and letting G(W) be the subgraph of G induced by the vertices in W. In addition to G(W), we observe the total degree of each sampled vertex and its block membership. Given this partial information, we propose an efficient PopULation Size Estimation algorithm, called PULSE, that correctly estimates the size of the whole population as well as the size of each community. To support our theoretical analysis, we perform an exhaustive set of experiments to study the effects of sample size, K, and SBM model parameters on the accuracy of the estimates. The experimental results also demonstrate that PULSE significantly outperforms a widely-used method called the network scale-up estimator in a wide variety of scenarios. We conclude with extensions and directions for future work.
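The observation model is easy to reproduce. The sketch below samples an SBM, draws a uniform vertex sample W, and records exactly the data described above: the induced subgraph and each sampled vertex's total degree. It then forms a naive method-of-moments size estimate; this is not the PULSE algorithm, only a baseline built from the same observed quantities.

```python
import numpy as np

rng = np.random.default_rng(1)

# Population graph G from a 2-block SBM with equal-sized blocks.
n, K = 2000, 2
block = np.repeat([0, 1], n // 2)
p_in, p_out = 0.02, 0.005
P = np.where(block[:, None] == block[None, :], p_in, p_out)
A = np.triu(rng.random((n, n)) < P, 1).astype(int)
A = A + A.T

# Observe: a uniform sample W, the induced subgraph G(W), and each
# sampled vertex's TOTAL degree in G.
m = 400
W = rng.choice(n, size=m, replace=False)
total_deg = A[W].sum(axis=1)           # degree in the whole graph
in_deg = A[np.ix_(W, W)].sum(axis=1)   # degree inside the sample

# Naive estimate (NOT PULSE): each edge endpoint of a sampled vertex is
# sampled with probability (m-1)/(n-1), so solve for n in expectation.
n_hat = 1 + (m - 1) * total_deg.sum() / in_deg.sum()
print(round(n_hat))                    # should land near n = 2000
```

Even this crude ratio estimator recovers the population size to within a few percent here; the point of PULSE is to do this provably well and to recover each community's size too.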


How Well Do Local Algorithms Solve Semidefinite Programs?

arXiv.org Machine Learning

Several probabilistic models from high-dimensional statistics and machine learning reveal an intriguing and yet poorly understood dichotomy. Either simple local algorithms succeed in estimating the object of interest, or even sophisticated semi-definite programming (SDP) relaxations fail. In order to explore this phenomenon, we study a classical SDP relaxation of the minimum graph bisection problem, when applied to Erdős-Rényi random graphs with bounded average degree $d>1$, and obtain several types of results. First, we use a dual witness construction (using the so-called non-backtracking matrix of the graph) to upper bound the SDP value. Second, we prove that a simple local algorithm approximately solves the SDP to within a factor $2d^2/(2d^2+d-1)$ of the upper bound. In particular, the local algorithm is at most $8/9$ suboptimal, and $1+O(1/d)$ suboptimal for large degree. We then analyze a more sophisticated local algorithm, which aggregates information according to the harmonic measure on the limiting Galton-Watson (GW) tree. The resulting lower bound is expressed in terms of the conductance of the GW tree and matches surprisingly well the empirically determined SDP values on large-scale Erdős-Rényi graphs. We finally consider the planted partition model. In this case, purely local algorithms are known to fail, but they do succeed if a small amount of side information is available. Our results imply quantitative bounds on the threshold for partial recovery using SDP in this model.
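For concreteness, one classical form of the SDP relaxation of minimum bisection (the exact normalization used in the paper may differ) starts from the integer program over balanced spin assignments and lifts $\sigma\sigma^{\mathsf{T}}$ to a positive semidefinite matrix:

```latex
\min_{\sigma \in \{\pm 1\}^n,\ \sum_i \sigma_i = 0}
\ \mathrm{cut}(\sigma) \;=\; \tfrac{1}{4} \sum_{i,j} A_{ij}\,(1 - \sigma_i \sigma_j)
\qquad\Longrightarrow\qquad
\max_{X}\ \langle A, X \rangle
\quad \text{s.t.}\quad X \succeq 0,\ \ X_{ii} = 1 \ \forall i,\ \ \sum_{i,j} X_{ij} = 0 .
```

The relaxation is exact when the optimal $X$ has rank one, i.e. $X = \sigma\sigma^{\mathsf{T}}$ for some balanced $\sigma$.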


Stochastically Transitive Models for Pairwise Comparisons: Statistical and Computational Issues

arXiv.org Machine Learning

There are various parametric models for analyzing pairwise comparison data, including the Bradley-Terry-Luce (BTL) and Thurstone models, but their reliance on strong parametric assumptions is limiting. In this work, we study a flexible model for pairwise comparisons, under which the probabilities of outcomes are required only to satisfy a natural form of stochastic transitivity. This class includes parametric models such as the BTL and Thurstone models as special cases, but is considerably more general. We provide various examples of models in this broader stochastically transitive class for which classical parametric models provide poor fits. Despite this greater flexibility, we show that the matrix of probabilities can be estimated at the same rate as in standard parametric models. On the other hand, unlike in the BTL and Thurstone models, computing the minimax-optimal estimator in the stochastically transitive model is non-trivial, and we explore various computationally tractable alternatives. We show that a simple singular value thresholding algorithm is statistically consistent but does not achieve the minimax rate. We then propose and study algorithms that achieve the minimax rate over interesting sub-classes of the full stochastically transitive class. We complement our theoretical results with thorough numerical simulations.
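A minimal sketch of the singular value thresholding baseline mentioned above, assuming data generated from a BTL model and a hard threshold set from the noise level; both choices are illustrative, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(2)

# A probability matrix from a BTL-style model (used only to generate data;
# the thresholding step itself is model-agnostic).
n = 30
w = rng.standard_normal(n)
M = 1.0 / (1.0 + np.exp(-(w[:, None] - w[None, :])))  # M[i, j] = P(i beats j)

# Observed data: empirical win frequencies from N comparisons per pair.
N = 200
Y = rng.binomial(N, M) / N

def svt(Y, tau):
    """Zero out singular values below tau, then clip back to [0, 1]."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s[s < tau] = 0.0
    return np.clip(U @ (s[:, None] * Vt), 0.0, 1.0)

tau = 3 * np.sqrt(n / (4 * N))   # rough cutoff at the noise singular-value scale
M_hat = svt(Y, tau)

err_raw = np.linalg.norm(Y - M) / n      # error of the raw frequencies
err_svt = np.linalg.norm(M_hat - M) / n  # error after thresholding
print(err_svt < err_raw)
```

Because M is close to low rank while the sampling noise is spread over all directions, discarding the small singular values denoises the estimate, though (as the paper shows) not at the minimax rate.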


Exact and Inexact Subsampled Newton Methods for Optimization

arXiv.org Machine Learning

The paper studies the solution of stochastic optimization problems in which approximations to the gradient and Hessian are obtained through subsampling. We first consider Newton-like methods that employ these approximations and discuss how to coordinate the accuracy in the gradient and Hessian to yield a superlinear rate of convergence in expectation. The second part of the paper analyzes an inexact Newton method that solves linear systems approximately using the conjugate gradient (CG) method, and that samples the Hessian and not the gradient (the gradient is assumed to be exact). We provide a complexity analysis for this method based on the properties of the CG iteration and the quality of the Hessian approximation, and compare it with a method that employs a stochastic gradient iteration instead of the CG method. We report preliminary numerical results that illustrate the performance of inexact subsampled Newton methods on machine learning applications based on logistic regression.
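A minimal numpy sketch of the second variant described above: exact gradient, Hessian-vector products computed on a fresh subsample at each iteration, and a plain CG loop to solve the Newton system. Problem sizes, sample size, and iteration counts are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic logistic-regression problem.
n, d = 5000, 10
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = (rng.random(n) < 1 / (1 + np.exp(-X @ w_true))).astype(float)

def grad(w):
    """Exact (full) gradient, as assumed in the inexact method."""
    p = 1 / (1 + np.exp(-X @ w))
    return X.T @ (p - y) / n

def hess_vec(w, v, idx):
    """Hessian-vector product using only the subsample S = idx."""
    Xs = X[idx]
    p = 1 / (1 + np.exp(-Xs @ w))
    return Xs.T @ ((p * (1 - p)) * (Xs @ v)) / len(idx)

def cg(hv, b, iters=20, tol=1e-10):
    """Plain conjugate gradient for H p = b, given H only through hv."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    for _ in range(iters):
        Ap = hv(p)
        rr = r @ r
        alpha = rr / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        p = r + (r @ r / rr) * p
    return x

w = np.zeros(d)
for it in range(10):                                 # subsampled Newton-CG
    idx = rng.choice(n, size=500, replace=False)     # resample the Hessian
    step = cg(lambda v: hess_vec(w, v, idx), -grad(w))
    w = w + step
print(np.linalg.norm(grad(w)))                       # tiny at the optimum
```

Note that CG never needs the Hessian matrix itself, only products with it, which is what makes subsampling the Hessian (and not the gradient) cheap.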


An Impossibility Result for Reconstruction in a Degree-Corrected Planted-Partition Model

arXiv.org Machine Learning

We consider a Degree-Corrected Planted-Partition model: a random graph on $n$ nodes with two asymptotically equal-sized clusters. The model parameters are two constants $a,b > 0$ and an i.i.d. sequence of weights $(\phi_u)_{u=1}^n$, with finite second moment $\Phi^{(2)}$. Vertices $u$ and $v$ are joined by an edge with probability $\frac{\phi_u \phi_v}{n}a$ when they are in the same class and with probability $\frac{\phi_u \phi_v}{n}b$ otherwise. We prove that it is information-theoretically impossible to estimate the spins in a way positively correlated with the true community structure when $(a-b)^2 \Phi^{(2)} \leq 2(a+b)$. A by-product of our proof is a precise coupling-result for local-neighbourhoods in Degree-Corrected Planted-Partition models, which could be of independent interest.
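The model is simple to simulate. The sketch below samples a Degree-Corrected Planted-Partition graph with uniform weights (an illustrative choice with finite second moment) and checks which side of the impossibility threshold the parameters fall on.

```python
import numpy as np

rng = np.random.default_rng(4)

n = 2000
a, b = 6.0, 2.0
spins = np.repeat([0, 1], n // 2)          # two equal-sized clusters
phi = rng.uniform(0.5, 1.5, size=n)        # i.i.d. weights, finite 2nd moment
Phi2 = np.mean(phi**2)                     # empirical Phi^(2)

# Edge probabilities: phi_u * phi_v * a / n within a class, b / n across.
rate = np.where(spins[:, None] == spins[None, :], a, b) / n
P = np.clip(np.outer(phi, phi) * rate, 0.0, 1.0)
A = np.triu(rng.random((n, n)) < P, 1).astype(int)
A = A + A.T

# Impossibility condition: estimation is hopeless when
# (a - b)^2 * Phi^(2) <= 2 * (a + b).
lhs = (a - b) ** 2 * Phi2
rhs = 2 * (a + b)
print(lhs, rhs, "above threshold" if lhs > rhs else "impossible regime")
```

With these parameters the condition fails to hold (lhs exceeds rhs), so the impossibility result does not apply; shrinking a - b or the weight variance pushes the model into the provably impossible regime.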


Neural nets - learning with total gradient rather than stochastic gradients? • /r/MachineLearning

#artificialintelligence

The estimate of the gradient from just a mini-batch is usually good enough to point you in the right descent direction. It doesn't make sense to do the extra computation for a marginally better estimate. Plus, the inaccuracy or noise introduced by the mini-batch approximation can act as a regularizer. Here is an interesting paper that performs statistical tests during optimization: if the gradient is not statistically significant, more samples are added to the mini-batch.
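A quick numpy check of the first claim above: on a least-squares problem, a mini-batch gradient from a few hundred samples already points almost exactly along the full gradient (sizes and batch size are illustrative).

```python
import numpy as np

rng = np.random.default_rng(5)

# Least-squares loss f(w) = ||Xw - y||^2 / (2n); compare the full gradient
# with a mini-batch estimate of it at a random point w.
n, d = 100_000, 20
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.5 * rng.standard_normal(n)
w = rng.standard_normal(d)

full_grad = X.T @ (X @ w - y) / n

batch = rng.choice(n, size=256, replace=False)
Xb, yb = X[batch], y[batch]
mini_grad = Xb.T @ (Xb @ w - yb) / len(batch)

cos = full_grad @ mini_grad / (np.linalg.norm(full_grad) * np.linalg.norm(mini_grad))
print(cos)   # close to 1: the mini-batch points in nearly the same direction
```

A descent step along the mini-batch gradient therefore still decreases the loss, at roughly 1/400th of the cost of the full gradient here.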


Spark Technology Center

#artificialintelligence

The Best Paper award for this year's International Conference on Very Large Data Bases (VLDB) goes to "Compressed Linear Algebra for Large-Scale Machine Learning", authored by a PhD candidate at the University of Maryland and four senior researchers from IBM. Their method for compressing matrices used in linear algebra operations promises significant speed increases with a smaller memory footprint. In particular, the compression technique provides benefits in two different parts of the data science process. Before training a model, a data scientist typically goes through multiple iterations of feature engineering. Common feature engineering tasks include examining the data with descriptive statistics and transforming the values in columns to better suit the assumptions built into different types of machine learning models.


A possible implementation for an Intelligent Agent using Graph theories to crawl Reddit. (RedditSharp QuickGraph MongoDB)

#artificialintelligence

I cannot think for more than two hours without wondering how to introduce AI techniques into whatever I'm thinking about. The last time it happened was super interesting, so stay with me to see how I used graph theory to crawl Reddit and build a knowledge base of Magic: The Gathering card relations. Long story short, I was browsing magiccardmarket.eu to check which cards to buy when I found a guy selling a €9 card for €6. The card had spiked over the weekend, and I jumped on Reddit to find out why. Is there a new deck using it?