AITopics | Mathematical & Statistical Methods


Mathematical & Statistical Methods


ODE Analysis of Stochastic Gradient Methods with Optimism and Anchoring for Minimax Problems and GANs

arXiv.org Machine Learning, Jun-5-2019

Despite remarkable empirical success, the training dynamics of generative adversarial networks (GANs), which involve solving a minimax game using stochastic gradients, are still poorly understood. In this work, we analyze last-iterate convergence of simultaneous gradient descent (simGD) and its variants under the assumption of convex-concavity, guided by a continuous-time analysis with differential equations. First, we show that simGD, as is, converges with stochastic subgradients under strict convexity in the primal variable. Second, we generalize optimistic simGD to accommodate an optimism rate separate from the learning rate and show its convergence with full gradients. Finally, we present anchored simGD, a new method, and show convergence with stochastic subgradients.
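To make the contrast between plain and optimistic simGD concrete, here is a minimal sketch on the toy bilinear game L(x, y) = x * y, whose unique saddle point is the origin. The toy problem, the step sizes, and the function names are illustrative assumptions and are not taken from the paper; the optimistic update below is one common way to write an optimism term with a rate separate from the learning rate. Anchored simGD is not shown because the abstract does not state its update rule.

```python
import numpy as np

def G(z):
    """Saddle operator of L(x, y) = x * y, i.e. (dL/dx, -dL/dy)."""
    x, y = z
    return np.array([y, -x])

def sim_gd(z0, lr, steps):
    """Plain simultaneous gradient descent-ascent with full gradients."""
    z = np.array(z0, dtype=float)
    for _ in range(steps):
        z = z - lr * G(z)
    return z

def optimistic_sim_gd(z0, lr, optimism, steps):
    """simGD plus an optimism term with its own rate (assumed form)."""
    z = np.array(z0, dtype=float)
    g_prev = G(z)  # makes the first step a plain simGD step
    for _ in range(steps):
        g = G(z)
        z = z - lr * g - optimism * (g - g_prev)
        g_prev = g
    return z

z0 = [1.0, 1.0]
z_sim = sim_gd(z0, lr=0.1, steps=2000)
z_opt = optimistic_sim_gd(z0, lr=0.1, optimism=0.1, steps=2000)
print("simGD distance to saddle:     ", np.linalg.norm(z_sim))
print("optimistic distance to saddle:", np.linalg.norm(z_opt))
```

On this bilinear game, which is convex-concave but not strictly convex in x, the plain iterates spiral away from the origin while the optimistic iterates contract toward it. The sketch only illustrates the behavior on this toy problem and does not reproduce the paper's analysis.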

artificial intelligence, convergence, machine learning, (16 more...)

arXiv.org Machine Learning

1905.10899

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.91)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.70)


Sparse Representation Classification via Screening for Graphs

Shen, Cencheng, Chen, Li, Dong, Yuexiao, Priebe, Carey

arXiv.org Machine Learning, Jun-4-2019

The sparse representation classifier (SRC) is shown to work well for image recognition problems that satisfy a subspace assumption. In this paper we propose a new implementation of SRC via screening, establish its equivalence to the original SRC under regularity conditions, and prove its classification consistency for random graphs drawn from stochastic blockmodels. The results are demonstrated via simulations and real data experiments, where the new algorithm achieves comparable numerical performance but runs significantly faster.

classification, machine learning, pattern recognition, (13 more...)

arXiv.org Machine Learning

1906.01601

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.34)


On stochastic gradient Langevin dynamics with dependent data streams: the fully non-convex case

Chau, Ngoc Huy, Moulines, Éric, Rásonyi, Miklos, Sabanis, Sotirios, Zhang, Ying

arXiv.org Machine Learning, May-30-2019

We consider the problem of sampling from a target distribution which is not necessarily logconcave. Non-asymptotic analysis results are established in a suitable Wasserstein-type distance for the Stochastic Gradient Langevin Dynamics (SGLD) algorithm, even when the gradient is driven by dependent data streams. Our estimates are sharper and uniform in the number of iterations, in contrast to those in previous studies.
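For orientation, the SGLD recursion is usually written as below. The abstract does not introduce any notation, so every symbol here is an assumption: $U$ is the potential with target density proportional to $e^{-\beta U}$, $H$ is a stochastic gradient estimate driven by the data stream $(X_k)$, $\lambda$ is the step size, $\beta$ is the inverse temperature, and $(\xi_k)$ are i.i.d. standard Gaussian vectors.

```latex
\theta_{k+1} = \theta_k - \lambda\, H(\theta_k, X_{k+1})
             + \sqrt{\frac{2\lambda}{\beta}}\; \xi_{k+1},
\qquad \mathbb{E}\big[ H(\theta, X_k) \big] = \nabla U(\theta).
```

In this notation, the "dependent data streams" setting corresponds to dropping the assumption that the $X_k$ are independent, and the "fully non-convex case" corresponds to allowing $U$ to be non-convex.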

assumption 2, lemma 3, theorem 2, (13 more...)

arXiv.org Machine Learning

1905.13142

Country:

Europe > France (0.04)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.70)


Differentiable Sorting using Optimal Transport: The Sinkhorn CDF and Quantile Operator

Cuturi, Marco, Teboul, Olivier, Vert, Jean-Philippe

arXiv.org Machine Learning, May-28-2019

Sorting an array is a fundamental routine in machine learning, one that is used to compute rank-based statistics, cumulative distribution functions (CDFs), quantiles, or to select closest neighbors and labels. The sorting function is however piece-wise constant (the sorting permutation of a vector does not change if the entries of that vector are infinitesimally perturbed) and therefore has no gradient information to back-propagate. We propose a framework to sort elements that is algorithmically differentiable. We leverage the fact that sorting can be seen as a particular instance of the optimal transport (OT) problem on $\mathbb{R}$, from input values to a predefined array of sorted values (e.g. $1,2,\dots,n$ if the input array has $n$ elements). Building upon this link, we propose generalized CDFs and quantile operators by varying the size and weights of the target presorted array. Because this amounts to using the so-called Kantorovich formulation of OT, we call these quantities K-sorts, K-CDFs and K-quantiles. We recover differentiable algorithms by adding to the OT problem an entropic regularization, and approximate it using a few Sinkhorn iterations. We call these operators S-sorts, S-CDFs and S-quantiles, and use them in various learning settings: we benchmark them against the recently proposed neuralsort [Grover et al. 2019], propose applications to quantile regression and introduce differentiable formulations of the top-k accuracy that deliver state-of-the-art performance.
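The idea of sorting through entropy-regularized OT can be sketched in a few lines of NumPy. This is a rough illustration under stated assumptions, not the authors' implementation: the squared-distance cost, the rescaling of inputs to [0, 1], the uniform weights, the regularization strength `eps`, the iteration count, and the names `sinkhorn_plan`, `soft_ranks`, and `soft_sorted` are all choices made here for the example. The input is assumed to be non-constant.

```python
import numpy as np

def sinkhorn_plan(x, y, a, b, eps=0.05, n_iter=300):
    """Entropic OT plan between the values x (weights a) and sorted targets y (weights b)."""
    x_scaled = (x - x.min()) / (x.max() - x.min())  # assumes x is not constant
    C = (x_scaled[:, None] - y[None, :]) ** 2       # squared-distance cost
    K = np.exp(-C / eps)                            # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)   # match column marginals
        u = a / (K @ v)     # match row marginals (last update, so rows sum to a)
    return u[:, None] * K * v[None, :]

x = np.array([0.3, 0.1, 0.9, 0.5])
n = len(x)
a = np.full(n, 1.0 / n)          # uniform weights on the inputs
b = np.full(n, 1.0 / n)          # uniform weights on the targets
y = np.linspace(0.0, 1.0, n)     # predefined, already sorted target array

P = sinkhorn_plan(x, y, a, b)
soft_ranks = n * (P @ np.cumsum(b)) / a      # smooth surrogate for ranks in [1, n]
soft_sorted = (P.T @ x) / P.sum(axis=0)      # smooth surrogate for the sorted values

print("soft ranks: ", np.round(soft_ranks, 3))
print("soft sorted:", np.round(soft_sorted, 3))
```

Every operation above is a composition of smooth functions of `x`, which is what makes the resulting ranks and sorted values differentiable. Smaller `eps` brings the outputs closer to the hard sort at the cost of slower Sinkhorn convergence.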

artificial intelligence, machine learning, operator, (17 more...)

arXiv.org Machine Learning

1905.11885

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.46)


Matrix Methods in Data Analysis, Signal Processing, and Machine Learning

#artificialintelligence, May-21-2019, 04:36:22 GMT

Linear algebra concepts are key for understanding and creating machine learning algorithms, especially as applied to deep learning and neural networks.

deep learning, machine learning, signal processing, (3 more...)

#artificialintelligence

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)


Spectral Clustering of Signed Graphs via Matrix Power Means

Mercado, Pedro, Tudisco, Francesco, Hein, Matthias

arXiv.org Machine Learning, May-15-2019

Signed graphs encode positive (attractive) and negative (repulsive) relations between nodes. We extend spectral clustering to signed graphs via the one-parameter family of Signed Power Mean Laplacians, defined as the matrix power mean of normalized standard and signless Laplacians of positive and negative edges. We provide a thorough analysis of the proposed approach in the setting of a general Stochastic Block Model that includes models such as the Labeled Stochastic Block Model and the Censored Block Model. We show that in expectation the signed power mean Laplacian captures the ground truth clusters under reasonable settings where state-of-the-art approaches fail. Moreover, we prove that the eigenvalues and eigenvectors of the signed power mean Laplacian concentrate around their expectation under reasonable conditions in the general Stochastic Block Model. Extensive experiments on random graphs and real world datasets confirm the theoretically predicted behaviour of the signed power mean Laplacian and show that it compares favourably with state-of-the-art methods.
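The construction described above can be sketched as follows. This is a minimal illustration under several assumptions that the abstract does not confirm: the matrix power mean is taken as ((A^p + B^p) / 2)^(1/p), the sketch is restricted to p > 0, every node is assumed to have at least one positive and one negative edge, and the two-cluster split uses the sign of a single eigenvector instead of a full k-means step. The function names are hypothetical.

```python
import numpy as np

def psd_power(M, p):
    """Power of a symmetric positive semidefinite matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(M)
    w = np.clip(w, 0.0, None)          # remove tiny negative round-off
    return (V * w**p) @ V.T

def signed_power_mean_laplacian(W_pos, W_neg, p=2.0):
    """Power mean of the normalized Laplacian (positive edges) and the
    normalized signless Laplacian (negative edges). Assumes p > 0."""
    n = W_pos.shape[0]
    d_pos = W_pos.sum(axis=1)          # assumed strictly positive
    d_neg = W_neg.sum(axis=1)          # assumed strictly positive
    L_pos = np.eye(n) - W_pos / np.sqrt(np.outer(d_pos, d_pos))
    Q_neg = np.eye(n) + W_neg / np.sqrt(np.outer(d_neg, d_neg))
    mean_of_powers = 0.5 * (psd_power(L_pos, p) + psd_power(Q_neg, p))
    return psd_power(mean_of_powers, 1.0 / p)

# Toy signed graph: two groups of three nodes, positive edges inside each
# group and negative edges between the groups.
block = np.ones((3, 3)) - np.eye(3)
zeros = np.zeros((3, 3))
W_pos = np.block([[block, zeros], [zeros, block]])
W_neg = np.block([[zeros, np.ones((3, 3))], [np.ones((3, 3)), zeros]])

L_p = signed_power_mean_laplacian(W_pos, W_neg, p=2.0)
eigvals, eigvecs = np.linalg.eigh(L_p)
labels = (eigvecs[:, 0] > 0).astype(int)   # sign of the smallest eigenvector
print(labels)
```

On this idealized graph the eigenvector of the smallest eigenvalue is constant on each group with opposite signs, so its sign pattern recovers the two clusters. Which label each group receives depends on the arbitrary sign of the eigenvector.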

artificial intelligence, machine learning, social media, (16 more...)

arXiv.org Machine Learning

1905.0623

Country: North America > United States > California (0.27)

Genre: Research Report > Promising Solution (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)
Information Technology > Communications > Social Media (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.66)


Solving Empirical Risk Minimization in the Current Matrix Multiplication Time

Lee, Yin Tat, Song, Zhao, Zhang, Qiuyi

arXiv.org Machine Learning, May-11-2019

Many convex problems in machine learning and computer science share the same form: \begin{align*} \min_{x} \sum_{i} f_i( A_i x + b_i), \end{align*} where $f_i$ are convex functions on $\mathbb{R}^{n_i}$ with constant $n_i$, $A_i \in \mathbb{R}^{n_i \times d}$, $b_i \in \mathbb{R}^{n_i}$ and $\sum_i n_i = n$. This problem generalizes linear programming and includes many problems in empirical risk minimization. In this paper, we give an algorithm that runs in time \begin{align*} O^* ( ( n^{\omega} + n^{2.5 - \alpha/2} + n^{2+ 1/6} ) \log (n / \delta) ) \end{align*} where $\omega$ is the exponent of matrix multiplication, $\alpha$ is the dual exponent of matrix multiplication, and $\delta$ is the relative accuracy. Note that the runtime has only a log dependence on the condition numbers or other data-dependent parameters, which are captured in $\delta$. For the current bound $\omega \sim 2.38$ [Vassilevska Williams'12, Le Gall'14] and $\alpha \sim 0.31$ [Le Gall, Urrutia'18], our runtime $O^* ( n^{\omega} \log (n / \delta))$ matches the current best for solving a dense least squares regression problem, a special case of the problem we consider. Very recently, [Alman'18] proved that all currently known techniques cannot give an $\omega$ below $2.168$, which is larger than our $2+1/6$. Our result generalizes the very recent result of solving linear programs in the current matrix multiplication time [Cohen, Lee, Song'19] to a broader class of problems. Our algorithm introduces two concepts that differ from [Cohen, Lee, Song'19]:
$\bullet$ We give a robust deterministic central path method, whereas the previous one is a stochastic central path which updates weights by a random sparse vector.
$\bullet$ We propose an efficient data-structure to maintain the central path of interior point methods even when the weights update vector is dense.
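As a worked instance of the template $\min_x \sum_i f_i(A_i x + b_i)$, least squares regression, which the abstract names as a special case, can be written in this form. The data symbols $a_i \in \mathbb{R}^d$ and $y_i \in \mathbb{R}$ are introduced here for illustration and do not appear in the abstract.

```latex
% Least squares as an instance of  \min_x \sum_i f_i(A_i x + b_i):
% take n_i = 1 for every i (so n equals the number of data points) and set
A_i = a_i^{\top} \in \mathbb{R}^{1 \times d}, \qquad
b_i = -y_i \in \mathbb{R}, \qquad
f_i(t) = t^2 ,
% which gives
\min_{x \in \mathbb{R}^d} \sum_{i=1}^{n} f_i(A_i x + b_i)
  \;=\; \min_{x \in \mathbb{R}^d} \sum_{i=1}^{n} \big( a_i^{\top} x - y_i \big)^2 .
```

Each $f_i$ is convex on $\mathbb{R}^{1}$, so the constant-$n_i$ requirement of the template holds with $n_i = 1$.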

artificial intelligence, machine learning, step follow, (17 more...)

arXiv.org Machine Learning

1905.04447

Country:

North America > United States > Virginia (0.04)
Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(3 more...)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)


Learning Erdős-Rényi Random Graphs via Edge Detecting Queries

Li, Zihan, Fresacher, Matthias, Scarlett, Jonathan

arXiv.org Machine Learning, May-10-2019

In this paper, we consider the problem of learning an unknown graph via queries on groups of nodes, with the result indicating whether or not at least one edge is present among those nodes. We establish bounds on the number of tests for a variety of algorithms inspired by the group testing problem, with explicit constant factors indicating a near-optimal number of tests, and in some cases asymptotic optimality including constant factors.

I. INTRODUCTION

Graphs are a ubiquitous tool in modern statistics and machine learning for depicting interactions, relations, and physical connections in networks, such as social networks, biological networks, sensor networks, and so on. Often, the graph is not known a priori, and must be learned via queries to the network. In this paper, we consider the problem of graph learning via edge detecting queries, where each query contains a subset of the nodes, and the binary outcome indicates whether or not there is at least one edge among these nodes. See Section I-A for previous work on this problem. An application of this problem highlighted in previous works such as [15] is that of learning which chemicals react with each other, using tests that are able to detect whether any reaction occurs. Another potential application is learning connectivity in large wireless networks: each node is given a unique identifier, and in response to a query, each node sends feedback to a central unit if both itself and one or more of its neighbors are included in that query.
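The query model can be made concrete with a short sketch. The decoder below follows the elimination idea familiar from group testing (any pair of nodes that appears together in a negative test cannot be an edge); it is an assumption chosen for illustration and is not claimed to be one of the algorithms analyzed in the paper. The graph, the inclusion probability 0.3, the number of tests, and the function names are likewise hypothetical.

```python
import itertools
import random

def edge_query(edges, subset):
    """Edge detecting query: True if at least one edge has both endpoints in subset."""
    s = set(subset)
    return any(u in s and v in s for u, v in edges)

def eliminate_non_edges(n, tests, outcomes):
    """Start from all node pairs and discard every pair covered by a negative test."""
    candidates = set(itertools.combinations(range(n), 2))
    for subset, positive in zip(tests, outcomes):
        if not positive:
            candidates -= set(itertools.combinations(sorted(subset), 2))
    return candidates

rng = random.Random(0)
n = 8
true_edges = {(0, 3), (2, 5)}   # edges stored as (u, v) with u < v

# Random tests: each node joins a test independently with probability 0.3.
tests = [[v for v in range(n) if rng.random() < 0.3] for _ in range(200)]
outcomes = [edge_query(true_edges, t) for t in tests]

estimate = eliminate_non_edges(n, tests, outcomes)
print(sorted(estimate))
```

By construction this decoder never discards a true edge, because a test containing both endpoints of an edge is always positive. It may keep spurious pairs when too few tests are used, which is why the number of tests is the quantity of interest.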

artificial intelligence, machine learning, probability, (19 more...)

arXiv.org Machine Learning

1905.0341

Country:

Asia > Singapore (0.04)
North America > United States > Nevada > Clark County > Las Vegas (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Industry: Information Technology (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.65)


Hitting Time of Stochastic Gradient Langevin Dynamics to Stationary Points: A Direct Analysis

Chen, Xi, Du, Simon S., Tong, Xin T.

arXiv.org Machine Learning, Apr-29-2019

Stochastic gradient Langevin dynamics (SGLD) is a fundamental algorithm in stochastic optimization. Recent work by Zhang et al. [2017] presents an analysis for the hitting time of SGLD for the first and second order stationary points. The proof in Zhang et al. [2017] is a two-stage procedure through bounding Cheeger's constant, which is rather complicated and leads to loose bounds. In this paper, using intuitions from stochastic differential equations, we provide a direct analysis for the hitting times of SGLD to the first and second order stationary points. Our analysis is straightforward. It only relies on basic linear algebra and probability theory tools. Our direct analysis also leads to tighter bounds compared to Zhang et al. [2017] and shows the explicit dependence of the hitting time on different factors, including dimensionality, smoothness, noise strength, and step size effects. Under suitable conditions, we show that the hitting time of SGLD to first-order stationary points can be dimension-independent. Moreover, we apply our analysis to study several important online estimation problems in machine learning, including linear regression, matrix factorization, and online PCA.
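The notion of a hitting time to a first-order stationary point can be illustrated empirically. The sketch below runs SGLD on the toy objective f(x) = ||x||^2 / 2 and records the first iteration at which the gradient norm falls below a tolerance. The objective, the Gaussian gradient noise model, and all parameter values and names are assumptions made for this example; none of them come from the paper.

```python
import numpy as np

def sgld_hitting_time(grad, x0, step, beta, grad_noise, tol, max_steps, seed=0):
    """Return the first iteration k with ||grad(x_k)|| <= tol, or None if never reached."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for k in range(max_steps + 1):
        if np.linalg.norm(grad(x)) <= tol:
            return k
        # Stochastic gradient: true gradient plus Gaussian noise (assumed model).
        noisy_grad = grad(x) + grad_noise * rng.standard_normal(x.shape)
        # SGLD step: gradient move plus injected Langevin noise.
        langevin = np.sqrt(2.0 * step / beta) * rng.standard_normal(x.shape)
        x = x - step * noisy_grad + langevin
    return None

hit = sgld_hitting_time(
    grad=lambda x: x,        # gradient of f(x) = ||x||^2 / 2
    x0=np.ones(5),
    step=0.01,
    beta=1e4,                # inverse temperature; larger means less injected noise
    grad_noise=0.1,
    tol=0.1,
    max_steps=5000,
)
print("first-order stationary point reached at iteration:", hit)
```

Varying `step`, `beta`, `grad_noise`, or the dimension of `x0` gives a rough empirical feel for the dependence on step size, noise strength, and dimensionality that the abstract refers to.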

artificial intelligence, machine learning, stationary point, (16 more...)

arXiv.org Machine Learning

1904.13016

Country: Asia (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)


Convergence rates for ordinal embedding

Ellenberg, Jordan S., Jain, Lalit

arXiv.org Machine Learning, Apr-29-2019

We prove optimal bounds for the convergence rate of ordinal embedding (also known as non-metric multidimensional scaling) in the 1-dimensional case. The examples witnessing optimality of our bounds arise from a result in additive number theory on sets of integers with no three-term arithmetic progressions. We also carry out some computational experiments aimed at developing a sense of what the convergence rate for ordinal embedding might look like in higher dimensions.

artificial intelligence, hausdorff distance, triplet comparison, (17 more...)

arXiv.org Machine Learning

1904.12994

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.34)
