Mathematical & Statistical Methods
Linear-Time Algorithm for Learning Large-Scale Sparse Graphical Models
Zhang, Richard Y., Fattahi, Salar, Sojoudi, Somayeh
The sparse inverse covariance estimation problem is commonly solved using an $\ell_{1}$-regularized Gaussian maximum likelihood estimator known as "graphical lasso", but its computational cost becomes prohibitive for large data sets. A recent line of results showed, under mild assumptions, that the graphical lasso estimator can be retrieved by soft-thresholding the sample covariance matrix and solving a maximum determinant matrix completion (MDMC) problem. This paper proves an extension of this result and describes a Newton-CG algorithm to efficiently solve the MDMC problem. Assuming that the thresholded sample covariance matrix is sparse with a sparse Cholesky factorization, we prove that the algorithm converges to an $\epsilon$-accurate solution in $O(n\log(1/\epsilon))$ time and $O(n)$ memory. The algorithm is highly efficient in practice: we solve the associated MDMC problems with as many as 200,000 variables to 7-9 digits of accuracy in less than an hour on a standard laptop computer running MATLAB.
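The soft-thresholding step mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' MATLAB code: the function name `soft_threshold_covariance` is hypothetical, and the choice to leave the diagonal untouched is an assumption about the thresholding convention. The MDMC solve itself is not shown.

```python
def soft_threshold_covariance(cov, lam):
    """Soft-threshold the off-diagonal entries of a square matrix by lam.

    Assumption: diagonal entries are kept as-is; off-diagonal entries are
    shrunk toward zero by lam and set to zero when |entry| <= lam.
    """
    n = len(cov)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            c = cov[i][j]
            if i == j:
                out[i][j] = c          # keep the diagonal
            elif c > lam:
                out[i][j] = c - lam    # shrink positive entries
            elif c < -lam:
                out[i][j] = c + lam    # shrink negative entries
            # otherwise the entry stays 0.0
    return out


sample_cov = [
    [1.0, 0.5, -0.1],
    [0.5, 2.0, -0.75],
    [-0.1, -0.75, 1.5],
]
thresholded = soft_threshold_covariance(sample_cov, lam=0.25)
```

Entries whose magnitude is at most `lam` become exact zeros, which is what makes the thresholded matrix sparse.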
Journal of Biometrics and Biostatistics - Open Access Journals
Biometrics and biostatistics are disciplines of the biological sciences concerned with the application of mathematical-statistical theory, principles, and practices to the observation, measurement, and analysis of biological data and phenomena. The Journal of Biometrics and Biostatistics is a peer-reviewed, open access journal that promotes the application of statistical methods to the solution of biological problems. It is an academic journal that aims to be a complete and reliable source of information on discoveries and current developments, published as original articles, review articles, case reports, short communications, etc. in all areas related to biometrics and medical statistics, and made freely available online to researchers worldwide without restrictions or subscriptions. The journal uses an online system for manuscript submission, review, and management.
Tools for higher-order network analysis
Networks are a fundamental model of complex systems throughout the sciences, and network datasets are typically analyzed through lower-order connectivity patterns described at the level of individual nodes and edges. However, higher-order connectivity patterns captured by small subgraphs, also called network motifs, describe the fundamental structures that control and mediate the behavior of many complex systems. We develop three tools for network analysis that use higher-order connectivity patterns to gain new insights into network datasets: (1) a framework to cluster nodes into modules based on joint participation in network motifs; (2) a generalization of the clustering coefficient measurement to investigate higher-order closure patterns; and (3) a definition of network motifs for temporal networks and fast algorithms for counting them. Using these tools, we analyze data from biology, ecology, economics, neuroscience, online social networks, scientific collaborations, telecommunications, transportation, and the World Wide Web.
Global Convergence of Langevin Dynamics Based Algorithms for Nonconvex Optimization
Xu, Pan, Chen, Jinghui, Zou, Difan, Gu, Quanquan
We present a unified framework to analyze the global convergence of Langevin dynamics based algorithms for nonconvex finite-sum optimization with $n$ component functions. At the core of our analysis is a direct analysis of the ergodicity of the numerical approximations to Langevin dynamics, which leads to faster convergence rates. Specifically, we show that gradient Langevin dynamics (GLD) and stochastic gradient Langevin dynamics (SGLD) converge to the almost minimizer within $\tilde O\big(nd/(\lambda\epsilon) \big)$ and $\tilde O\big(d^7/(\lambda^5\epsilon^5) \big)$ stochastic gradient evaluations respectively, where $d$ is the problem dimension, and $\lambda$ is the spectral gap of the Markov chain generated by GLD. Both of the results improve upon the best known gradient complexity results. Furthermore, for the first time we prove the global convergence guarantee for variance reduced stochastic gradient Langevin dynamics (VR-SGLD) to the almost minimizer after $\tilde O\big(\sqrt{n}d^5/(\lambda^4\epsilon^{5/2})\big)$ stochastic gradient evaluations, which outperforms the gradient complexities of GLD and SGLD in a wide regime. Our theoretical analyses shed some light on using Langevin dynamics based algorithms for nonconvex optimization with provable guarantees.
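For orientation, a gradient Langevin dynamics iteration can be sketched as a gradient step plus Gaussian noise. The update below is the standard Euler-Maruyama discretization, $x_{k+1} = x_k - \eta \nabla F(x_k) + \sqrt{2\eta/\beta}\,\epsilon_k$; the step size `eta`, inverse temperature `beta`, the double-well test function, and all function names are illustrative assumptions rather than details taken from the paper.

```python
import math
import random


def gradient_langevin_dynamics(grad, x0, eta, beta, n_steps, seed=0):
    """Run GLD: x <- x - eta * grad(x) + sqrt(2 * eta / beta) * N(0, I)."""
    rng = random.Random(seed)
    x = list(x0)
    noise_scale = math.sqrt(2.0 * eta / beta)
    for _ in range(n_steps):
        g = grad(x)
        x = [
            xi - eta * gi + noise_scale * rng.gauss(0.0, 1.0)
            for xi, gi in zip(x, g)
        ]
    return x


def double_well_grad(x):
    """Gradient of the nonconvex f(x) = sum_i (x_i^2 - 1)^2 (assumed example)."""
    return [4.0 * xi * (xi * xi - 1.0) for xi in x]


# With a very large beta the noise is tiny, so the iterates settle near
# one of the minimizers at +1 or -1 in each coordinate.
x_final = gradient_langevin_dynamics(
    double_well_grad, [2.0, -2.0], eta=0.01, beta=1e8, n_steps=2000
)
```

SGLD replaces `grad(x)` with a stochastic gradient computed on a minibatch of the $n$ component functions; that variant is not shown here.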
Optimizing Spectral Sums using Randomized Chebyshev Expansions
Han, Insu, Avron, Haim, Shin, Jinwoo
The traces of matrix functions, often called spectral sums (e.g., rank, log-determinant, and nuclear norm), appear in many machine learning tasks. However, optimizing or computing such (parameterized) spectral sums typically involves a matrix decomposition at a cost cubic in the matrix dimension, which is expensive for large-scale applications. Several recent works have proposed approximating large-scale spectral sums using polynomial function approximations and stochastic trace estimators. However, all prior works along this line have studied biased estimators, and their direct adaptations to an optimization task under stochastic gradient descent (SGD) frameworks often do not work, as accumulated bias prevents stable convergence to the optimum. To address this issue, we propose a provably optimal unbiased estimator obtained by randomizing Chebyshev polynomial degrees. We further introduce two additional techniques for accelerating SGD, whose key idea is to share randomness among many estimations during the iterative procedure. Finally, we showcase two applications of the proposed SGD schemes, matrix completion and Gaussian process learning, on real-world datasets.
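As background for the stochastic trace estimators the abstract refers to, here is a minimal sketch of the classical Hutchinson estimator, which approximates $\mathrm{tr}(A)$ by averaging $z^\top A z$ over random sign vectors $z$. It is not the paper's randomized Chebyshev estimator, and the names `hutchinson_trace` and `dense_matvec` are hypothetical.

```python
import random


def hutchinson_trace(matvec, n, num_samples=100, seed=0):
    """Estimate tr(A) using only matrix-vector products z -> A z."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        # Rademacher probe vector: each entry is +1 or -1 with equal probability.
        z = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        az = matvec(z)
        total += sum(zi * ai for zi, ai in zip(z, az))
    return total / num_samples


def dense_matvec(a):
    """Wrap a dense list-of-lists matrix as a matvec function."""
    return lambda z: [sum(aij * zj for aij, zj in zip(row, z)) for row in a]


A = [[2.0, 1.0], [1.0, 3.0]]  # true trace is 5
estimate = hutchinson_trace(dense_matvec(A), n=2, num_samples=2000)
```

To estimate a spectral sum $\mathrm{tr}(f(A))$, the `matvec` argument would apply a polynomial approximation of $f(A)$ to `z`. Truncating that polynomial at a fixed degree introduces the kind of bias the abstract describes.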
Newton-Type Methods for Non-Convex Optimization Under Inexact Hessian Information
Xu, Peng, Roosta-Khorasani, Farbod, Mahoney, Michael W.
We consider variants of trust-region and cubic regularization methods for non-convex optimization, in which the Hessian matrix is approximated. Under mild conditions on the inexact Hessian, and using approximate solutions of the corresponding sub-problems, we provide iteration complexity to achieve $\epsilon$-approximate second-order optimality, which has been shown to be tight. Our Hessian approximation conditions constitute a major relaxation over the existing ones in the literature. Consequently, we are able to show that such mild conditions allow for the construction of the approximate Hessian through various random sampling methods. In this light, we consider the canonical problem of finite-sum minimization, provide appropriate uniform and non-uniform sub-sampling strategies to construct such Hessian approximations, and obtain optimal iteration complexity for the corresponding sub-sampled trust-region and cubic regularization methods.
Quantum machine learning: a classical perspective
Ciliberto, Carlo, Herbster, Mark, Ialongo, Alessandro Davide, Pontil, Massimiliano, Rocchetto, Andrea, Severini, Simone, Wossnig, Leonard
Recently, increased computational power and data availability, as well as algorithmic advances, have led machine learning techniques to impressive results in regression, classification, data-generation and reinforcement learning tasks. Despite these successes, the proximity to the physical limits of chip fabrication alongside the increasing size of datasets is motivating a growing number of researchers to explore the possibility of harnessing the power of quantum computation to speed up classical machine learning algorithms. Here we review the literature in quantum machine learning and discuss perspectives for a mixed readership of classical machine learning and quantum computation experts. Particular emphasis will be placed on clarifying the limitations of quantum algorithms, how they compare with their best classical counterparts and why quantum resources are expected to provide advantages for learning problems. Learning in the presence of noise and certain computationally hard problems in machine learning are identified as promising directions for the field. Practical questions, like how to upload classical data into quantum form, will also be addressed.
A Simple Proximal Stochastic Gradient Method for Nonsmooth Nonconvex Optimization
We analyze stochastic gradient algorithms for optimizing nonconvex, nonsmooth finite-sum problems. In particular, the objective function is given by the summation of a differentiable (possibly nonconvex) component, together with a possibly non-differentiable but convex component. We propose a proximal stochastic gradient algorithm based on variance reduction, called ProxSVRG+. The algorithm is a slight variant of the ProxSVRG algorithm [Reddi et al., 2016b]. Our main contribution lies in the analysis of ProxSVRG+. It recovers several existing convergence results (in terms of the number of stochastic gradient oracle calls and proximal operations), and improves/generalizes some others. In particular, ProxSVRG+ generalizes the best results given by the SCSG algorithm, recently proposed by [Lei et al., 2017] for the smooth nonconvex case. ProxSVRG+ is more straightforward than SCSG and yields a simpler analysis. Moreover, ProxSVRG+ outperforms the deterministic proximal gradient descent (ProxGD) for a wide range of minibatch sizes, which partially solves an open problem proposed in [Reddi et al., 2016b]. Finally, for nonconvex functions satisfying the Polyak-Łojasiewicz condition, we show that ProxSVRG+ achieves a global linear convergence rate without restart. In this case, ProxSVRG+ is never worse than ProxGD and ProxSVRG/SAGA, and sometimes outperforms them (and generalizes the results of SCSG).
Introduction to Matrix Types in Linear Algebra for Machine Learning - Machine Learning Mastery
In a symmetric matrix, the axis of symmetry is always the main diagonal, running from the top left to the bottom right. Below is an example of a 5 × 5 symmetric matrix. A symmetric matrix is always square and equal to its own transpose. A triangular matrix is a type of square matrix that has all of its values in the upper-right or lower-left of the matrix, with the remaining elements filled with zero values. A triangular matrix with values only on and above the main diagonal is called an upper triangular matrix.
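These definitions can be checked with a few lines of plain Python. This is a minimal sketch: the helper names `is_symmetric` and `upper_triangular` and the sample 5 × 5 matrix are illustrative and not taken from the article.

```python
def is_symmetric(m):
    """Return True if m is square and equal to its own transpose."""
    n = len(m)
    if any(len(row) != n for row in m):
        return False  # not square, so it cannot be symmetric
    return all(m[i][j] == m[j][i] for i in range(n) for j in range(i + 1, n))


def upper_triangular(m):
    """Keep entries on and above the main diagonal; zero out the rest."""
    n = len(m)
    return [[m[i][j] if j >= i else 0 for j in range(n)] for i in range(n)]


# A 5 x 5 matrix that mirrors across the main diagonal.
M = [
    [1, 2, 3, 4, 5],
    [2, 1, 2, 3, 4],
    [3, 2, 1, 2, 3],
    [4, 3, 2, 1, 2],
    [5, 4, 3, 2, 1],
]
U = upper_triangular(M)
```

Here `is_symmetric(M)` is true, while `is_symmetric(U)` is false because the entries below the diagonal have been zeroed.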
A Sinkhorn-Newton method for entropic optimal transport
Brauer, Christoph, Clason, Christian, Lorenz, Dirk, Wirth, Benedikt
The mathematical problem of optimal mass transport has a long history dating back to its introduction in Monge [10], with key contributions by Kantorovič [6] and Kantorovič & Rubinšteĭn [7]. It has recently received increased interest due to numerous applications in machine learning; see, e.g., the recent overview of Kolouri, Park, Thorpe, Slepčev & Rohde [9] and the references therein. In a nutshell, the (discrete) problem of optimal transport in its Kantorovich form is to compute, for given mass distributions a and b with equal total mass, a transport plan, i.e., an assignment of how much mass of a at some point should be moved to another point to match the mass in b. This should be done in a way such that some transport cost (usually proportional to the amount of mass and dependent on the distance) is minimized. This leads to a linear optimization problem which has been well studied, but its application in machine learning has been problematic due to its large memory requirements and long run times.
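To make the notion of a transport plan concrete, the sketch below builds a feasible plan with the north-west corner rule and evaluates its cost. The rule only guarantees that the plan matches both marginals; it is not optimal in general, and it is not the Sinkhorn-Newton method of the paper. The function names and the small example are assumptions chosen for illustration.

```python
def north_west_corner(a, b, tol=1e-12):
    """Build a feasible transport plan P with row sums a and column sums b."""
    a, b = list(a), list(b)  # remaining supply and demand
    plan = [[0.0] * len(b) for _ in a]
    i = j = 0
    while i < len(a) and j < len(b):
        moved = min(a[i], b[j])  # ship as much as both sides allow
        plan[i][j] = moved
        a[i] -= moved
        b[j] -= moved
        if a[i] <= tol:  # source i is exhausted
            i += 1
        if b[j] <= tol:  # target j is satisfied
            j += 1
    return plan


def transport_cost(plan, cost):
    """Total cost: sum over i, j of cost[i][j] * plan[i][j]."""
    return sum(
        c * p
        for cost_row, plan_row in zip(cost, plan)
        for c, p in zip(cost_row, plan_row)
    )


a = [0.5, 0.5]    # mass at the source points
b = [0.25, 0.75]  # mass at the target points
cost = [[0.0, 1.0], [1.0, 0.0]]  # cost of moving one unit from i to j
plan = north_west_corner(a, b)
total = transport_cost(plan, cost)
```

The plan has one entry per source-target pair, which hints at the memory issue noted above: for distributions with n points each, the plan has n² entries.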