Goto

Collaborating Authors

 Mathematical & Statistical Methods


A Continuized View on Nesterov Acceleration for Stochastic Gradient Descent and Randomized Gossip

arXiv.org Machine Learning

We introduce the continuized Nesterov acceleration, a close variant of Nesterov acceleration whose variables are indexed by a continuous time parameter. The two variables continuously mix following a linear ordinary differential equation and take gradient steps at random times. This continuized variant benefits from the best of the continuous and the discrete frameworks: as a continuous process, one can use differential calculus to analyze convergence and obtain analytical expressions for the parameters; and a discretization of the continuized process can be computed exactly with convergence rates similar to those of Nesterov original acceleration. We show that the discretization has the same structure as Nesterov acceleration, but with random parameters. We provide continuized Nesterov acceleration under deterministic as well as stochastic gradients, with either additive or multiplicative noise. Finally, using our continuized framework and expressing the gossip averaging problem as the stochastic minimization of a certain energy function, we provide the first rigorous acceleration of asynchronous gossip algorithms.


Decentralized Optimization with Heterogeneous Delays: a Continuous-Time Approach

arXiv.org Machine Learning

In decentralized optimization, nodes of a communication network privately possess a local objective function, and communicate using gossip-based methods in order to minimize the average of these per-node objectives. While synchronous algorithms can be heavily slowed down by a few nodes and edges in the graph (the straggler problem), their asynchronous counterparts lack from a sharp analysis taking into account heterogeneous delays in the communication network. In this paper, we propose a novel continuous-time framework to analyze asynchronous algorithms, which does not require to define a global ordering of the events, and allows to finely characterize the time complexity in the presence of (heterogeneous) delays. Using this framework, we describe a fully asynchronous decentralized algorithm to minimize the sum of smooth and strongly convex functions. Our algorithm (DCDM, Delayed Coordinate Dual Method), based on delayed randomized gossip communications and local computational updates, achieves an asynchronous speed-up: the rate of convergence is tightly characterized in terms of the eigengap of the graph weighted by local delays only, instead of the global worst-case delays as in previous analyses.


Statistical embedding: Beyond principal components

arXiv.org Machine Learning

There has been an intense recent activity in embedding of very high dimensional and nonlinear data structures, much of it in the data science and machine learning literature. We survey this activity in four parts. In the first part we cover nonlinear methods such as principal curves, multidimensional scaling, local linear methods, ISOMAP, graph based methods and kernel based methods. The second part is concerned with topological embedding methods, in particular mapping topological properties into persistence diagrams. Another type of data sets with a tremendous growth is very high-dimensional network data. The task considered in part three is how to embed such data in a vector space of moderate dimension to make the data amenable to traditional techniques such as cluster and classification techniques. The final part of the survey deals with embedding in $\mathbb{R}^2$, which is visualization. Three methods are presented: $t$-SNE, UMAP and LargeVis based on methods in parts one, two and three, respectively. The methods are illustrated and compared on two simulated data sets; one consisting of a triple of noisy Ranunculoid curves, and one consisting of networks of increasing complexity and with two types of nodes.


A Scalable Second Order Method for Ill-Conditioned Matrix Completion from Few Samples

arXiv.org Machine Learning

We propose an iterative algorithm for low-rank matrix completion that can be interpreted as an iteratively reweighted least squares (IRLS) algorithm, a saddle-escaping smoothing Newton method or a variable metric proximal gradient method applied to a non-convex rank surrogate. It combines the favorable data-efficiency of previous IRLS approaches with an improved scalability by several orders of magnitude. We establish the first local convergence guarantee from a minimal number of samples for that class of algorithms, showing that the method attains a local quadratic convergence rate. Furthermore, we show that the linear systems to be solved are well-conditioned even for very ill-conditioned ground truth matrices. We provide extensive experiments, indicating that unlike many state-of-the-art approaches, our method is able to complete very ill-conditioned matrices with a condition number of up to $10^{10}$ from few samples, while being competitive in its scalability.


Essential Linear Algebra for Data Science and Machine Learning - KDnuggets

#artificialintelligence

Linear Algebra is a branch of mathematics that is extremely useful in data science and machine learning. Linear algebra is the most important math skill in machine learning. Most machine learning models can be expressed in matrix form. A dataset itself is often represented as a matrix. Linear algebra is used in data preprocessing, data transformation, and model evaluation.


LASSO regression using tidymodels and #TidyTuesday data for The Office

#artificialintelligence

Our modeling goal here is to predict the IMDB ratings for episodes of The Office based on the other characteristics of the episodes in the #TidyTuesday dataset. There are two datasets, one with the ratings and one with information like director, writer, and which character spoke which line. The episode numbers and titles are not consistent between them, so we can use regular expressions to do a better job of matching the datasets up for joining. We are going to use several different kinds of features for modeling. First, let's find out how many times characters speak per episode.


Math for Data Science and Machine Learning: University Level

#artificialintelligence

In this course, we will learn math for data science and machine learning. We will also discuss the importance of Math for data science and machine learning in practical words. Moreover, Math for data science and machine learning course is a bundle of two courses of linear algebra and probability and statistics. So, students will learn the complete contents of probability and statistics, and linear algebra. It is not like that you will not complete all the contents in this 7 hours video course.


Graph Theory Algorithms

#artificialintelligence

My name is William, and I'm super excited to bring to you this a video series focused on graph theory. Graph theory is one of my absolute favorite topics in computer science. We're going to see a lot of very awesome algorithms. The whole field is very diverse and hugely applicable to real world applications. I think everybody should be able to learn, love and enjoy graph theory.


Review of Low-Voltage Load Forecasting: Methods, Applications, and Recommendations

arXiv.org Machine Learning

The increased digitalisation and monitoring of the energy system opens up numerous opportunities % and solutions which can help to decarbonise the energy system. Applications on low voltage (LV), localised networks, such as community energy markets and smart storage will facilitate decarbonisation, but they will require advanced control and management. Reliable forecasting will be a necessary component of many of these systems to anticipate key features and uncertainties. Despite this urgent need, there has not yet been an extensive investigation into the current state-of-the-art of low voltage level forecasts, other than at the smart meter level. This paper aims to provide a comprehensive overview of the landscape, current approaches, core applications, challenges and recommendations. Another aim of this paper is to facilitate the continued improvement and advancement in this area. To this end, the paper also surveys some of the most relevant and promising trends. It establishes an open, community-driven list of the known LV level open datasets to encourage further research and development.


Understanding Bandits with Graph Feedback

arXiv.org Machine Learning

The bandit problem with graph feedback, proposed in [Mannor and Shamir, NeurIPS 2011], is modeled by a directed graph $G=(V,E)$ where $V$ is the collection of bandit arms, and once an arm is triggered, all its incident arms are observed. A fundamental question is how the structure of the graph affects the min-max regret. We propose the notions of the fractional weak domination number $\delta^*$ and the $k$-packing independence number capturing upper bound and lower bound for the regret respectively. We show that the two notions are inherently connected via aligning them with the linear program of the weakly dominating set and its dual -- the fractional vertex packing set respectively. Based on this connection, we utilize the strong duality theorem to prove a general regret upper bound $O\left(\left( \delta^*\log |V|\right)^{\frac{1}{3}}T^{\frac{2}{3}}\right)$ and a lower bound $\Omega\left(\left(\delta^*/\alpha\right)^{\frac{1}{3}}T^{\frac{2}{3}}\right)$ where $\alpha$ is the integrality gap of the dual linear program. Therefore, our bounds are tight up to a $\left(\log |V|\right)^{\frac{1}{3}}$ factor on graphs with bounded integrality gap for the vertex packing problem including trees and graphs with bounded degree. Moreover, we show that for several special families of graphs, we can get rid of the $\left(\log |V|\right)^{\frac{1}{3}}$ factor and establish optimal regret.