Iterative Sketching for Secure Coded Regression
Charalambides, Neophytos, Mahdavifar, Hessam, Pilanci, Mert, Hero, Alfred O. III
In this work, we propose methods for speeding up distributed linear regression while ensuring security. We leverage randomized sketching techniques and improve straggler resilience in asynchronous systems. Specifically, we apply a random orthonormal matrix and then subsample \textit{blocks}, to simultaneously secure the information and reduce the dimension of the regression problem. In our setup, the transformation corresponds to an encoded encryption in an \textit{approximate gradient coding scheme}, and the subsampling corresponds to the responses of the non-straggling workers in a centralized coded computing network. This results in a distributed \textit{iterative sketching} approach for an $\ell_2$-subspace embedding, \textit{i.e.} a new sketch is considered at each iteration. We also focus on the special case of the \textit{Subsampled Randomized Hadamard Transform}, which we generalize to block sampling, and discuss how it can be modified in order to secure the data.
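To make the pipeline concrete, below is a minimal Python/NumPy sketch of the idea described above: encrypt once with a random sign flip followed by an orthonormal Hadamard transform, then at each iteration subsample blocks (standing in for the responsive workers) and take a rescaled sketched gradient step. The dimensions, step size, and block counts are hypothetical choices for illustration, not the authors' exact scheme or parameters.

```python
import numpy as np
from scipy.linalg import hadamard

# Illustrative sketch (not the authors' exact scheme): iterative sketching
# for least squares  min_x ||Ax - b||^2  with a block-subsampled
# randomized Hadamard transform.  All sizes below are hypothetical.

rng = np.random.default_rng(0)
n, d = 1024, 20                      # n must be a power of 2 for hadamard()
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

q, tau = 32, 8                       # q blocks; tau "non-straggling" blocks
block = n // q
H = hadamard(n) / np.sqrt(n)         # orthonormal Hadamard matrix

# One-time "encryption": random signs, then the orthonormal transform.
D = rng.choice([-1.0, 1.0], size=n)
A_enc, b_enc = H @ (D[:, None] * A), H @ (D * b)

x = np.zeros(d)
step = 0.1 / n
for _ in range(200):
    # A fresh sketch per iteration: the tau responsive blocks.
    idx = rng.choice(q, size=tau, replace=False)
    rows = np.concatenate([np.arange(i * block, (i + 1) * block) for i in idx])
    S_A, S_b = A_enc[rows], b_enc[rows]
    # Rescaled sketched gradient: an unbiased estimate of the full gradient.
    g = (q / tau) * 2 * S_A.T @ (S_A @ x - S_b)
    x -= step * g

print(np.linalg.norm(A @ x - b))     # approaches the least-squares residual,
                                     # up to the sketching noise floor
```

Since the transform is orthonormal, the encrypted problem has the same least-squares solution as the original one, and the block-sampled gradient is an unbiased estimate of the full gradient.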
Gradient Coding through Iterative Block Leverage Score Sampling
Charalambides, Neophytos, Pilanci, Mert, Hero, Alfred O. III
We generalize the leverage score sampling sketch for $\ell_2$-subspace embeddings to accommodate sampling subsets of the transformed data, so that the sketching approach is appropriate for distributed settings. This is then used to derive an approximate coded computing approach for first-order methods, known as gradient coding, to accelerate linear regression in the presence of failures in distributed computational networks, \textit{i.e.} stragglers. We replicate the data across the distributed network to attain the approximation guarantees through the induced sampling distribution. The significance and main contribution of this work is that it unifies randomized numerical linear algebra with approximate coded computing, while attaining an induced $\ell_2$-subspace embedding through uniform sampling. The transition to uniform sampling is done without applying a random projection, as is done in the case of the subsampled randomized Hadamard transform. Furthermore, by incorporating this technique into coded computing, our scheme is an iterative sketching approach to approximately solving linear regression. We also propose weighting when sketching takes place through sampling with replacement, for further compression.
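The following is a minimal NumPy sketch of block leverage score sampling as the abstract describes it at a high level, without the replication or coding layer: row leverage scores are summed within contiguous blocks, blocks are sampled with replacement from the induced distribution, and the sample is rescaled to form an $\ell_2$-subspace embedding. All sizes are hypothetical.

```python
import numpy as np

# Illustrative sketch (hypothetical parameters): block leverage score
# sampling for an l2-subspace embedding, at the block rather than row level.

rng = np.random.default_rng(1)
n, d, q = 1200, 15, 40               # q blocks of size n // q
A = rng.standard_normal((n, d))
block = n // q

# Row leverage scores via a thin QR:  l_i = ||Q[i, :]||^2.
Q, _ = np.linalg.qr(A)
row_scores = np.sum(Q**2, axis=1)
# Block leverage scores and the induced sampling distribution.
blk_scores = row_scores.reshape(q, block).sum(axis=1)
p = blk_scores / d                   # row scores sum to d = rank(A)

t = 15                               # number of sampled blocks
idx = rng.choice(q, size=t, replace=True, p=p)
# Rescaling by 1/sqrt(t * p_i) makes S_A.T @ S_A unbiased for A.T @ A.
S_A = np.concatenate([A[i * block:(i + 1) * block] / np.sqrt(t * p[i])
                      for i in idx])

err = np.linalg.norm(S_A.T @ S_A - A.T @ A, 2) / np.linalg.norm(A.T @ A, 2)
print(f"relative spectral error: {err:.3f}")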
Dimensionality Reduction for $k$-means Clustering
Charalambides, Neophytos
With modern developments necessitating large high-dimensional datasets, which by their nature lead to overfitting in many machine learning algorithms, it is crucial to reduce the complexity of the algorithms that operate on them. The authors of [BDM09] were among the first to address clustering in such datasets with provably accurate approximation results, by proposing a simple pre-processing step to the $k$-means clustering algorithm, also known as Lloyd's method [Llo82] -- probably the most widely used and popular clustering algorithm.
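As a rough illustration of this kind of pre-processing (not the [BDM09] algorithm itself), the NumPy sketch below projects the data to a lower dimension with a scaled Gaussian map and then runs plain Lloyd's method on the reduced data; the synthetic data and all sizes are hypothetical.

```python
import numpy as np

# Illustrative sketch: random-projection pre-processing before Lloyd's
# method.  Data, target dimension r, and cluster count k are hypothetical.

rng = np.random.default_rng(2)
n, d, k, r = 500, 300, 4, 20
# Synthetic data: k well-separated Gaussian clusters in d dimensions.
centers = 10 * rng.standard_normal((k, d))
labels_true = rng.integers(k, size=n)
X = centers[labels_true] + rng.standard_normal((n, d))

# Pre-processing: project onto r dimensions with a scaled Gaussian map.
G = rng.standard_normal((d, r)) / np.sqrt(r)
X_red = X @ G

# Plain Lloyd's method on the reduced data.
C = X_red[rng.choice(n, size=k, replace=False)]
for _ in range(50):
    assign = np.argmin(((X_red[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
    # Keep the old center if a cluster happens to be empty.
    C = np.array([X_red[assign == j].mean(axis=0) if np.any(assign == j)
                  else C[j] for j in range(k)])

print("cluster sizes:", np.bincount(assign, minlength=k))
```

Lloyd's iterations now cost $O(nkr)$ per pass instead of $O(nkd)$, which is the point of reducing the dimension before clustering.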