The bulk synchronous parallel (BSP) is a celebrated synchronization model for general-purpose parallel computing that has been successfully employed for distributed training of machine learning models. A prevalent shortcoming of BSP is that it requires workers to wait for the straggler at every iteration. We propose a model that aims to relax the strict synchronization requirement of BSP. The proposed model offers more flexibility and adaptability during the training phase without sacrificing the accuracy of the trained model. It also achieves accuracy comparable to (if not higher than) that of the other sensible synchronization models.
The parameter server framework has been widely adopted for distributing the training of large deep neural network (DNN) models. The framework consists of multiple workers and a logical server that maintains globally shared parameters, typically represented as dense or sparse vectors and matrices, and it supports two approaches: model parallelism and data parallelism. In this paper we focus on data parallelism. Data parallelism refers to partitioning (sharding) large training data into smaller, equal-size shards and assigning them to workers. The entire DNN model is then replicated on each worker.
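As a rough illustration of data parallelism under BSP, the following minimal sketch shards a toy dataset equally, lets each simulated worker compute a gradient on its shard, and has a server average the gradients only after every worker has reported. The one-parameter linear model, the function names, the learning rate, and the per-worker timings are all assumptions made for illustration; none of them come from the paper.

```python
def make_shards(data, num_workers):
    """Split data into equal-size shards (any remainder is dropped in this sketch)."""
    size = len(data) // num_workers
    return [data[i * size:(i + 1) * size] for i in range(num_workers)]


def local_gradient(w, shard):
    """Mean gradient of the squared error (w*x - y)^2 over one worker's shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def bsp_train(data, num_workers, lr, iterations):
    """Data-parallel training with a BSP barrier at every iteration."""
    shards = make_shards(data, num_workers)
    w = 0.0  # globally shared parameter held by the (simulated) server
    for _ in range(iterations):
        # Every worker uses the same replica of w on its own shard.
        grads = [local_gradient(w, shard) for shard in shards]
        # Barrier: the update happens only once all gradients are collected.
        w -= lr * sum(grads) / len(grads)
    return w


def bsp_iteration_time(worker_times):
    """Under BSP an iteration lasts as long as its slowest worker (the straggler)."""
    return max(worker_times)


# Toy data generated from y = 3x (hypothetical values).
data = [(float(x), 3.0 * x) for x in range(1, 9)]
print(bsp_train(data, num_workers=4, lr=0.01, iterations=50))
print(bsp_iteration_time([1.0, 1.1, 0.9, 4.0]))
```

The last call shows the cost of the barrier: three workers finish in about one time unit, yet the iteration takes as long as the single slow worker.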
In distributed machine learning (DML), the network performance between machines significantly impacts the speed of iterative training. In this paper we propose BML, a new gradient synchronization algorithm with higher network performance and lower network cost than the current practice. BML runs on a BCube network instead of the traditional Fat-Tree topology. The BML algorithm is designed so that, compared to the parameter server (PS) algorithm on a Fat-Tree network connecting the same number of server machines, BML theoretically achieves 1/k of the gradient synchronization time with k/5 of the switches (k is typically 2 to 4). Experiments with LeNet-5 and VGG-19 benchmarks on a testbed with 9 dual-GPU servers show that BML reduces the job completion time of DML training by up to 56.4%.
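To make the stated ratios concrete, the short sketch below evaluates 1/k and k/5 for the typical values of k. It only restates the arithmetic of the theoretical claim; the function name and the dictionary keys are hypothetical, and nothing here models the BML algorithm itself.

```python
from fractions import Fraction


def bml_vs_ps_fattree(k):
    """Theoretical BML cost relative to PS on Fat-Tree, as stated in the abstract."""
    return {
        "sync_time_ratio": Fraction(1, k),  # gradient synchronization time
        "switch_ratio": Fraction(k, 5),     # number of switches
    }


for k in (2, 3, 4):
    ratios = bml_vs_ps_fattree(k)
    print(k, float(ratios["sync_time_ratio"]), float(ratios["switch_ratio"]))
```

Larger k shortens the theoretical synchronization time but uses a larger fraction of the switches, so the two ratios trade off against each other.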
Algorithmic advances in map synchronization are important for solving a wide range of practical problems involving possibly large-scale datasets. In this paper, we provide theoretical justifications for spectral techniques for the map synchronization problem, which takes as input a collection of objects and noisy maps estimated between pairs of objects, and outputs clean maps between all pairs of objects. We show that a simple normalized spectral method that projects the blocks of the top eigenvectors of a data matrix to the map space leads to surprisingly good results. When the noise is modelled naturally as random permutation matrices, this algorithm, NormSpecSync, achieves theoretical guarantees competitive with those of state-of-the-art convex optimization techniques, yet it is much more efficient. We demonstrate the usefulness of our algorithm in a couple of applications, where it is optimal in both complexity and exactness among existing methods.
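The idea of projecting eigenvector blocks to the map space can be sketched as follows. This is a simplified, unnormalized variant that assumes every pair of objects has an observed map and that maps are permutations of m points; it is not the paper's NormSpecSync implementation. The function names, the brute-force rounding step (only practical for very small m), and the noise-free toy input are assumptions made for illustration, and the sketch relies on NumPy.

```python
import itertools

import numpy as np


def perm_matrix(perm):
    """Permutation matrix with a 1 at (r, perm[r]) for each row r."""
    m = len(perm)
    P = np.zeros((m, m))
    P[np.arange(m), perm] = 1.0
    return P


def project_to_permutation(M):
    """Round M to the permutation maximizing the sum of selected entries (brute force)."""
    m = M.shape[0]
    best = max(itertools.permutations(range(m)),
               key=lambda p: sum(M[r, p[r]] for r in range(m)))
    return perm_matrix(list(best))


def spectral_sync(X, n, m):
    """X is the (n*m, n*m) symmetric data matrix whose (i, j) block is the map i <- j."""
    _, vecs = np.linalg.eigh(X)                # eigenvalues in ascending order
    U = vecs[:, -m:]                           # top-m eigenvectors
    blocks = [U[i * m:(i + 1) * m] for i in range(n)]
    # Relate each object to object 0, then round the result to a permutation.
    R = [project_to_permutation(B @ blocks[0].T) for B in blocks]
    # Composing through object 0 gives cycle-consistent maps for all pairs.
    return {(i, j): R[i] @ R[j].T for i in range(n) for j in range(n)}


# Noise-free toy input: n objects, m points each, random ground-truth permutations.
n, m = 5, 3
rng = np.random.default_rng(0)
truth = [perm_matrix(rng.permutation(m)) for _ in range(n)]
X = np.block([[truth[i] @ truth[j].T for j in range(n)] for i in range(n)])
clean = spectral_sync(X, n, m)
print(np.allclose(clean[(1, 2)], truth[1] @ truth[2].T))
```

For larger m, the brute-force rounding would normally be replaced by a linear assignment solver; the exhaustive search is used here only to keep the sketch dependency-light.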
In this work, we propose a hybrid approach to synchronize large-scale networks. In particular, we draw on Kalman Filtering (KF) along with time-stamps generated by the Precision Time Protocol (PTP) for pairwise node synchronization. Furthermore, we investigate the merit of Factor Graphs (FGs) along with the Belief Propagation (BP) algorithm in achieving high-precision end-to-end network synchronization. Finally, we present the idea of dividing the large-scale network into local synchronization domains, for each of which a suitable synchronization algorithm is utilized. The simulation results indicate that, despite the simplifications in the hybrid approach, the error in the offset estimation remains below 5 ns.
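Pairwise synchronization with PTP time-stamps and a Kalman filter can be sketched roughly as below. The offset formula is the standard PTP two-way exchange under a symmetric path delay; everything else (a constant-offset state model, the scalar filter, the noise variances, the delay and turnaround values, and all names) is an assumption for illustration and not the paper's model.

```python
import random


def ptp_offset(t1, t2, t3, t4):
    """Offset of the slave clock relative to the master, assuming symmetric delay.

    t1: master sends Sync (master clock)    t2: slave receives it (slave clock)
    t3: slave sends Delay_Req (slave clock) t4: master receives it (master clock)
    """
    return ((t2 - t1) - (t4 - t3)) / 2.0


class ScalarKalman:
    """One-dimensional Kalman filter tracking a slowly varying clock offset."""

    def __init__(self, x0=0.0, p0=1e6, q=1.0, r=100.0):
        self.x, self.p = x0, p0  # state estimate and its variance
        self.q, self.r = q, r    # process and measurement noise variances

    def step(self, z):
        self.p += self.q                     # predict: offset assumed constant
        gain = self.p / (self.p + self.r)    # Kalman gain
        self.x += gain * (z - self.x)        # correct with measured offset z
        self.p *= (1.0 - gain)
        return self.x


def simulate(true_offset_ns, delay_ns, jitter_ns, rounds, seed=0):
    """Run repeated PTP exchanges and return the filtered offset estimate in ns."""
    rng = random.Random(seed)
    kf = ScalarKalman()
    t = 0.0
    for _ in range(rounds):
        t1 = t
        t2 = t1 + delay_ns + true_offset_ns + rng.gauss(0.0, jitter_ns)
        t3 = t2 + 50.0  # hypothetical slave turnaround time
        t4 = t3 - true_offset_ns + delay_ns + rng.gauss(0.0, jitter_ns)
        kf.step(ptp_offset(t1, t2, t3, t4))
        t += 1e6        # next exchange one millisecond later
    return kf.x


print(simulate(true_offset_ns=250.0, delay_ns=1000.0, jitter_ns=5.0, rounds=200))
```

A fuller model would also track clock skew as a second state variable; the scalar filter is kept here so the predict and correct steps stay easy to follow.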
Liang, Junwei (Carnegie Mellon University) | Fan, Desai (Carnegie Mellon University) | Lu, Han (Carnegie Mellon University) | Huang, Poyao (Carnegie Mellon University) | Chen, Jia (Carnegie Mellon University) | Jiang, Lu (Carnegie Mellon University) | Hauptmann, Alexander (Carnegie Mellon University)
What happened during the Boston Marathon in 2013? Nowadays, at any major event, many people take videos and share them on social media. To fully understand exactly what happened at these major events, researchers and analysts often have to examine thousands of these videos manually. To reduce this manual effort, we present an investigative system that automatically synchronizes these videos to a global timeline and localizes them on a map. In addition to alignment in time and space, our system combines various functions for analysis, including gunshot detection, crowd size estimation, 3D reconstruction and person tracking. To the best of our knowledge, this is the first time a unified framework has been built for comprehensive event reconstruction from social media videos.