Goto

Collaborating Authors

 sp method



A Additional qualitative results

Neural Information Processing Systems

We begin by illustrating successful verification results in Appendix A.1, To further contextualize our TP's advantages, we juxtapose these standard HRs encompass a multitude of verified patches; for visual clarity, we've outlined the SIFT points A.2 Standard verification results: compared with SP Hence, our method suitably ranks these accurate index images highly. We further evaluate our topological verification outcomes against those of the SP method. In addition to successful verification instances, we also explore cases where our method fails. Regions (HRs) identified by our method on ROxford. Regarding false negative cases, our method fails to detect any HRs.


Linear Attention Sequence Parallelism

Sun, Weigao, Qin, Zhen, Li, Dong, Shen, Xuyang, Qiao, Yu, Zhong, Yiran

arXiv.org Artificial Intelligence

Sequence Parallel (SP) serves as a prevalent strategy to handle long sequences that exceed the memory limit of a single GPU. However, existing SP methods do not take advantage of linear attention features, resulting in sub-optimal parallelism efficiency and usability for linear attention-based language models. In this paper, we introduce Linear Attention Sequence Parallel (LASP), an efficient SP method tailored to linear attention-based language models. Specifically, we design an efficient point-to-point communication mechanism to leverage the right-product kernel trick of linear attention, which sharply decreases the communication overhead of SP. We also enhance the practical efficiency of LASP by performing kernel fusion and intermediate state caching, making the implementation of LASP hardware-friendly on GPU clusters. Furthermore, we meticulously ensure the compatibility of sequence-level LASP with all types of batch-level data parallel methods, which is vital for distributed training on large clusters with long sequences and large batches. We conduct extensive experiments on two linear attention-based models with varying sequence lengths and GPU cluster sizes. LASP scales sequence length up to 4096K using 128 A100 80G GPUs on 1B models, which is 8 times longer than existing SP methods while being significantly faster. The code is available at https://github.com/OpenNLPLab/LASP.


ALS: Augmented Lagrangian Sketching Methods for Linear Systems

Morshed, Md Sarowar

arXiv.org Artificial Intelligence

We develop two fundamental stochastic sketching techniques; Penalty Sketching (PS) and Augmented Lagrangian Sketching (ALS) for solving consistent linear systems. The proposed PS and ALS techniques extend and generalize the scope of Sketch & Project (SP) method by introducing Lagrangian penalty sketches. In doing so, we recover SP methods as special cases and furthermore develop a family of new stochastic iterative methods. By varying sketch parameters in the proposed PS method, we recover novel stochastic methods such as Penalty Newton Descent, Penalty Kaczmarz, Penalty Stochastic Descent, Penalty Coordinate Descent, Penalty Gaussian Pursuit, and Penalty Block Kaczmarz. Furthermore, the proposed ALS method synthesizes a wide variety of new stochastic methods such as Augmented Newton Descent, Augmented Kaczmarz, Augmented Stochastic Descent, Augmented Coordinate Descent, Augmented Gaussian Pursuit, and Augmented Block Kaczmarz into one framework. Moreover, we show that the developed PS and ALS frameworks can be used to reformulate the original linear system into equivalent stochastic optimization problems namely the Penalty Stochastic Reformulation and Augmented Stochastic Reformulation. We prove global convergence rates for the PS and ALS methods as well as sub-linear $\mathcal{O}(\frac{1}{k})$ rates for the Cesaro average of iterates. The proposed convergence results hold for a wide family of distributions of random matrices, which provides the opportunity of fine-tuning the randomness of the method suitable for specific applications. Finally, we perform computational experiments that demonstrate the efficiency of our methods compared to the existing SP methods.