AITopics | sp method

Collaborating Authors

sp method

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

ZeCO: Zero-Communication Overhead Sequence Parallelism for Linear Attention

Neural Information Processing SystemsJun-14-2026, 03:53:37 GMT

Linear attention mechanisms deliver significant advantages for Large Language Models (LLMs) by providing linear computational complexity, enabling efficient processing of ultra-long sequences (e.g., 1M context). However, existing Sequence Parallelism (SP) methods, essential for distributing these workloads across devices, become the primary performance bottleneck due to substantial communication overhead. In this paper, we introduce ZeCO (Zero Communication Overhead) sequence parallelism for linear attention models, a new SP method designed to overcome these limitations and achieve practically end-to-end near-linear scalability for long sequence training. For example, training a model with a 1M sequence length across 64 devices using ZeCO takes roughly the same time as training with an 16k sequence on a single device. At the heart of ZeCO lies All-Scan, a novel collective communication primitive.

artificial intelligence, large language model, natural language, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.62)

Add feedback

c972859a984a21658432d7320c7df385-Supplemental-Conference.pdf

Neural Information Processing SystemsFeb-17-2026, 01:59:03 GMT

artificial intelligence, image pair, machine learning, (18 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

A Additional qualitative results

Neural Information Processing SystemsOct-9-2025, 07:24:27 GMT

We begin by illustrating successful verification results in Appendix A.1, To further contextualize our TP's advantages, we juxtapose these standard HRs encompass a multitude of verified patches; for visual clarity, we've outlined the SIFT points A.2 Standard verification results: compared with SP Hence, our method suitably ranks these accurate index images highly. We further evaluate our topological verification outcomes against those of the SP method. In addition to successful verification instances, we also explore cases where our method fails. Regions (HRs) identified by our method on ROxford. Regarding false negative cases, our method fails to detect any HRs.

artificial intelligence, image pair, machine learning, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.62)

Add feedback

Linear Attention Sequence Parallelism

Sun, Weigao, Qin, Zhen, Li, Dong, Shen, Xuyang, Qiao, Yu, Zhong, Yiran

arXiv.org Artificial IntelligenceApr-3-2024

Sequence Parallel (SP) serves as a prevalent strategy to handle long sequences that exceed the memory limit of a single GPU. However, existing SP methods do not take advantage of linear attention features, resulting in sub-optimal parallelism efficiency and usability for linear attention-based language models. In this paper, we introduce Linear Attention Sequence Parallel (LASP), an efficient SP method tailored to linear attention-based language models. Specifically, we design an efficient point-to-point communication mechanism to leverage the right-product kernel trick of linear attention, which sharply decreases the communication overhead of SP. We also enhance the practical efficiency of LASP by performing kernel fusion and intermediate state caching, making the implementation of LASP hardware-friendly on GPU clusters. Furthermore, we meticulously ensure the compatibility of sequence-level LASP with all types of batch-level data parallel methods, which is vital for distributed training on large clusters with long sequences and large batches. We conduct extensive experiments on two linear attention-based models with varying sequence lengths and GPU cluster sizes. LASP scales sequence length up to 4096K using 128 A100 80G GPUs on 1B models, which is 8 times longer than existing SP methods while being significantly faster. The code is available at https://github.com/OpenNLPLab/LASP.

lasp, sequence, sequence length, (16 more...)

arXiv.org Artificial Intelligence

2404.02882

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Europe > Austria (0.04)
Asia > Middle East > UAE (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

ALS: Augmented Lagrangian Sketching Methods for Linear Systems

Morshed, Md Sarowar

arXiv.org Artificial IntelligenceAug-12-2022

We develop two fundamental stochastic sketching techniques; Penalty Sketching (PS) and Augmented Lagrangian Sketching (ALS) for solving consistent linear systems. The proposed PS and ALS techniques extend and generalize the scope of Sketch & Project (SP) method by introducing Lagrangian penalty sketches. In doing so, we recover SP methods as special cases and furthermore develop a family of new stochastic iterative methods. By varying sketch parameters in the proposed PS method, we recover novel stochastic methods such as Penalty Newton Descent, Penalty Kaczmarz, Penalty Stochastic Descent, Penalty Coordinate Descent, Penalty Gaussian Pursuit, and Penalty Block Kaczmarz. Furthermore, the proposed ALS method synthesizes a wide variety of new stochastic methods such as Augmented Newton Descent, Augmented Kaczmarz, Augmented Stochastic Descent, Augmented Coordinate Descent, Augmented Gaussian Pursuit, and Augmented Block Kaczmarz into one framework. Moreover, we show that the developed PS and ALS frameworks can be used to reformulate the original linear system into equivalent stochastic optimization problems namely the Penalty Stochastic Reformulation and Augmented Stochastic Reformulation. We prove global convergence rates for the PS and ALS methods as well as sub-linear $\mathcal{O}(\frac{1}{k})$ rates for the Cesaro average of iterates. The proposed convergence results hold for a wide family of distributions of random matrices, which provides the opportunity of fine-tuning the randomness of the method suitable for specific applications. Finally, we perform computational experiments that demonstrate the efficiency of our methods compared to the existing SP methods.

ab 1, algorithm, sp method, (15 more...)

arXiv.org Artificial Intelligence

2208.06152

Country:

North America > United States > Virginia > Arlington County > Arlington (0.04)
North America > United States > New York (0.04)
North America > United States > District of Columbia > Washington (0.04)
North America > Puerto Rico > San Juan > San Juan (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback