AITopics | parallelization

Collaborating Authors

parallelization

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Massively Parallel Exact Inference for Hawkes Processes

Raza, Ahmer, Smith, Hudson

arXiv.org Machine LearningApr-10-2026

Multivariate Hawkes processes are a widely used class of self-exciting point processes, but maximum likelihood estimation naively scales as $O(N^2)$ in the number of events. The canonical linear exponential Hawkes process admits a faster $O(N)$ recurrence, but prior work evaluates this recurrence sequentially, without exploiting parallelization on modern GPUs. We show that the Hawkes process intensity can be expressed as a product of sparse transition matrices admitting a linear-time associative multiply, enabling computation via a parallel prefix scan. This yields a simple yet massively parallelizable algorithm for maximum likelihood estimation of linear exponential Hawkes processes. Our method reduces the computational complexity to approximately $O(N/P)$ with $P$ parallel processors, and naturally yields a batching scheme to maintain constant memory usage, avoiding GPU memory constraints. Importantly, it computes the exact likelihood without any additional assumptions or approximations, preserving the simplicity and interpretability of the model. We demonstrate orders-of-magnitude speedups on simulated and real datasets, scaling to thousands of nodes and tens of millions of events, substantially beyond scales reported in prior work. We provide an open-source PyTorch library implementing our optimizations.

artificial intelligence, hawke process, machine learning, (17 more...)

arXiv.org Machine Learning

2604.01342

Country:

North America > United States > Illinois > Cook County > Chicago (0.05)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Virginia > Arlington County > Arlington (0.04)
(3 more...)

Genre: Research Report (0.64)

Industry:

Health & Medicine (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.96)

Add feedback

5bf8aaef51c6e0d363cbe554acaf3f20-Paper.pdf

Neural Information Processing SystemsFeb-8-2026, 13:27:06 GMT

computation, dataset, secret share, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Riverside County > Riverside (0.04)
North America > Canada (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)
Europe > Spain > Canary Islands (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (0.68)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.31)

Add feedback

Shallow RNN: Accurate Time-series Classification on Resource Constrained Devices

Neural Information Processing SystemsDec-25-2025, 14:28:07 GMT

Recurrent Neural Networks (RNNs) capture long dependencies and context, and 2 hence are the key component of typical sequential data based tasks. However, the sequential nature of RNNs dictates a large inference cost for long sequences even if the hardware supports parallelization. To induce long-term dependencies, and yet admit parallelization, we introduce novel shallow RNNs. In this architecture, the first layer splits the input sequence and runs several independent RNNs.

accurate time-series classification, resource constrained device, shallow rnn, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.43)

Add feedback

Gaussian Process Aggregation for Root-Parallel Monte Carlo Tree Search with Continuous Actions

Xiao, Junlin, Darvariu, Victor-Alexandru, Lacerda, Bruno, Hawes, Nick

arXiv.org Artificial IntelligenceDec-11-2025

Monte Carlo Tree Search is a cornerstone algorithm for online planning, and its root-parallel variant is widely used when wall clock time is limited but best performance is desired. In environments with continuous action spaces, how to best aggregate statistics from different threads is an important yet underexplored question. In this work, we introduce a method that uses Gaussian Process Regression to obtain value estimates for promising actions that were not trialed in the environment. We perform a systematic evaluation across 6 different domains, demonstrating that our approach outperforms existing aggregation strategies while requiring a modest increase in inference time.

action space, artificial intelligence, planning & scheduling, (16 more...)

arXiv.org Artificial Intelligence

2512.09727

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)

Add feedback

ThreadWeaver: Adaptive Threading for Efficient Parallel Reasoning in Language Models

Lian, Long, Wang, Sida, Juefei-Xu, Felix, Fu, Tsu-Jui, Li, Xiuyu, Yala, Adam, Darrell, Trevor, Suhr, Alane, Tian, Yuandong, Lin, Xi Victoria

arXiv.org Artificial IntelligenceDec-10-2025

Scaling inference-time computation has enabled Large Language Models (LLMs) to achieve strong reasoning performance, but inherently sequential decoding leads to substantial latency, especially on complex tasks. Recent work on adaptive parallel reasoning aims to improve inference efficiency by decomposing the problem-solving process into concurrent reasoning threads when beneficial. However, existing methods on realistic tasks are either limited to supervised behavior cloning or exhibit significant accuracy drops compared to widely-used sequential long chain-of-thought (CoT) baselines. Moreover, many require customized inference engines, complicating deployment. We introduce ThreadWeaver, a framework for adaptive parallel reasoning that achieves accuracy on par with popular sequential reasoning models of comparable size while significantly reducing inference latency. ThreadWeaver's performance stems from three key innovations: 1) a two-stage parallel trajectory generator that produces large-scale, high-quality CoT data with parallel annotations for supervised fine-tuning; 2) a trie-based training-inference co-design that enables parallel reasoning on any off-the-shelf autoregressive inference engine without modifying position embeddings or KV caches; and 3) a parallelization-aware reinforcement learning framework that teaches the model to balance accuracy with effective parallelization. Across six challenging mathematical reasoning benchmarks, ThreadWeaver trained atop Qwen3-8B achieves accuracy comparable to cutting-edge sequential reasoning models (71.9% on average and 79.9% on AIME24) while delivering up to 1.53x average speedup in token latency, establishing a new Pareto frontier between accuracy and efficiency.

large language model, machine learning, trajectory, (20 more...)

arXiv.org Artificial Intelligence

2512.07843

Genre:

Workflow (0.95)
Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

GANGR: GAN-Assisted Scalable and Efficient Global Routing Parallelization

Jooshin, Hadi Khodaei, Partin-Vaisband, Inna

arXiv.org Artificial IntelligenceNov-25-2025

Abstract--Global routing is a critical stage in electronic design automation (EDA) that enables early estimation and optimization of the routability of modern integrated circuits with respect to congestion, power dissipation, and design complexity. Batching is a primary concern in top-performing global routers, grouping nets into manageable sets to enable parallel processing and efficient resource usage. This process improves memory usage, scalable parallelization on modern hardware, and routing congestion by controlling net interactions within each batch. However, conventional batching methods typically depend on heuristics that are computationally expensive and can lead to suboptimal results (oversized batches with conflicting nets, excessive batch counts degrading parallelization, and longer batch generation times), ultimately limiting scalability and efficiency. T o address these limitations, a novel batching algorithm enhanced with Wasserstein generative adversarial networks (WGANs) is introduced in this paper, enabling more effective parallelization by generating fewer higher-quality batches in less time. The proposed algorithm is tested on the latest ISPD'24 contest benchmarks, demonstrating up to 40% runtime reduction with only 0.002% degradation in routing quality as compared to state-of-the-art router .

artificial intelligence, batch, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2511.17665

Country: North America > United States > Illinois (0.15)

Genre: Research Report (0.50)

Industry: Semiconductors & Electronics (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Shallow RNN: Accurate Time-series Classification on Resource Constrained Devices

Don Dennis, Durmus Alp Emre Acar, Vikram Mandikal, Vinu Sankar Sadasivan, Venkatesh Saligrama, Harsha Vardhan Simhadri, Prateek Jain

Neural Information Processing SystemsNov-17-2025, 07:42:25 GMT

Work done as a Research Fellow at Microsoft Research India.

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > Canada (0.04)
Asia > India > Gujarat > Gandhinagar (0.04)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Scalable Coverage Trajectory Synthesis on GPUs as Statistical Inference

Sun, Max M., Kwon, Jueun, Murphey, Todd

arXiv.org Artificial IntelligenceNov-17-2025

Abstract--Coverage motion planning is essential to a wide range of robotic tasks. Unlike conventional motion planning problems, which reason over temporal sequences of states, coverage motion planning requires reasoning over the spatial distribution of entire trajectories, making standard motion planning methods limited in computational efficiency and less amenable to modern parallelization frameworks. In this work, we formulate the coverage motion planning problem as a statistical inference problem from the perspective of flow matching, a generative modeling technique that has gained significant attention in recent years. The proposed formulation unifies commonly used statistical discrepancy measures, such as Kullback-Leibler divergence and Sinkhorn divergence, with a standard linear quadratic regulator problem. This paper focuses on the advantages of this formulation in terms of scalability through parallelization, highlighting its computational benefits compared to conventional methods based on waypoint tracking.

artificial intelligence, planning & scheduling, trajectory, (13 more...)

arXiv.org Artificial Intelligence

2511.11514

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)

Add feedback

Filters

Collaborating Authors

parallelization

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Massively Parallel Exact Inference for Hawkes Processes

6034a661584af6c28fd97a6f23e56c0a-Supplemental-Conference.pdf

5bf8aaef51c6e0d363cbe554acaf3f20-Paper.pdf

41e7637e7b6a9f27a98b84d3a185c7c0-AuthorFeedback.pdf

Shallow RNN: Accurate Time-series Classification on Resource Constrained Devices

Gaussian Process Aggregation for Root-Parallel Monte Carlo Tree Search with Continuous Actions

ThreadWeaver: Adaptive Threading for Efficient Parallel Reasoning in Language Models

GANGR: GAN-Assisted Scalable and Efficient Global Routing Parallelization

Shallow RNN: Accurate Time-series Classification on Resource Constrained Devices

Scalable Coverage Trajectory Synthesis on GPUs as Statistical Inference