AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing SystemsFeb-18-2026, 12:16:13 GMT

NeuralSteiner: Learning Steiner Tree for Overflow-avoiding Global Routing in Chip Design

Global routing plays a critical role in modern chip design. The routing paths generated by global routers often form a rectilinear Steiner tree (RST).

artificial intelligence, deep learning, machine learning, (19 more...)

Country:

Asia > China > Beijing > Beijing (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > China > Henan Province > Zhengzhou (0.04)
(2 more...)

Genre: Research Report > Experimental Study (0.93)

Industry: Semiconductors & Electronics (0.86)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Neural Information Processing SystemsFeb-10-2026, 11:13:28 GMT

CogView: MasteringText-to-ImageGenerationvia Transformers

Then in the stage 2, an auto-regressive model (such as PixelCNN [47]) learns to fit the prior of hidden variables.

artificial intelligence, deep learning, machine learning, (18 more...)

Country:

Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
Asia > China > Jiangsu Province > Changzhou (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Neural Information Processing SystemsNov-21-2025, 12:06:11 GMT

Flexpoint: An Adaptive Numerical Format for Efficient Training of Deep Neural Networks

Urs Köster, Tristan Webb, Xin Wang, Marcel Nassar, Arjun K. Bansal, William Constable, Oguz Elibol, Scott Gray, Stewart Hall, Luke Hornof, Amir Khosrowshahi, Carey Kloss, Ruby J. Pai, Naveen Rao

Despite advances in limited precision inference in recent years, training of neural networks in low bit-width remains a challenging problem.

exponent, neural network, tensor, (14 more...)

Country: North America > United States > California > Los Angeles County > Long Beach (0.04)

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing SystemsOct-10-2025, 19:50:45 GMT

NeuralSteiner: Learning Steiner Tree for Overflow-avoiding Global Routing in Chip Design

Global routing plays a critical role in modern chip design. The routing paths generated by global routers often form a rectilinear Steiner tree (RST).

candidate point, neuralsteiner, overflow, (16 more...)

Country:

Asia > China > Beijing > Beijing (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > China > Henan Province > Zhengzhou (0.04)
(2 more...)

Genre: Research Report > Experimental Study (0.93)

Industry: Semiconductors & Electronics (0.86)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Xu, Xuefeng, Cormode, Graham

Power Transform Revisited: Numerically Stable, and Federated

arXiv.org Artificial IntelligenceOct-7-2025

Power transforms are popular parametric techniques for making data more Gaussian-like, and are widely used as preprocessing steps in statistical analysis and machine learning. However, we find that direct implementations of power transforms suffer from severe numerical instabilities, which can lead to incorrect results or even crashes. In this paper, we provide a comprehensive analysis of the sources of these instabilities and propose effective remedies. We further extend power transforms to the federated learning setting, addressing both numerical and distributional challenges that arise in this context. Experiments on real-world datasets demonstrate that our methods are both effective and robust, substantially improving stability compared to existing approaches.

artificial intelligence, computation, machine learning, (14 more...)

2510.04995

Country: North America > United States (0.68)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

arXiv.org Artificial IntelligenceOct-7-2025

Rainbow Padding: Mitigating Early Termination in Instruction-Tuned Diffusion LLMs

Kim, Bumjun, Jeon, Dongjae, Kim, Dueun, Jeung, Wonje, No, Albert

Diffusion large language models (dLLMs) have emerged as a promising alternative to autoregressive models, offering flexible generation orders and strong performance on complex reasoning tasks. However, instruction-tuned dLLMs exhibit a critical vulnerability we term \texttt{} overflow: as allocated sequence length increases, responses paradoxically become shorter, collapsing into early termination or degenerating into streams of \texttt{} tokens. Although noticed in practice, this issue has not been systematically analyzed. We trace its root cause to the dual role of \texttt{} as both termination and padding, which concentrates probability mass on \texttt{} at later positions and propagates backward to trigger early termination. To address this, we introduce Rainbow Padding, a simple remedy that replaces repeated \texttt{} placeholders with a repeating cycle of distinct padding tokens, distributing probability mass and breaking \texttt{} dominance. Experiments show that Rainbow Padding substantially improves length robustness and output quality, with as few as seven padding tokens sufficient to prevent early termination. Moreover, the method integrates efficiently into existing instruction-tuned models: LoRA fine-tuning for a single epoch on minimal data yields significant improvements, making this solution highly practical. The code is publicly available at https://github.com/quasar529/rainbow-padding.

large language model, natural language, rainbow padding, (14 more...)

2510.0368

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

arXiv.org Artificial IntelligenceSep-10-2025

Overflow Prevention Enhances Long-Context Recurrent LLMs

Ben-Kish, Assaf, Zimerman, Itamar, Mirza, M. Jehanzeb, Wolf, Lior, Glass, James, Karlinsky, Leonid, Giryes, Raja

A recent trend in LLMs is developing recurrent sub-quadratic models that improve long-context processing efficiency. We investigate leading large long-context models, focusing on how their fixed-size recurrent memory affects their performance. Our experiments reveal that, even when these models are trained for extended contexts, their use of long contexts remains underutilized. Specifically, we demonstrate that a chunk-based inference procedure, which identifies and processes only the most relevant portion of the input can mitigate recurrent memory failures and be effective for many long-context tasks: On LongBench, our method improves the overall performance of Falcon3-Mamba-Inst-7B by 14%, Falcon-Mamba-Inst-7B by 28%, RecurrentGemma-IT-9B by 50%, and RWKV6-Finch-7B by 51%. Surprisingly, this simple approach also leads to state-of-the-art results in the challenging LongBench v2 benchmark, showing competitive performance with equivalent size Transformers. Furthermore, our findings raise questions about whether recurrent models genuinely exploit long-range dependencies, as our single-chunk strategy delivers stronger performance - even in tasks that presumably require cross-context relations.

large language model, machine learning, natural language, (18 more...)

2505.07793

Genre: Research Report > New Finding (0.87)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

arXiv.org Artificial IntelligenceApr-15-2025

MGS: Markov Greedy Sums for Accurate Low-Bitwidth Floating-Point Accumulation

Natesh, Vikas, Kung, H. T., Kong, David

We offer a novel approach, MGS (Markov Greedy Sums), to improve the accuracy of low-bitwidth floating-point dot products in neural network computations. In conventional 32-bit floating-point summation, adding values with different exponents may lead to loss of precision in the mantissa of the smaller term, which is right-shifted to align with the larger term's exponent. Such shifting (a.k.a. 'swamping') is a significant source of numerical errors in accumulation when implementing low-bitwidth dot products (e.g., 8-bit floating point) as the mantissa has a small number of bits. We avoid most swamping errors by arranging the terms in dot product summation based on their exponents and summing the mantissas without overflowing the low-bitwidth accumulator. We design, analyze, and implement the algorithm to minimize 8-bit floating point error at inference time for several neural networks. In contrast to traditional sequential summation, our method has significantly lowered numerical errors, achieving classification accuracy on par with high-precision floating-point baselines for multiple image classification tasks. Our dMAC hardware units can reduce power consumption by up to 34.1\% relative to conventional MAC units.

artificial intelligence, machine learning, overflow, (19 more...)

2504.09072

Country: North America > United States (0.94)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)