AITopics | accumulator

Collaborating Authors

accumulator

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

ad71c82b22f4f65b9398f76d8be4c615-Paper.pdf

Neural Information Processing SystemsFeb-9-2026, 19:45:48 GMT

artificial intelligence, machine learning, reinforcement learning, (12 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois (0.04)
North America > Canada (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China (0.04)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Unified Unbiased Variance Estimation for MMD: Robust Finite-Sample Performance with Imbalanced Data and Exact Acceleration under Null and Alternative Hypotheses

Zhong, Shijie, Fu, Jiangfeng, Yang, Yikun

arXiv.org Machine LearningJan-21-2026

The maximum mean discrepancy (MMD) is a kernel-based nonparametric statistic for two-sample testing, whose inferential accuracy depends critically on variance characterization. Existing work provides various finite-sample estimators of the MMD variance, often differing under the null and alternative hypotheses and across balanced or imbalanced sampling schemes. In this paper, we study the variance of the MMD statistic through its U-statistic representation and Hoeffding decomposition, and establish a unified finite-sample characterization covering different hypotheses and sample configurations. Building on this analysis, we propose an exact acceleration method for the univariate case under the Laplacian kernel, which reduces the overall computational complexity from $\mathcal O(n^2)$ to $\mathcal O(n \log n)$.

artificial intelligence, estimator, machine learning, (16 more...)

arXiv.org Machine Learning

2601.13874

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.66)

Add feedback

Optimized scheduling of electricity-heat cooperative system considering wind energy consumption and peak shaving and valley filling

Ye, Jin, Wang, Lingmei, Zhang, Shujian, Wu, Haihang

arXiv.org Artificial IntelligenceNov-27-2025

With the global energy transition and rapid development of renewable energy, the scheduling optimization challenge for combined power-heat systems under new energy integration and multiple uncertainties has become increasingly prominent. Addressing this challenge, this study proposes an intelligent scheduling method based on the improved Dual-Delay Deep Deterministic Policy Gradient (PVTD3) algorithm. System optimization is achieved by introducing a penalty term for grid power purchase variations. Simulation results demonstrate that under three typical scenarios (10%, 20%, and 30% renewable penetration), the PVTD3 algorithm reduces the system's comprehensive cost by 6.93%, 12.68%, and 13.59% respectively compared to the traditional TD3 algorithm. Concurrently, it reduces the average fluctuation amplitude of grid power purchases by 12.8%. Regarding energy storage management, the PVTD3 algorithm reduces the end-time state values of low-temperature thermal storage tanks by 7.67-17.67 units while maintaining high-temperature tanks within the 3.59-4.25 safety operating range. Multi-scenario comparative validation demonstrates that the proposed algorithm not only excels in economic efficiency and grid stability but also exhibits superior sustainable scheduling capabilities in energy storage device management.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2511.1525

Country: Asia > China (0.29)

Genre: Research Report > New Finding (0.34)

Industry:

Energy > Power Industry (1.00)
Energy > Renewable > Wind (0.85)
Energy > Renewable > Solar (0.70)
Energy > Renewable > Geothermal > Geothermal Energy Systems and Facilities (0.36)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Library Liberation: Competitive Performance Matmul Through Compiler-composed Nanokernels

Thangamani, Arun, Shahid, Md Asghar Ahmad, Siemieniuk, Adam, Morel, Rolf, Golin, Renato, Heinecke, Alexander

arXiv.org Artificial IntelligenceNov-19-2025

The rapidly evolving landscape of AI and machine learning workloads has widened the gap between high-level domain operations and efficient hardware utilization. Achieving near-peak performance still demands deep hardware expertise-experts either handcraft target-specific kernels (e.g., DeepSeek) or rely on specialized libraries (e.g., CUTLASS)-both of which add complexity and limit scalability for most ML practitioners. This paper introduces a compilation scheme that automatically generates scalable, high-performance microkernels by leveraging the MLIR dialects to bridge domain-level operations and processor capabilities. Our approach removes dependence on low-level libraries by enabling the compiler to auto-generate near-optimal code directly. At its core is a mechanism for composing nanokernels from low-level IR constructs with near-optimal register utilization, forming efficient microkernels tailored to each target. We implement this technique in an MLIR-based compiler supporting both vector and tile based CPU instructions. Experiments show that the generated nanokernels are of production-quality, and competitive with state-of-the-art microkernel libraries.

artificial intelligence, machine learning, vector, (17 more...)

arXiv.org Artificial Intelligence

2511.13764

Country: North America > United States (1.00)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

PyLO: Towards Accessible Learned Optimizers in PyTorch

Janson, Paul, Therien, Benjamin, Anthony, Quentin, Huang, Xiaolong, Moudgil, Abhinav, Belilovsky, Eugene

arXiv.org Artificial IntelligenceNov-11-2025

Learned optimizers have been an active research topic over the past decade, with increasing progress toward practical, general-purpose optimizers that can serve as drop-in replacements for widely used methods like Adam. However, recent advances -- such as VeLO, which was meta-trained for 4000 TPU-months -- remain largely inaccessible to the broader community, in part due to their reliance on JAX and the absence of user-friendly packages for applying the optimizers after meta-training. To address this gap, we introduce PyLO, a PyTorch-based library that brings learned optimizers to the broader machine learning community through familiar, widely adopted workflows. Unlike prior work focused on synthetic or convex tasks, our emphasis is on applying learned optimization to real-world large-scale pre-training tasks. Our release includes a CUDA-accelerated version of the small_fc_lopt learned optimizer architecture from (Metz et al., 2022a), delivering substantial speedups -- from 39.36 to 205.59 samples/sec throughput for training ViT B/16 with batch size 32. PyLO also allows us to easily combine learned optimizers with existing optimization tools such as learning rate schedules and weight decay. When doing so, we find that learned optimizers can substantially benefit. Our code is available at https://github.com/Belilovsky-Lab/pylo

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2506.10315

Country:

North America > United States (0.28)
Europe > Austria (0.28)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Communication Efficient LLM Pre-training with SparseLoCo

Sarfi, Amir, Thérien, Benjamin, Lidin, Joel, Belilovsky, Eugene

arXiv.org Artificial IntelligenceNov-7-2025

Communication-efficient distributed training algorithms have received considerable interest recently due to their benefits for training Large Language Models (LLMs) in bandwidth-constrained settings, such as across datacenters and over the internet. Despite reducing communication frequency, these methods still typically require communicating a full copy of the model's gradients-resulting in a communication bottleneck even for cross-datacenter links. Furthermore, they can slightly degrade performance compared to a naive AdamW DDP baseline. While quantization is often applied to reduce the pseudo-gradient's size, in the context of LLM pre-training, existing approaches have been unable to additionally leverage sparsification and have obtained limited quantization. In this work, we introduce SparseLoCo, a communication-efficient training algorithm for LLMs that effectively leverages error feedback with Top-k sparsification and 2-bit quantization to reach extreme sparsity as low as 1-3% while outperforming full-precision DiLoCo. Our key observations are that outer momentum can be locally approximated by an error feedback accumulator combined with aggressive sparsity, and that sparse aggregation can actually improve model performance. We empirically demonstrate in a range of communication-constrained LLM training settings that SparseLoCo provides significant benefits in both performance and communication cost.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2508.15706

Country:

North America > United States (0.46)
Europe > Italy (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Autocomp: A Powerful and Portable Code Optimizer for Tensor Accelerators

Hong, Charles, Bhatia, Sahil, Cheung, Alvin, Shao, Yakun Sophia

arXiv.org Artificial IntelligenceNov-7-2025

Hardware accelerators, especially those designed for tensor processing, have become ubiquitous in today's computing landscape. However, even with significant efforts in building compilers, programming these tensor accelerators remains challenging, leaving much of their potential underutilized. Recently, large language models (LLMs), trained on large amounts of code, have shown significant promise in code generation and optimization tasks, but generating low-resource languages, such as specialized tensor accelerator code still poses a significant challenge. We tackle this challenge with Autocomp, an approach that empowers accelerator programmers to leverage domain knowledge and hardware feedback to optimize code via an automated LLM-driven search. We accomplish this by: 1) formulating each optimization pass as a structured two-phase prompt, divided into planning and code generation phases, 2) inserting domain knowledge during planning via a concise and adaptable optimization menu, and 3) integrating correctness and performance metrics from hardware as feedback at each search iteration. Across three distinct hardware platforms, we demonstrate that Autocomp-optimized code runs 5.6x faster than the vendor-provided library (Gemmini), outperforms expert-level hand-tuned code by 1.9x (AWS Trainium), and achieves 3.8x higher performance than a machine learning-based cost model for GPUs (NVIDIA L40S). Additionally, we demonstrate that optimization schedules generated from Autocomp can be reused across similar tensor operations, improving speedups by up to 24% under a fixed sample budget.

large language model, machine learning, natural language, (24 more...)

arXiv.org Artificial Intelligence

2505.18574

Country: North America (0.46)

Genre:

Research Report (0.64)
Workflow (0.46)

Industry: Information Technology (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Mint: A Simple Test-Time Adaptation of Vision-Language Models against Common Corruptions

Bao, Wenxuan, Deng, Ruxi, He, Jingrui

arXiv.org Artificial IntelligenceOct-28-2025

Pretrained vision-language models such as CLIP achieve strong zero-shot generalization but remain vulnerable to distribution shifts caused by input corruptions. In this work, we investigate how corruptions affect CLIP's image embeddings and uncover a consistent phenomenon we term as embedding variance collapse, where both intra-class and inter-class variances shrink as corruption severity increases. We find that this collapse is closely tied to performance degradation, with inter-class variance strongly correlated with classification accuracy. To explain this phenomenon, we analyze how corruptions alter the structure of the embedding space. Our theoretical results suggest that the visual encoder tends to encode corruption-related signals, which dilute class-discriminative features and compress the representation geometry. We further show that maximizing inter-class variance, even when estimated from pseudo-labels, can provably enhance embedding quality. Based on this insight, we propose Mint, a simple test-time adaptation method that maximizes pseudo-label-based inter-class variance on the fly using a mean accumulator and a gradient accumulator. Mint operates effectively with small batch sizes and consistently improves performance across multiple corruption benchmarks and CLIP architectures. Our code is available at https://github.com/baowenxuan/Mint .

large language model, machine learning, variance, (21 more...)

arXiv.org Artificial Intelligence

2510.22127

Country:

North America > United States (1.00)
Europe (1.00)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.87)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)

Add feedback

Rescaling-Aware Training for Efficient Deployment of Deep Learning Models on Full-Integer Hardware

Mueller, Lion, Garcia-Ortiz, Alberto, Najafi, Ardalan, Fuks, Adam, Bamberg, Lennart

arXiv.org Artificial IntelligenceOct-14-2025

Integer AI inference significantly reduces computational complexity in embedded systems. Quantization-aware training (QAT) helps mitigate accuracy degradation associated with post-training quantization but still overlooks the impact of integer rescaling during inference, which is a hardware costly operation in integer-only AI inference. This work shows that rescaling cost can be dramatically reduced post-training, by applying a stronger quantization to the rescale multiplicands at no model-quality loss. Furthermore, we introduce Rescale-Aware Training, a fine tuning method for ultra-low bit-width rescaling multiplicands. Experiments show that even with 8x reduced rescaler widths, the full accuracy is preserved through minimal incremental retraining. This enables more energy-efficient and cost-efficient AI inference for resource-constrained embedded systems.

artificial intelligence, machine learning, quantization, (19 more...)

arXiv.org Artificial Intelligence

2510.11484

Country: Europe (0.46)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Microscaling Floating Point Formats for Large Language Models

Cococcioni, Marco, Pagani, Dario, Rossi, Federico

arXiv.org Artificial IntelligenceOct-3-2025

This paper leverages microscaling floating-point formats, a novel technique designed to address these challenges by reducing the storage and computational overhead associated with numerical representations in LLMs. Unlike traditional floating-point representations that allocate a dedicated scale for each value, microscaling employs a shared scale across a block of values, enabling compact one-byte floating-point representations while maintaining an extended dynamic range. We explore the application of microscaling in the context of 8-bit floating-point formats to significantly reduce memory footprint and computational costs. We tested several configurations of microscaling floats within the GPT -2 LLM architecture, demonstrating that microscaling data formats can achieve competitive accuracy during training and inference, proving its efficacy as a resource-efficient alternative for deploying LLMs at scale. The source code is publicly available at: https://github.com/

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2510.01863

Country: Europe (0.46)

Genre: Research Report > Promising Solution (0.66)

Industry: Automobiles & Trucks (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Add feedback