AITopics | utilization

Collaborating Authors

utilization

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

ASPEN: Breaking Operator Barriers for Efficient Parallel Execution of Deep Neural Networks

Neural Information Processing SystemsFeb-17-2026, 10:07:04 GMT

ASPEN also achieves high resource utilization and memory reuse by letting each resource asynchronously traverse depthwise in the DNN graph to its full computing potential.

artificial intelligence, execution, machine learning, (16 more...)

Neural Information Processing Systems

Country:

Asia > South Korea > Seoul > Seoul (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
North America > United States > Colorado > Boulder County > Boulder (0.04)
Africa > Mali (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

The Rise of AI Language Pathologists: Exploring Two-level Prompt Learning for Few-shot Weakly-supervised Whole Slide Image Classification

Neural Information Processing SystemsDec-26-2025, 21:28:53 GMT

This paper introduces the novel concept of few-shot weakly supervised learning for pathology Whole Slide Image (WSI) classification, denoted as FSWC. A solution is proposed based on prompt learning and the utilization of a large language model, GPT-4. Since a WSI is too large and needs to be divided into patches for processing, WSI classification is commonly approached as a Multiple Instance Learning (MIL) problem. In this context, each WSI is considered a bag, and the obtained patches are treated as instances. The objective of FSWC is to classify both bags and instances with only a limited number of labeled bags. Unlike conventional few-shot learning problems, FSWC poses additional challenges due to its weak bag labels within the MIL framework.

large language model, machine learning, natural language, (14 more...)

Neural Information Processing Systems

Industry: Health & Medicine > Therapeutic Area > Oncology (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.60)

Add feedback

FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision

Neural Information Processing SystemsDec-26-2025, 10:49:47 GMT

Attention, as a core layer of the ubiquitous Transformer architecture, is the bottleneck for large language models and long-context applications.

artificial intelligence, large language model, natural language, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.60)

Add feedback

Impact of Data-Oriented and Object-Oriented Design on Performance and Cache Utilization with Artificial Intelligence Algorithms in Multi-Threaded CPUs

Arantes, Gabriel M., Pinto, Richard F., Dalmazo, Bruno L., Borges, Eduardo N., Lucca, Giancarlo, de Mattos, Viviane L. D., Cardoso, Fabian C., Berri, Rafael A.

arXiv.org Artificial IntelligenceDec-10-2025

This study provides a comprehensive performance analysis of Data-Oriented Design (DOD) versus the traditional Object-Oriented Design (OOD), focusing on cache utilization and efficiency in multi-threaded environments. We developed and compared four distinct versions of the A* search algorithm: single-threaded OOD (ST -OOD), single-threaded DOD (ST -DOD), multi-threaded OOD (MT -OOD), and multi-threaded DOD (MT -DOD). The evaluation was based on metrics including execution time, memory usage, and CPU cache misses. In multi-threaded tests, the DOD implementation demonstrated considerable performance gains, with faster execution times and a lower number of raw system calls and cache misses. While OOD occasionally showed marginal advantages in memory usage or percentage-based cache miss rates, DOD's efficiency in data-intensive operations was more evident. Furthermore, our findings reveal that for a fine-grained task like the A* algorithm, the overhead associated with thread management led to single-threaded versions significantly outperforming their multi-threaded counterparts in both paradigms. We conclude that even when performance differences appear subtle in simple algorithms, the consistent advantages of DOD in critical metrics highlight its foundational architectural superiority, suggesting it is a more effective approach for maximizing hardware efficiency in complex, large-scale AI and parallel computing tasks.

cache miss, object-oriented architecture, programming language, (17 more...)

arXiv.org Artificial Intelligence

2512.07841

Country:

South America > Brazil > Rio Grande do Sul > Pelotas (0.04)
North America > United States > District of Columbia > Washington (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Government > Military (1.00)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (1.00)

Add feedback

GSPN-2: Efficient Parallel Sequence Modeling

Wang, Hongjun, Jiang, Yitong, McCarthy, Collin, Wehr, David, Ye, Hanrong, Li, Xinhao, Cheung, Ka Chun, Byeon, Wonmin, Gu, Jinwei, Chen, Ke, Han, Kai, Yin, Hongxu, Molchanov, Pavlo, Kautz, Jan, Liu, Sifei

arXiv.org Artificial IntelligenceDec-10-2025

Efficient vision transformer remains a bottleneck for high-resolution images and long-video related real-world applications. Generalized Spatial Propagation Network (GSPN) addresses this by replacing quadratic self-attention with a line-scan propagation scheme, bringing the cost close to linear in the number of rows or columns, while retaining accuracy. Despite this advancement, the existing GSPN implementation still suffers from (i) heavy overhead due to repeatedly launching GPU kernels, (ii) excessive data transfers from global GPU memory, and (iii) redundant computations caused by maintaining separate propagation weights for each channel. We introduce GSPN-2, a joint algorithm-system redesign. In particular, we eliminate thousands of micro-launches from the previous implementation into one single 2D kernel, explicitly pin one warp to each channel slice, and stage the previous column's activations in shared memory. On the model side, we introduce a compact channel propagation strategy that replaces per-channel matrices, trimming parameters, and align naturally with the affinity map used in transformer attention. Experiments demonstrate GSPN-2's effectiveness across image classification and text-to-image synthesis tasks, matching transformer-level accuracy with significantly lower computational cost. GSPN-2 establishes a new efficiency frontier for modeling global spatial context in vision applications through its unique combination of structured matrix transformations and GPU-optimized implementation. Project page: https://whj363636.github.io/GSPN2/

artificial intelligence, gspn-2, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2512.07884

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

SystolicAttention: Fusing FlashAttention within a Single Systolic Array

Lin, Jiawei, Li, Yuanlong, Chen, Guokai, Bourgeat, Thomas

arXiv.org Artificial IntelligenceDec-9-2025

Transformer models rely heavily on the scaled dot-product attention (SDPA) operation, typically implemented as FlashAttention. Characterized by its frequent interleaving of matrix multiplications and softmax operations, FlashAttention fails to fully utilize the compute resources of modern systolic-array-based accelerators designed for consecutive and large matrix multiplications. To fully unleash the performance potential of systolic arrays for FlashAttention, we propose FSA, an enhanced systolic array architecture that runs the entire FlashAttention on the array without external vector units. Combined with SystolicAttention, an optimized kernel for FSA that achieves fine-grained and element-wise overlapping of FlashAttention operations, FSA maximizes array utilization while preserving the original floating-point operation order of FlashAttention. We implement FSA in synthesizable RTL and evaluate its performance against state-of-the-art systolic-array-based accelerators. Our results show that FSA achieves 1.77x and 4.83x higher attention FLOPs/s utilization compared to AWS Neuron-v2 and Google TPUv5e, respectively. We synthesize FSA in a 16 nm technology at 1.5 GHz, and results indicate only a 12% area overhead compared to a standard weight-stationary systolic array.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2507.11331

Country:

Europe > Switzerland > Vaud > Lausanne (0.76)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.05)
(11 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Optimizing video analytics inference pipelines: a case study

Ghafouri, Saeid, Ding, Yuming, Chito, Katerine Diaz, del Rincón, Jesús Martinez, O'Connell, Niamh, Vandierendonck, Hans

arXiv.org Artificial IntelligenceDec-9-2025

Cost-effective and scalable video analytics are essential for precision livestock monitoring, where high-resolution footage and near-real-time monitoring needs from commercial farms generates substantial computational workloads. This paper presents a comprehensive case study on optimizing a poultry welfare monitoring system through system-level improvements across detection, tracking, clustering, and behavioral analysis modules. We introduce a set of optimizations, including multi-level parallelization, Optimizing code with substituting CPU code with GPU-accelerated code, vectorized clustering, and memory-efficient post-processing. Evaluated on real-world farm video footage, these changes deliver up to a 2x speedup across pipelines without compromising model accuracy. Our findings highlight practical strategies for building high-throughput, low-latency video inference systems that reduce infrastructure demands in agricultural and smart sensing deployments as well as other large-scale video analytics applications.

artificial intelligence, cloud computing, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2512.07009

Country:

Europe > United Kingdom > Northern Ireland > County Down > Belfast (0.04)
Europe > United Kingdom > Northern Ireland > County Antrim > Belfast (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Food & Agriculture > Agriculture (0.94)

Technology:

Information Technology > Architecture (1.00)
Information Technology > Artificial Intelligence > Vision (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
(3 more...)

Add feedback

From FLOPs to Footprints: The Resource Cost of Artificial Intelligence

Falk, Sophia, Corrêa, Nicholas Kluge, Luccioni, Sasha, Biber-Freudenberger, Lisa, van Wynsberghe, Aimee

arXiv.org Artificial IntelligenceDec-5-2025

As computational demands continue to rise, assessing the environmental footprint of AI requires moving beyond energy and water consumption to include the material demands of specialized hardware. This study quantifies the material footprint of AI training by linking computational workloads to physical hardware needs. The elemental composition of the Nvidia A100 SXM 40 GB graphics processing unit (GPU) was analyzed using inductively coupled plasma optical emission spectroscopy, which identified 32 elements. The results show that AI hardware consists of about 90% heavy metals and only trace amounts of precious metals. The elements copper, iron, tin, silicon, and nickel dominate the GPU composition by mass. In a multi-step methodology, we integrate these measurements with computational throughput per GPU across varying lifespans, accounting for the computational requirements of training specific AI models at different training efficiency regimes. Scenario-based analyses reveal that, depending on Model FLOPs Utilization (MFU) and hardware lifespan, training GPT-4 requires between 1,174 and 8,800 A100 GPUs, corresponding to the extraction and eventual disposal of up to 7 tons of toxic elements. Combined software and hardware optimization strategies can reduce material demands: increasing MFU from 20% to 60% lowers GPU requirements by 67%, while extending lifespan from 1 to 3 years yields comparable savings; implementing both measures together reduces GPU needs by up to 93%. Our findings highlight that incremental performance gains, such as those observed between GPT-3.5 and GPT-4, come at disproportionately high material costs. The study underscores the necessity of incorporating material resource considerations into discussions of AI scalability, emphasizing that future progress in AI must align with principles of resource efficiency and environmental responsibility.

accessed, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2512.04142

Country:

Europe > Germany (0.04)
Africa > Sub-Saharan Africa (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Water & Waste Management > Water Management (1.00)
Materials > Metals & Mining (1.00)
Health & Medicine (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

KAN-SAs: Efficient Acceleration of Kolmogorov-Arnold Networks on Systolic Arrays

Errabii, Sohaib, Sentieys, Olivier, Traiola, Marcello

arXiv.org Artificial IntelligenceDec-2-2025

Kolmogorov-Arnold Networks (KANs) have garnered significant attention for their promise of improved parameter efficiency and explainability compared to traditional Deep Neural Networks (DNNs). KANs' key innovation lies in the use of learnable non-linear activation functions, which are parametrized as splines. Splines are expressed as a linear combination of basis functions (B-splines). B-splines prove particularly challenging to accelerate due to their recursive definition. Systolic Array (SA)based architectures have shown great promise as DNN accelerators thanks to their energy efficiency and low latency. However, their suitability and efficiency in accelerating KANs have never been assessed. Thus, in this work, we explore the use of SA architecture to accelerate the KAN inference. We show that, while SAs can be used to accelerate part of the KAN inference, their utilization can be reduced to 30%. Hence, we propose KAN-SAs, a novel SA-based accelerator that leverages intrinsic properties of B-splines to enable efficient KAN inference. By including a nonrecursive B-spline implementation and leveraging the intrinsic KAN sparsity, KAN-SAs enhances conventional SAs, enabling efficient KAN inference, in addition to conventional DNNs. KAN-SAs achieves up to 100% SA utilization and up to 50% clock cycles reduction compared to conventional SAs of equivalent area, as shown by hardware synthesis results on a 28nm FD-SOI technology. We also evaluate different configurations of the accelerator on various KAN applications, confirming the improved efficiency of KAN inference provided by KAN-SAs.

artificial intelligence, b-spline, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2512.00055

Country:

Europe > France > Brittany > Ille-et-Vilaine > Rennes (0.04)
Asia (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

Constrained Network Slice Assignment via Large Language Models

Sudhakara, Sagar, Rajak, Pankaj

arXiv.org Artificial IntelligenceDec-2-2025

Modern networks support network slicing, which partitions physical infrastructure into virtual slices tailored to different service requirements (for example, high bandwidth or low latency). Optimally allocating users to slices is a constrained optimization problem that traditionally requires complex algorithms. In this paper, we explore the use of Large Language Models (LLMs) to tackle radio resource allocation for network slicing. We focus on two approaches: (1) using an LLM in a zero-shot setting to directly assign user service requests to slices, and (2) formulating an integer programming model where the LLM provides semantic insight by estimating similarity between requests. Our experiments show that an LLM, even with zero-shot prompting, can produce a reasonable first draft of slice assignments, although it may violate some capacity or latency constraints. We then incorporate the LLM's understanding of service requirements into an optimization solver to generate an improved allocation. The results demonstrate that LLM-guided grouping of requests, based on minimal textual input, achieves performance comparable to traditional methods that use detailed numerical data, in terms of resource utilization and slice isolation. While the LLM alone does not perfectly satisfy all constraints, it significantly reduces the search space and, when combined with exact solvers, provides a promising approach for efficient 5G network slicing resource allocation.

artificial intelligence, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2512.0004

Country: North America > United States > California > San Diego County > San Diego (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Telecommunications (0.67)
Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback