AITopics | computational demand

NormFit: A Lightweight Solution for Few-Shot Federated Learning with Non-IID Data

Neural Information Processing Systems | Jun-14-2026, 07:31:05 GMT

Vision-Language Models (VLMs) have recently attracted considerable attention in Federated Learning (FL) due to their strong and robust performance. In particular, few-shot adaptation with pre-trained VLMs like CLIP enhances the performance of downstream tasks. However, existing methods still suffer from substantial communication overhead, high local computational demands, and suboptimal performance under non-IID user data. To simultaneously address all those limitations, we propose NormFit, a lightweight solution that selectively fine-tunes only a very small portion of the model parameters, specifically only the Pre-LayerNorm parameters of the vision encoder within a VLM. Overcoming the existing tradeoff between performance and communication/computation efficiency in few-shot FL, NormFit sets a new benchmark by simultaneously achieving superior accuracy and substantially reduced communication and computational demands. Theoretically, we show that NormFit yields a considerably smaller generalization gap compared to tuning all LayerNorm parameters. Importantly, NormFit can function effectively as a standalone solution or integrate seamlessly with existing few-shot fine-tuning methods to further enhance their performance. Notably, NormFit offers implementation simplicity, achieving these improvements without any algorithmic modifications, changes to the underlying model architecture, or the addition of external parameters.
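
To make the idea of tuning only the Pre-LayerNorm parameters concrete, the following minimal sketch freezes every parameter except those whose name contains a marker substring and reports the fraction that would be trained and communicated. The parameter names, the marker string, and all sizes are hypothetical placeholders, not taken from CLIP or from the NormFit code.

```python
# Toy parameter table: name -> number of scalar values.
# All names and sizes are hypothetical, chosen only for illustration.
PARAMS = {
    "vision.patch_embed.weight": 768 * 3 * 16 * 16,
    "vision.pre_layernorm.weight": 768,
    "vision.pre_layernorm.bias": 768,
    "vision.block0.attn.weight": 4 * 768 * 768,
    "vision.block0.layernorm.weight": 768,
    "vision.block0.layernorm.bias": 768,
    "vision.block0.mlp.weight": 8 * 768 * 768,
    "text.block0.attn.weight": 4 * 512 * 512,
}


def select_trainable(params, marker="pre_layernorm"):
    """Return only the entries whose name contains `marker`; the rest stay frozen."""
    return {name: size for name, size in params.items() if marker in name}


def trainable_fraction(params, marker="pre_layernorm"):
    """Fraction of all scalar parameters that would be updated and sent to the server."""
    trainable_count = sum(select_trainable(params, marker).values())
    return trainable_count / sum(params.values())


trainable = select_trainable(PARAMS)
print(sorted(trainable))
print(f"trainable fraction: {trainable_fraction(PARAMS):.6f}")
```

In this toy table the per-block LayerNorm entries stay frozen because their names do not contain the marker, which mirrors the abstract's distinction between Pre-LayerNorm tuning and tuning all LayerNorm parameters.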

artificial intelligence, machine learning, proceedings, (6 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.43)

From FLOPs to Footprints: The Resource Cost of Artificial Intelligence

Falk, Sophia, Corrêa, Nicholas Kluge, Luccioni, Sasha, Biber-Freudenberger, Lisa, van Wynsberghe, Aimee

arXiv.org Artificial Intelligence | Dec-5-2025

As computational demands continue to rise, assessing the environmental footprint of AI requires moving beyond energy and water consumption to include the material demands of specialized hardware. This study quantifies the material footprint of AI training by linking computational workloads to physical hardware needs. The elemental composition of the Nvidia A100 SXM 40 GB graphics processing unit (GPU) was analyzed using inductively coupled plasma optical emission spectroscopy, which identified 32 elements. The results show that AI hardware consists of about 90% heavy metals and only trace amounts of precious metals. The elements copper, iron, tin, silicon, and nickel dominate the GPU composition by mass. In a multi-step methodology, we integrate these measurements with computational throughput per GPU across varying lifespans, accounting for the computational requirements of training specific AI models at different training efficiency regimes. Scenario-based analyses reveal that, depending on Model FLOPs Utilization (MFU) and hardware lifespan, training GPT-4 requires between 1,174 and 8,800 A100 GPUs, corresponding to the extraction and eventual disposal of up to 7 tons of toxic elements. Combined software and hardware optimization strategies can reduce material demands: increasing MFU from 20% to 60% lowers GPU requirements by 67%, while extending lifespan from 1 to 3 years yields comparable savings; implementing both measures together reduces GPU needs by up to 93%. Our findings highlight that incremental performance gains, such as those observed between GPT-3.5 and GPT-4, come at disproportionately high material costs. The study underscores the necessity of incorporating material resource considerations into discussions of AI scalability, emphasizing that future progress in AI must align with principles of resource efficiency and environmental responsibility.
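
The 67% figures quoted above are what one would expect if the number of GPUs consumed scales inversely with both MFU and hardware lifespan. The sketch below works through that arithmetic under a deliberately simple model: GPUs needed = total training FLOPs / (peak FLOP/s × MFU × lifetime in seconds). The total FLOPs value is a hypothetical placeholder and the peak throughput is an assumed nominal figure; the sketch does not attempt to reproduce the paper's absolute GPU counts or its combined 93% result, which depend on the paper's own scenario definitions.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def gpus_needed(total_flops, peak_flops_per_s, mfu, lifespan_years):
    """GPUs consumed by a workload if each GPU runs at `mfu` for its whole lifespan."""
    flops_per_gpu_lifetime = peak_flops_per_s * mfu * lifespan_years * SECONDS_PER_YEAR
    return total_flops / flops_per_gpu_lifetime


def reduction(before, after):
    """Relative saving when going from `before` GPUs to `after` GPUs."""
    return 1.0 - after / before


TOTAL_FLOPS = 1e25   # hypothetical training budget, not a figure from the paper
PEAK_FLOPS = 312e12  # assumed nominal peak throughput per GPU, in FLOP/s

base = gpus_needed(TOTAL_FLOPS, PEAK_FLOPS, mfu=0.20, lifespan_years=1)
better_mfu = gpus_needed(TOTAL_FLOPS, PEAK_FLOPS, mfu=0.60, lifespan_years=1)
longer_life = gpus_needed(TOTAL_FLOPS, PEAK_FLOPS, mfu=0.20, lifespan_years=3)

print(f"MFU 20% -> 60%:     {reduction(base, better_mfu):.1%} fewer GPUs")
print(f"lifespan 1 -> 3 yr: {reduction(base, longer_life):.1%} fewer GPUs")
```

Because the placeholder budget cancels out of the ratio, the relative savings do not depend on the assumed FLOPs or peak throughput.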

accessed, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2512.04142

Country: North America > United States (0.46)

Genre: Research Report > New Finding (1.00)

Industry:

Water & Waste Management > Water Management (1.00)
Materials > Metals & Mining (1.00)
Health & Medicine (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Adaptive Layer-skipping in Pre-trained LLMs

Luo, Xuan, Wang, Weizhi, Yan, Xifeng

arXiv.org Artificial Intelligence | Oct-10-2025

Various layer-skipping methods have been proposed to accelerate token generation in large language models (LLMs). However, limited attention has been paid to a fundamental question: How do computational demands vary across the generation of different tokens? In this work, we introduce FlexiDepth, a method that dynamically adjusts the number of Transformer layers used in text generation. By incorporating a plug-in router and adapter, FlexiDepth enables adaptive computation in LLMs without modifying their original parameters. Applied to Llama-3-8B, it skips 8 out of 32 layers while maintaining full benchmark performance. Our experiments reveal that computational demands in LLMs significantly vary based on token type. Specifically, generating repetitive tokens or fixed phrases requires fewer layers, whereas producing tokens involving computation or high uncertainty requires more layers. Despite the computational savings, FlexiDepth does not yet achieve wall-clock speedup due to varied skipping patterns and I/O overhead. To inspire future work and advance research on practical speedup, we open-sourced FlexiDepth and a dataset documenting its layer allocation patterns.
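
The router-plus-adapter idea can be pictured with a toy example: at every layer, a router scores each token and sends it either through the expensive layer or through a cheap bypass. The sketch below is only an illustration of that control flow, not FlexiDepth itself. The functions `full_layer`, `adapter`, and `router_score` are hypothetical stand-ins, and each token is processed independently, which a real Transformer layer does not do.

```python
def full_layer(hidden):
    """Stand-in for an expensive Transformer layer (hypothetical toy transform)."""
    return [2.0 * x + 1.0 for x in hidden]


def adapter(hidden):
    """Stand-in for the cheap path taken by tokens that skip the layer."""
    return [x + 0.1 for x in hidden]


def router_score(hidden):
    """Toy router: mean absolute activation squashed into the range [0, 1)."""
    mean_abs = sum(abs(x) for x in hidden) / len(hidden)
    return mean_abs / (1.0 + mean_abs)


def forward(tokens, num_layers=4, threshold=0.5):
    """Run every token through `num_layers`, counting how many full layers it used."""
    outputs, layers_used = [], []
    for hidden in tokens:
        used = 0
        for _ in range(num_layers):
            if router_score(hidden) > threshold:
                hidden = full_layer(hidden)
                used += 1
            else:
                hidden = adapter(hidden)
        outputs.append(hidden)
        layers_used.append(used)
    return outputs, layers_used


tokens = [[0.1, -0.1], [0.85, 0.85], [3.0, -2.0]]
_, layers_used = forward(tokens)
print(layers_used)
```

The per-token counts differ because the router decision depends on the token's current hidden state, which is the kind of variation in computational demand the abstract describes.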

flexidepth, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2503.23798

Country:

North America > United States (1.00)
Europe (0.93)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Direct Image Classification from Fourier Ptychographic Microscopy Measurements without Reconstruction

Agarwal, Navya Sonal, Schneider, Jan Philipp, Gandikota, Kanchana Vaishnavi, Kazim, Syed Muhammad, Meshreki, John, Ihrke, Ivo, Moeller, Michael

arXiv.org Artificial Intelligence | Aug-25-2025

The computational imaging technique of Fourier Ptychographic Microscopy (FPM) enables high-resolution imaging with a wide field of view and can serve as an extremely valuable tool, e.g., in the classification of cells in medical applications. However, reconstructing a high-resolution image from tens or even hundreds of measurements is computationally expensive, particularly for a wide field of view. Therefore, in this paper, we investigate the idea of classifying the image content in the FPM measurements directly without performing a reconstruction step first. We show that Convolutional Neural Networks (CNNs) can extract meaningful information from measurement sequences, significantly outperforming the classification on a single band-limited image (by up to 12%) while being significantly more efficient than a reconstruction of a high-resolution image. Furthermore, we demonstrate that a learned multiplexing of several raw measurements allows maintaining the classification accuracy while significantly reducing the amount of data (and consequently also the acquisition time).

accuracy, artificial intelligence, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2505.05054

Country: Europe > Germany (0.15)

Genre: Research Report (0.65)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Enhancing the Scalability of Classical Surrogates for Real-World Quantum Machine Learning Applications

Hernicht, Philip Anton, Sakhnenko, Alona, O'Meara, Corey, Cortiana, Giorgio, Lorenz, Jeanette Miriam

arXiv.org Artificial Intelligence | Aug-11-2025

Quantum machine learning (QML) presents potential for early industrial adoption, yet limited access to quantum hardware remains a significant bottleneck for deployment of QML solutions. This work explores the use of classical surrogates to bypass this restriction, a technique that builds a lightweight classical representation of a (trained) quantum model so that inference can be performed on entirely classical devices. We reveal the prohibitively high computational demand associated with previously proposed methods for generating classical surrogates from quantum models, and propose an alternative pipeline enabling generation of classical surrogates at a larger scale than was previously possible. Previous methods required at least a high-performance computing (HPC) system for quantum models below industrial scale (ca. 20 qubits), which raises questions about their practicality. We greatly reduce the redundancies of the previous approach, utilizing only a minute fraction of the resources previously needed. We demonstrate the effectiveness of our method on a real-world energy demand forecasting problem, conducting rigorous testing of performance and computational demand both in simulations and on quantum hardware. Our results indicate that our method achieves high accuracy on the testing dataset while its computational resource requirements scale linearly rather than exponentially. This work presents a lightweight approach to transform quantum solutions into classically deployable versions, facilitating faster integration of quantum technology in industrial settings. Furthermore, it can serve as a powerful research tool in the search for practical quantum advantage in an empirical setup.

data mining, machine learning, quantum model, (18 more...)

arXiv.org Artificial Intelligence

2508.06131

Country: Europe > Germany (0.15)

Genre: Research Report > New Finding (1.00)

Industry: Energy (1.00)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

CAS Condensed and Accelerated Silhouette: An Efficient Method for Determining the Optimal K in K-Means Clustering

Das, Krishnendu, Gupta, Sumit, Kumar, Awadhesh

arXiv.org Artificial Intelligence | Jul-14-2025

Clustering is a critical component of decision-making in today's data-driven environments. It has been widely used in a variety of fields, such as bioinformatics, social network analysis, and image processing. However, clustering accuracy remains a major challenge in large datasets. This paper presents a comprehensive overview of strategies for selecting the optimal K in clustering, with a focus on balancing clustering precision and computational efficiency in complex data environments. In addition, this paper introduces improvements to clustering techniques for text and image data to provide insights into better computational performance and cluster validity. The proposed approach is an algorithm based on the Condensed Silhouette method together with statistical methods such as Local Structures, Gap Statistics, Class-Consistency Ratio, and Cluster Overlap Index (CCR-COI), and it calculates the best value of K for K-Means clustering of the data. The results of comparative experiments show that the proposed approach achieves up to 99% faster execution times on high-dimensional datasets while retaining both precision and scalability, making it highly suitable for real-time clustering needs or scenarios demanding efficient clustering with minimal resource utilization. Clustering is a critical component of unsupervised machine learning, with the K-means algorithm being particularly favored due to its simplicity, speed, and ease of interpretation. Nonetheless, a major difficulty lies in accurately identifying the best number of clusters, K, especially with large, high-dimensional datasets where it is crucial to strike an effective balance between computational efficiency and accuracy.
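
For readers unfamiliar with silhouette-based selection of K, the sketch below shows the plain, uncondensed silhouette score used to compare candidate clusterings. It is the textbook O(n²) computation, not the paper's Condensed and Accelerated variant, and the points and candidate labelings are hand-made toy data rather than K-means output.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (plain O(n^2) version)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # common convention: a singleton cluster contributes 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(point, q) for q in other) / len(other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Three visually separate groups of three points each (toy data).
points = [
    (1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
    (9.0, 1.0), (9.1, 1.2), (8.9, 0.8),
]
# Hand-made candidate labelings for K = 2, 3, 4.
candidates = {
    2: [0, 0, 0, 1, 1, 1, 1, 1, 1],
    3: [0, 0, 0, 1, 1, 1, 2, 2, 2],
    4: [0, 0, 3, 1, 1, 1, 2, 2, 2],
}
scores = {k: silhouette_score(points, labels) for k, labels in candidates.items()}
best_k = max(scores, key=scores.get)
print(best_k)
```

The quadratic number of pairwise distances in this baseline is the cost that motivates condensed or accelerated variants on large datasets.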

artificial intelligence, dataset, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2507.08311

Genre:

Research Report (1.00)
Overview (1.00)

Industry: Information Technology (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Realtime-Capable Hybrid Spiking Neural Networks for Neural Decoding of Cortical Activity

Krausse, Jann, Vasilache, Alexandru, Knobloch, Klaus, Becker, Juergen

arXiv.org Artificial Intelligence | Jun-17-2025

Intra-cortical brain-machine interfaces (iBMIs) present a promising solution to restoring and decoding brain activity lost due to injury. However, patients with such neuroprosthetics suffer from permanent skull openings resulting from the devices' bulky wiring. This drives the development of wireless iBMIs, which demand low power consumption and a small device footprint. Most recently, spiking neural networks (SNNs) have been researched as potential candidates for low-power neural decoding. In this work, we present the next step of utilizing SNNs for such tasks, building on the recently published results of the 2024 Grand Challenge on Neural Decoding for Motor Control of non-Human Primates. We optimize our model architecture to exceed the existing state of the art on the Primate Reaching dataset while maintaining similar resource demand through various compression techniques. We further focus on implementing a realtime-capable version of the model and discuss the implications of this architecture. With this, we advance one step towards latency-free decoding of cortical spike trains using neuromorphic technology, ultimately improving the lives of millions of paralyzed patients.

architecture, artificial intelligence, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2506.134

Country: Europe > Germany (0.47)

Genre: Research Report > Promising Solution (0.66)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.89)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

A Survey on Post-training of Large Language Models

Tie, Guiyao, Zhao, Zeli, Song, Dingjie, Wei, Fuyang, Zhou, Rong, Dai, Yurou, Yin, Wen, Yang, Zhejian, Yan, Jiangyue, Su, Yao, Dai, Zhenhan, Xie, Yifeng, Cao, Yihan, Sun, Lichao, Zhou, Pan, He, Lifang, Chen, Hechang, Zhang, Yu, Wen, Qingsong, Liu, Tianming, Gong, Neil Zhenqiang, Tang, Jiliang, Xiong, Caiming, Ji, Heng, Yu, Philip S., Gao, Jianfeng

arXiv.org Artificial Intelligence | Mar-8-2025

The emergence of Large Language Models (LLMs) has fundamentally transformed natural language processing, making them indispensable across domains ranging from conversational systems to scientific exploration. However, their pre-trained architectures often reveal limitations in specialized contexts, including restricted reasoning capacities, ethical uncertainties, and suboptimal domain-specific performance. These challenges necessitate advanced post-training language models (PoLMs) to address these shortcomings, such as OpenAI-o1/o3 and DeepSeek-R1 (collectively known as Large Reasoning Models, or LRMs). This paper presents the first comprehensive survey of PoLMs, systematically tracing their evolution across five core paradigms: Fine-tuning, which enhances task-specific accuracy; Alignment, which ensures alignment with human preferences; Reasoning, which advances multi-step inference despite challenges in reward design; Efficiency, which optimizes resource utilization amidst increasing complexity; and Integration and Adaptation, which extend capabilities across diverse modalities while addressing coherence issues. Charting progress from ChatGPT's foundational alignment strategies to DeepSeek-R1's innovative reasoning advancements, we illustrate how PoLMs leverage datasets to mitigate biases, deepen reasoning capabilities, and enhance domain adaptability. Our contributions include a pioneering synthesis of PoLM evolution, a structured taxonomy categorizing techniques and datasets, and a strategic agenda emphasizing the role of LRMs in improving reasoning proficiency and domain flexibility. As the first survey of its scope, this work consolidates recent PoLM advancements and establishes a rigorous intellectual framework for future research, fostering the development of LLMs that excel in precision, ethical robustness, and versatility across scientific and societal applications.

group relative policy optimization, point response positive ai reference, point response positive human reference, (12 more...)

arXiv.org Artificial Intelligence

2503.06072

Country:

Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > Oregon > Multnomah County > Portland (0.04)
(8 more...)

Genre:

Workflow (1.00)
Overview (1.00)
Instructional Material (1.00)
Research Report > Promising Solution (0.92)

Industry:

Law (0.93)
Education > Educational Setting (0.67)
Leisure & Entertainment > Sports (0.45)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.45)

Dynamic Token Reduction during Generation for Vision Language Models

Liang, Xiaoyu, Guan, Chaofeng, Lu, Jiaying, Chen, Huiyao, Wang, Huan, Hu, Haoji

arXiv.org Artificial Intelligence | Jan-23-2025

Vision-Language Models (VLMs) have achieved notable success in multimodal tasks but face practical limitations due to the quadratic complexity of decoder attention mechanisms and autoregressive generation. Existing methods like FASTV and VTW have achieved notable results in reducing redundant visual tokens, but these approaches focus on pruning tokens in a single forward pass without systematically analyzing the redundancy of visual tokens throughout the entire generation process. In this paper, we introduce a dynamic pruning strategy tailored for VLMs, named Dynamic Rate (DyRate), which progressively adjusts the compression rate during generation. Our analysis of the distribution of attention reveals that the importance of visual tokens decreases throughout the generation process, inspiring us to adopt a more aggressive compression rate. By integrating a lightweight predictor based on attention distribution, our approach enables flexible adjustment of pruning rates. Our experimental results demonstrate that our method not only reduces computational demands but also maintains the quality of responses.
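
A progressively tightening compression rate can be sketched as follows: at each generation step a keep ratio is chosen, and only the visual tokens with the highest attention scores survive. Note the substitution: the abstract describes a learned predictor driven by the attention distribution, whereas this sketch uses a fixed linear schedule purely for illustration. The function names, the schedule endpoints, and the attention scores are all hypothetical.

```python
def keep_ratio(step, total_steps, start=0.8, end=0.2):
    """Linearly shrink the fraction of visual tokens kept as generation proceeds."""
    if total_steps <= 1:
        return start
    progress = step / (total_steps - 1)
    return start + (end - start) * progress


def prune_visual_tokens(attn_scores, ratio):
    """Return indices of the highest-attention tokens, kept in their original order."""
    keep_count = max(1, round(len(attn_scores) * ratio))
    ranked = sorted(range(len(attn_scores)), key=lambda i: attn_scores[i], reverse=True)
    return sorted(ranked[:keep_count])


# Hypothetical attention mass that the text decoder assigns to 10 visual tokens.
attn_scores = [0.30, 0.05, 0.20, 0.01, 0.10, 0.02, 0.15, 0.03, 0.08, 0.06]

total_steps = 3
kept_per_step = [
    prune_visual_tokens(attn_scores, keep_ratio(step, total_steps))
    for step in range(total_steps)
]
for step, kept in enumerate(kept_per_step):
    print(step, kept)
```

Later steps keep fewer visual tokens, so the attention cost per generated token falls as generation proceeds.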

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2501.14204

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(7 more...)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Deriving Coding-Specific Sub-Models from LLMs using Resource-Efficient Pruning

Puccioni, Laura, Farshin, Alireza, Scazzariello, Mariano, Wang, Changjie, Chiesa, Marco, Kostic, Dejan

arXiv.org Artificial Intelligence | Jan-9-2025

Large Language Models (LLMs) have demonstrated their exceptional performance in various complex code generation tasks. However, their broader adoption is limited by significant computational demands and high resource requirements, particularly memory and processing power. To mitigate such requirements, model pruning techniques are used to create more compact models with significantly fewer parameters. However, current approaches do not focus on the efficient extraction of programming-language-specific sub-models. In this work, we explore the idea of efficiently deriving coding-specific sub-models through unstructured pruning (i.e., Wanda). We investigate the impact of different domain-specific calibration datasets on pruning outcomes across three distinct domains and extend our analysis to extracting four language-specific sub-models: Python, Java, C++, and JavaScript. We are the first to efficiently extract programming-language-specific sub-models using appropriate calibration datasets while maintaining acceptable accuracy w.r.t. full models. We are also the first to provide analytical evidence that domain-specific tasks activate distinct regions within LLMs, supporting the creation of specialized sub-models through unstructured pruning. We believe that this work has significant potential to enhance LLM accessibility for coding by reducing computational requirements to enable local execution on consumer-grade hardware, and supporting faster inference times critical for real-time development feedback.
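
To see why the choice of calibration dataset matters for this kind of pruning, consider a minimal sketch that assumes the commonly described Wanda scoring rule: each weight is scored by its magnitude times the L2 norm of the corresponding input feature over the calibration samples, and the lowest-scoring weights in each output row are zeroed. The weight row and the two calibration sets below (labeled "python" and "java") are hypothetical toy values, not data from the paper.

```python
import math


def feature_norms(calib_inputs):
    """L2 norm of each input feature across all calibration samples."""
    n_features = len(calib_inputs[0])
    return [
        math.sqrt(sum(sample[j] ** 2 for sample in calib_inputs))
        for j in range(n_features)
    ]


def wanda_prune(weights, calib_inputs, sparsity=0.5):
    """Zero the lowest-scoring weights in each output row.

    score[i][j] = |W[i][j]| * ||X[:, j]||_2, compared within each row i.
    """
    norms = feature_norms(calib_inputs)
    pruned = []
    for row in weights:
        scores = [abs(w) * norm for w, norm in zip(row, norms)]
        n_prune = int(len(row) * sparsity)
        drop = set(sorted(range(len(row)), key=lambda j: scores[j])[:n_prune])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned


# One output row with four input features (toy values).
weights = [[0.5, -0.4, 0.3, 0.2]]

# Two hypothetical calibration sets that excite different input features.
calib_python = [[1.0, 0.1, 2.0, 0.1], [1.0, 0.1, 2.0, 0.1]]
calib_java = [[0.1, 2.0, 0.1, 1.0], [0.1, 2.0, 0.1, 1.0]]

pruned_python = wanda_prune(weights, calib_python)
pruned_java = wanda_prune(weights, calib_java)
print(pruned_python)
print(pruned_java)
```

The same weights yield different sparse sub-models because each calibration set emphasizes different input features, which is consistent with the abstract's claim that domain-specific tasks activate distinct regions of the model.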

arxiv, dataset, pruning, (13 more...)

arXiv.org Artificial Intelligence

2501.05248

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Maryland > Baltimore (0.04)
Europe > Sweden (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
