AITopics | high performance computing

Collaborating Authors

high performance computing

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

c383e44d9a878d1982d9abb838bd5d8a-Supplemental-Conference.pdf

Neural Information Processing SystemsFeb-11-2026, 19:16:15 GMT

arxiv preprint arxiv, pattern recognition, preprint arxiv, (10 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)

Add feedback

Sparse Computations in Deep Learning Inference

Tasou, Ioanna, Mpakos, Panagiotis, Vlachos, Angelos, Adamopoulos, Dionysios, Giannakopoulos, Georgios, Katsikopoulos, Konstantinos, Karaparisis, Ioannis, Lazou, Maria, Loukovitis, Spyridon, Mei, Areti, Poulopoulou, Anastasia, Dimitriou, Angeliki, Filandrianos, Giorgos, Galanopoulos, Dimitrios, Karampinis, Vasileios, Mitsouras, Ilias, Spanos, Nikolaos, Anastasiadis, Petros, Doudalis, Ioannis, Nikas, Konstantinos, Retsinas, George, Tzouveli, Paraskevi, Giannoula, Christina, Koziris, Nectarios, Papadopoulou, Nikela, Stamou, Giorgos, Voulodimos, Athanasios, Goumas, Georgios

arXiv.org Artificial IntelligenceDec-3-2025

The computational demands of modern Deep Neural Networks (DNNs) are immense and constantly growing. While training costs usually capture public attention, inference demands are also contributing in significant computational, energy and environmental footprints. Sparsity stands out as a critical mechanism for drastically reducing these resource demands. However, its potential remains largely untapped and is not yet fully incorporated in production AI systems. To bridge this gap, this work provides the necessary knowledge and insights for performance engineers keen to get involved in deep learning inference optimization. In particular, in this work we: a) discuss the various forms of sparsity that can be utilized in DNN inference, b) explain how the original dense computations translate to sparse kernels, c) provide an extensive bibliographic review of the state-of-the-art in the implementation of these kernels for CPUs and GPUs, d) discuss the availability of sparse datasets in support of sparsity-related research and development, e) explore the current software tools and frameworks that provide robust sparsity support, and f) present evaluation results of different implementations of the key SpMM and SDDMM kernels on CPU and GPU platforms. Ultimately, this paper aims to serve as a resource for performance engineers seeking to develop and deploy highly efficient sparse deep learning models in productions.

artificial intelligence, machine learning, neural information processing system, (16 more...)

arXiv.org Artificial Intelligence

2512.0255

Country:

Europe (1.00)
North America > United States (0.67)

Genre:

Overview (1.00)
Research Report > New Finding (0.92)

Industry:

Information Technology (1.00)
Energy (0.92)
Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.45)

Add feedback

A Distributed Framework for Causal Modeling of Performance Variability in GPU Traces

Lahiry, Ankur, Pokharel, Ayush, Banday, Banooqa, Ockerman, Seth, Gueroudji, Amal, Zaeed, Mohammad, Islam, Tanzima Z., Pouchard, Line

arXiv.org Artificial IntelligenceOct-22-2025

Large-scale GPU traces play a critical role in identifying performance bottlenecks within heterogeneous High-Performance Computing (HPC) architectures. However, the sheer volume and complexity of a single trace of data make performance analysis both computationally expensive and time-consuming. To address this challenge, we present an end-to-end parallel performance analysis framework designed to handle multiple large-scale GPU traces efficiently. Our proposed framework partitions and processes trace data concurrently and employs causal graph methods and parallel coordinating chart to expose performance variability and dependencies across execution flows. Experimental results demonstrate a 67% improvement in terms of scalability, highlighting the effectiveness of our pipeline for analyzing multiple traces independently.

artificial intelligence, machine learning, variability, (15 more...)

arXiv.org Artificial Intelligence

2510.183

Country: North America > United States > Wisconsin (0.28)

Genre: Research Report (0.85)

Industry:

Energy (0.69)
Government > Regional Government (0.47)

Technology:

Information Technology > Scientific Computing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Hardware (0.98)
Information Technology > Graphics (0.88)

Add feedback

c383e44d9a878d1982d9abb838bd5d8a-Supplemental-Conference.pdf

Neural Information Processing SystemsAug-18-2025, 17:31:27 GMT

artificial intelligence, machine learning, preprint arxiv, (13 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)

Add feedback

Efficient Large Language Models: A Survey

Wan, Zhongwei, Wang, Xin, Liu, Che, Alam, Samiul, Zheng, Yu, Liu, Jiachen, Qu, Zhongnan, Yan, Shen, Zhu, Yi, Zhang, Quanlu, Chowdhury, Mosharaf, Zhang, Mi

arXiv.org Artificial IntelligenceJan-31-2024

Large Language Models (LLMs) have demonstrated remarkable capabilities in important tasks such as natural language understanding, language generation, and complex reasoning and have the potential to make a substantial impact on our society. Such capabilities, however, come with the considerable resources they demand, highlighting the strong need to develop effective techniques for addressing their efficiency challenges.In this survey, we provide a systematic and comprehensive review of efficient LLMs research. We organize the literature in a taxonomy consisting of three main categories, covering distinct yet interconnected efficient LLMs topics from model-centric, data-centric, and framework-centric perspective, respectively. We have also created a GitHub repository where we compile the papers featured in this survey at https://github.com/AIoT-MLSys-Lab/Efficient-LLMs-Survey, and will actively maintain this repository and incorporate new research as it emerges. We hope our survey can serve as a valuable resource to help researchers and practitioners gain a systematic understanding of the research developments in efficient LLMs and inspire them to contribute to this important and exciting field.

activation outlier, efficient inference, operating system design and implementation, (13 more...)

arXiv.org Artificial Intelligence

2312.03863

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > New York > New York County > New York City (0.04)
(25 more...)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.45)

Industry:

Education (0.46)
Information Technology (0.46)
Health & Medicine (0.45)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Model Performance Prediction for Hyperparameter Optimization of Deep Learning Models Using High Performance Computing and Quantum Annealing

Amboage, Juan Pablo García, Wulff, Eric, Girone, Maria, Pena, Tomás F.

arXiv.org Artificial IntelligenceNov-29-2023

Hyperparameter Optimization (HPO) of Deep Learning-based models tends to be a compute resource intensive process as it usually requires to train the target model with many different hyperparameter configurations. We show that integrating model performance prediction with early stopping methods holds great potential to speed up the HPO process of deep learning models. Moreover, we propose a novel algorithm called Swift-Hyperband that can use either classical or quantum support vector regression for performance prediction and benefit from distributed High Performance Computing environments. This algorithm is tested not only for the Machine-Learned Particle Flow model used in High Energy Physics, but also for a wider range of target models from domains such as computer vision and natural language processing. Swift-Hyperband is shown to find comparable (or better) hyperparameters as well as using less computational resources in all test cases.

algorithm, performance predictor, swift-hyperband, (12 more...)

arXiv.org Artificial Intelligence

2311.17508

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.05)
Europe > Spain > Galicia > A Coruña Province > Santiago de Compostela (0.05)
North America > Cuba (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

A Surrogate-Assisted Extended Generative Adversarial Network for Parameter Optimization in Free-Form Metasurface Design

Dai, Manna, Jiang, Yang, Yang, Feng, Chattoraj, Joyjit, Xia, Yingzhi, Xu, Xinxing, Zhao, Weijiang, Dao, My Ha, Liu, Yong

arXiv.org Artificial IntelligenceOct-18-2023

Metasurfaces have widespread applications in fifth-generation (5G) microwave communication. Among the metasurface family, free-form metasurfaces excel in achieving intricate spectral responses compared to regular-shape counterparts. However, conventional numerical methods for free-form metasurfaces are time-consuming and demand specialized expertise. Alternatively, recent studies demonstrate that deep learning has great potential to accelerate and refine metasurface designs. Here, we present XGAN, an extended generative adversarial network (GAN) with a surrogate for high-quality free-form metasurface designs. The proposed surrogate provides a physical constraint to XGAN so that XGAN can accurately generate metasurfaces monolithically from input spectral responses. In comparative experiments involving 20000 free-form metasurface designs, XGAN achieves 0.9734 average accuracy and is 500 times faster than the conventional methodology. This method facilitates the metasurface library building for specific spectral responses and can be extended to various inverse design problems, including optical metamaterials, nanophotonic devices, and drug discovery.

inverse design, metasurface, metasurface design, (15 more...)

arXiv.org Artificial Intelligence

2401.02961

Country:

Asia > Singapore (0.06)
Oceania > Australia (0.04)
Asia > China > Shaanxi Province > Xi'an (0.04)
(4 more...)

Genre: Research Report > New Finding (0.88)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Multi-GPU Approach for Training of Graph ML Models on large CFD Meshes

Strönisch, Sebastian, Sander, Maximilian, Knüpfer, Andreas, Meyer, Marcus

arXiv.org Artificial IntelligenceJul-25-2023

Mesh-based numerical solvers are an important part in many design tool chains. However, accurate simulations like computational fluid dynamics are time and resource consuming which is why surrogate models are employed to speed-up the solution process. Machine Learning based surrogate models on the other hand are fast in predicting approximate solutions but often lack accuracy. Thus, the development of the predictor in a predictor-corrector approach is the focus here, where the surrogate model predicts a flow field and the numerical solver corrects it. This paper scales a state-of-the-art surrogate model from the domain of graph-based machine learning to industry-relevant mesh sizes of a numerical flow simulation. The approach partitions and distributes the flow domain to multiple GPUs and provides halo exchange between these partitions during training. The utilized graph neural network operates directly on the numerical mesh and is able to preserve complex geometries as well as all other properties of the mesh. The proposed surrogate model is evaluated with an application on a three dimensional turbomachinery setup and compared to a traditionally trained distributed model. The results show that the traditional approach produces superior predictions and outperforms the proposed surrogate model. Possible explanations, improvements and future directions are outlined.

artificial intelligence, deep learning, machine learning, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.2514/6.2023-1203

2307.13592

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Germany > Saxony > Dresden (0.04)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

Precise Energy Consumption Measurements of Heterogeneous Artificial Intelligence Workloads

Caspart, René, Ziegler, Sebastian, Weyrauch, Arvid, Obermaier, Holger, Raffeiner, Simon, Schuhmacher, Leon Pascal, Scholtyssek, Jan, Trofimova, Darya, Nolden, Marco, Reinartz, Ines, Isensee, Fabian, Götz, Markus, Debus, Charlotte

arXiv.org Artificial IntelligenceDec-3-2022

With the rise of artificial intelligence (AI) in recent years and the subsequent increase in complexity of the applied models, the growing demand in computational resources is starting to pose a significant challenge. The need for higher compute power is being met with increasingly more potent accelerator hardware as well as the use of large and powerful compute clusters. However, the gain in prediction accuracy from large models trained on distributed and accelerated systems ultimately comes at the price of a substantial increase in energy demand, and researchers have started questioning the environmental friendliness of such AI methods at scale. Consequently, awareness of energy efficiency plays an important role for AI model developers and hardware infrastructure operators likewise. The energy consumption of AI workloads depends both on the model implementation and the composition of the utilized hardware. Therefore, accurate measurements of the power draw of respective AI workflows on different types of compute nodes is key to algorithmic improvements and the design of future compute clusters and hardware. Towards this end, we present measurements of the energy consumption of two typical applications of deep learning models on different types of heterogeneous compute nodes. Our results indicate that 1. contrary to common approaches, deriving energy consumption directly from runtime is not accurate, but the consumption of the compute node needs to be considered regarding its composition; 2. neglecting accelerator hardware on mixed nodes results in overproportional inefficiency regarding energy consumption; 3. energy consumption of model training and inference should be considered separately - while training on GPUs outperforms all other node types regarding both runtime and energy consumption, inference on CPU nodes can be comparably efficient. One advantage of our approach is the fact that the information on energy consumption is available to all users of the supercomputer and not just those with administrator rights, enabling an easy transfer to other workloads alongside a raise in user-awareness of energy consumption.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2212.01698

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Western Europe (0.04)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine > Therapeutic Area (0.70)
Energy > Power Industry (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Meet Colossal-AI Team at SC22 and Other 3 Renowned International Conferences

#artificialintelligenceAug-31-2022, 08:00:09 GMT

Recently, Colossal-AI Team, which developed a unified deep learning system for the big model era, has been accepted and invited to deliver keynote speeches at a series of notable international conferences including SuperComputing 2022 (SC22), Open Data Science Conference (ODSC), World Artificial Intelligence Conference (WAIC), and AWS Summit. In the event, Colossal-AI Team is going to share many up-to-date and amazing things and technologies of High Performance Computing (HPC) and Artificial Intelligence (AI) that will change the world. Follow us and stay tuned! SC (formerly Supercomputing), the International Conference for High Performance Computing, Networking, Storage and Analysis, is the annual conference established in 1988 by the Association for Computing Machinery and the IEEE Computer Society. SC brings together the world's top research institutions and companies in the computer industry to share about the cutting-edge developments and innovations in HPC, networking, storage and analysis that will unlock new solutions and change our world.

colossal-ai team, international conference, renowned international conference, (12 more...)

#artificialintelligence

Country:

North America > United States > California (0.07)
Asia > Singapore (0.06)
Europe (0.05)
Asia > China (0.05)

Genre: Personal > Honors (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback