high performance computing
Sparse Computations in Deep Learning Inference
Tasou, Ioanna, Mpakos, Panagiotis, Vlachos, Angelos, Adamopoulos, Dionysios, Giannakopoulos, Georgios, Katsikopoulos, Konstantinos, Karaparisis, Ioannis, Lazou, Maria, Loukovitis, Spyridon, Mei, Areti, Poulopoulou, Anastasia, Dimitriou, Angeliki, Filandrianos, Giorgos, Galanopoulos, Dimitrios, Karampinis, Vasileios, Mitsouras, Ilias, Spanos, Nikolaos, Anastasiadis, Petros, Doudalis, Ioannis, Nikas, Konstantinos, Retsinas, George, Tzouveli, Paraskevi, Giannoula, Christina, Koziris, Nectarios, Papadopoulou, Nikela, Stamou, Giorgos, Voulodimos, Athanasios, Goumas, Georgios
The computational demands of modern Deep Neural Networks (DNNs) are immense and constantly growing. While training costs usually capture public attention, inference demands also contribute significantly to the computational, energy, and environmental footprint. Sparsity stands out as a critical mechanism for drastically reducing these resource demands. However, its potential remains largely untapped and is not yet fully incorporated in production AI systems. To bridge this gap, this work provides the necessary knowledge and insights for performance engineers keen to get involved in deep learning inference optimization. In particular, in this work we: a) discuss the various forms of sparsity that can be utilized in DNN inference, b) explain how the original dense computations translate to sparse kernels, c) provide an extensive bibliographic review of the state-of-the-art in the implementation of these kernels for CPUs and GPUs, d) discuss the availability of sparse datasets in support of sparsity-related research and development, e) explore the current software tools and frameworks that provide robust sparsity support, and f) present evaluation results of different implementations of the key SpMM and SDDMM kernels on CPU and GPU platforms. Ultimately, this paper aims to serve as a resource for performance engineers seeking to develop and deploy highly efficient sparse deep learning models in production.
- Europe > Greece > Attica > Athens (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Asia > Middle East > Jordan (0.04)
- Overview (1.00)
- Research Report > New Finding (0.92)
- Information Technology (1.00)
- Energy (0.92)
- Health & Medicine (0.67)
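The two kernels named in the sparse-inference abstract above, SpMM and SDDMM, can be illustrated with a minimal NumPy sketch. It assumes a CSR layout and takes SDDMM to mean an element-wise scaling of the sparse matrix by a dense-dense product, computed only at the stored nonzeros. The helper names (`dense_to_csr`, `spmm`, `sddmm`) are hypothetical and are not taken from the paper, and the loops favor readability over speed.

```python
import numpy as np

def dense_to_csr(a):
    """Convert a dense 2-D array to CSR arrays (indptr, indices, data)."""
    indptr, indices, data = [0], [], []
    for row in a:
        cols = np.nonzero(row)[0]
        indices.extend(cols.tolist())
        data.extend(row[cols].tolist())
        indptr.append(len(indices))
    return (np.array(indptr),
            np.array(indices, dtype=int),
            np.array(data, dtype=float))

def spmm(indptr, indices, data, b):
    """SpMM: C = A @ B, with A sparse (CSR) and B dense."""
    n_rows = len(indptr) - 1
    c = np.zeros((n_rows, b.shape[1]))
    for i in range(n_rows):
        # Only the stored nonzeros of row i contribute to row i of C.
        for k in range(indptr[i], indptr[i + 1]):
            c[i] += data[k] * b[indices[k]]
    return c

def sddmm(indptr, indices, data, x, y):
    """SDDMM: S * (X @ Y.T), evaluated only at the nonzeros of S.

    Returns the new nonzero values, in the same CSR order as `data`.
    """
    out = np.zeros_like(data)
    for i in range(len(indptr) - 1):
        for k in range(indptr[i], indptr[i + 1]):
            out[k] = data[k] * np.dot(x[i], y[indices[k]])
    return out
```

In both kernels the work is proportional to the number of stored nonzeros rather than to the full matrix size, which is where the savings over the dense computation come from.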
A Distributed Framework for Causal Modeling of Performance Variability in GPU Traces
Lahiry, Ankur, Pokharel, Ayush, Banday, Banooqa, Ockerman, Seth, Gueroudji, Amal, Zaeed, Mohammad, Islam, Tanzima Z., Pouchard, Line
Large-scale GPU traces play a critical role in identifying performance bottlenecks within heterogeneous High-Performance Computing (HPC) architectures. However, the sheer volume and complexity of even a single trace make performance analysis both computationally expensive and time-consuming. To address this challenge, we present an end-to-end parallel performance analysis framework designed to handle multiple large-scale GPU traces efficiently. Our proposed framework partitions and processes trace data concurrently and employs causal graph methods and parallel coordinates charts to expose performance variability and dependencies across execution flows. Experimental results demonstrate a 67% improvement in terms of scalability, highlighting the effectiveness of our pipeline for analyzing multiple traces independently.
- North America > United States > Wisconsin > Dane County > Madison (0.14)
- North America > United States > Texas > Hays County > San Marcos (0.04)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- Energy (0.69)
- Government > Regional Government (0.47)
- Information Technology > Scientific Computing (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Hardware (0.98)
- Information Technology > Graphics (0.88)
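The idea of partitioning trace data and processing the partitions concurrently, as described in the GPU-trace abstract above, can be sketched as follows. This is only an illustration under assumptions: the record format (kernel name, duration in microseconds) and the function names are hypothetical, and the sketch does not reproduce the framework's causal analysis.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def summarize(chunk):
    """Total duration per kernel name within one partition of the trace."""
    totals = Counter()
    for name, duration_us in chunk:
        totals[name] += duration_us
    return totals

def summarize_trace(records, n_parts=4):
    """Split the trace into partitions, summarize each concurrently, merge."""
    size = max(1, -(-len(records) // n_parts))  # ceiling division
    chunks = [records[i:i + size] for i in range(0, len(records), size)]
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(summarize, chunks))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds the per-key totals
    return merged
```

Threads keep the sketch simple. For CPU-bound parsing in CPython, a process pool or a distributed runtime would be the more realistic choice.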
Efficient Large Language Models: A Survey
Wan, Zhongwei, Wang, Xin, Liu, Che, Alam, Samiul, Zheng, Yu, Liu, Jiachen, Qu, Zhongnan, Yan, Shen, Zhu, Yi, Zhang, Quanlu, Chowdhury, Mosharaf, Zhang, Mi
Large Language Models (LLMs) have demonstrated remarkable capabilities in important tasks such as natural language understanding, language generation, and complex reasoning and have the potential to make a substantial impact on our society. Such capabilities, however, come with the considerable resources they demand, highlighting the strong need to develop effective techniques for addressing their efficiency challenges. In this survey, we provide a systematic and comprehensive review of efficient LLMs research. We organize the literature in a taxonomy consisting of three main categories, covering distinct yet interconnected efficient LLMs topics from model-centric, data-centric, and framework-centric perspectives, respectively. We have also created a GitHub repository where we compile the papers featured in this survey at https://github.com/AIoT-MLSys-Lab/Efficient-LLMs-Survey, and will actively maintain this repository and incorporate new research as it emerges. We hope our survey can serve as a valuable resource to help researchers and practitioners gain a systematic understanding of the research developments in efficient LLMs and inspire them to contribute to this important and exciting field.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Middle East > Jordan (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Overview (1.00)
- Research Report > Promising Solution (0.45)
- Education (0.46)
- Information Technology (0.46)
- Health & Medicine (0.45)
Model Performance Prediction for Hyperparameter Optimization of Deep Learning Models Using High Performance Computing and Quantum Annealing
Amboage, Juan Pablo García, Wulff, Eric, Girone, Maria, Pena, Tomás F.
Hyperparameter Optimization (HPO) of Deep Learning-based models tends to be a compute-intensive process, as it usually requires training the target model with many different hyperparameter configurations. We show that integrating model performance prediction with early stopping methods holds great potential to speed up the HPO process of deep learning models. Moreover, we propose a novel algorithm called Swift-Hyperband that can use either classical or quantum support vector regression for performance prediction and benefit from distributed High Performance Computing environments. This algorithm is tested not only for the Machine-Learned Particle Flow model used in High Energy Physics, but also for a wider range of target models from domains such as computer vision and natural language processing. Swift-Hyperband is shown to find comparable (or better) hyperparameters while using fewer computational resources in all test cases.
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.05)
- Europe > Spain > Galicia > A Coruña Province > Santiago de Compostela (0.05)
- North America > Cuba (0.04)
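To make the combination of performance prediction and early stopping from the HPO abstract above concrete, here is a minimal sketch. It is plain successive halving with a toy least-squares extrapolation, not the Swift-Hyperband algorithm and not support vector regression. The learning-curve model `final + rate / t` and all function names are assumptions made for illustration.

```python
import numpy as np

def toy_curve(config, epochs):
    """Hypothetical validation loss per epoch; decays toward `final`."""
    final, rate = config
    t = np.arange(1, epochs + 1)
    return final + rate / t

def predict_final(partial_curve):
    """Toy predictor: fit loss ~ a + b / t and return a (the limit for large t)."""
    t = np.arange(1, len(partial_curve) + 1)
    design = np.column_stack([np.ones(len(t)), 1.0 / t])
    coef, *_ = np.linalg.lstsq(design, partial_curve, rcond=None)
    return coef[0]

def successive_halving(configs, min_epochs=2, eta=2):
    """Keep the best 1/eta of the configurations at each rung, ranked by
    predicted final loss instead of the loss observed so far."""
    survivors = list(range(len(configs)))
    epochs = min_epochs
    while len(survivors) > 1:
        scores = {i: predict_final(toy_curve(configs[i], epochs))
                  for i in survivors}
        keep = max(1, len(survivors) // eta)
        survivors = sorted(survivors, key=scores.get)[:keep]
        epochs *= eta  # survivors get a larger training budget
    return survivors[0]
```

In this toy setting, a configuration that starts slowly but converges to the lowest loss survives the first rung, whereas ranking by the loss observed after two epochs would discard it.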
A Surrogate-Assisted Extended Generative Adversarial Network for Parameter Optimization in Free-Form Metasurface Design
Dai, Manna, Jiang, Yang, Yang, Feng, Chattoraj, Joyjit, Xia, Yingzhi, Xu, Xinxing, Zhao, Weijiang, Dao, My Ha, Liu, Yong
Metasurfaces have widespread applications in fifth-generation (5G) microwave communication. Among the metasurface family, free-form metasurfaces excel in achieving intricate spectral responses compared to regular-shape counterparts. However, conventional numerical methods for free-form metasurfaces are time-consuming and demand specialized expertise. Alternatively, recent studies demonstrate that deep learning has great potential to accelerate and refine metasurface designs. Here, we present XGAN, an extended generative adversarial network (GAN) with a surrogate for high-quality free-form metasurface designs. The proposed surrogate provides a physical constraint to XGAN so that XGAN can accurately generate metasurfaces monolithically from input spectral responses. In comparative experiments involving 20000 free-form metasurface designs, XGAN achieves 0.9734 average accuracy and is 500 times faster than the conventional methodology. This method facilitates the metasurface library building for specific spectral responses and can be extended to various inverse design problems, including optical metamaterials, nanophotonic devices, and drug discovery.
- Asia > Singapore (0.06)
- Oceania > Australia (0.04)
- Asia > China > Shaanxi Province > Xi'an (0.04)
Multi-GPU Approach for Training of Graph ML Models on large CFD Meshes
Strönisch, Sebastian, Sander, Maximilian, Knüpfer, Andreas, Meyer, Marcus
Mesh-based numerical solvers are an important part of many design tool chains. However, accurate simulations such as computational fluid dynamics are time- and resource-consuming, which is why surrogate models are employed to speed up the solution process. Machine Learning based surrogate models, on the other hand, are fast in predicting approximate solutions but often lack accuracy. Thus, the development of the predictor in a predictor-corrector approach is the focus here, where the surrogate model predicts a flow field and the numerical solver corrects it. This paper scales a state-of-the-art surrogate model from the domain of graph-based machine learning to industry-relevant mesh sizes of a numerical flow simulation. The approach partitions and distributes the flow domain to multiple GPUs and provides halo exchange between these partitions during training. The utilized graph neural network operates directly on the numerical mesh and is able to preserve complex geometries as well as all other properties of the mesh. The proposed surrogate model is evaluated with an application on a three-dimensional turbomachinery setup and compared to a traditionally trained distributed model. The results show that the traditional approach produces superior predictions and outperforms the proposed surrogate model. Possible explanations, improvements and future directions are outlined.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > Germany > Saxony > Dresden (0.04)
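The halo exchange mentioned in the CFD-mesh abstract above can be illustrated on a deliberately small case. The sketch assumes a 1-D chain of nodes split into two partitions and uses a neighbor-averaging step as a stand-in for a graph neural network layer. Real meshes, multi-GPU communication, and the paper's model are out of scope, and the function names are hypothetical.

```python
import numpy as np

def smooth(values):
    """One neighbor-averaging step on a 1-D chain (stand-in for a GNN layer)."""
    padded = np.pad(values, 1, mode="edge")
    return (padded[:-2] + values + padded[2:]) / 3.0

def smooth_partitioned(values, split):
    """Same step on two partitions that exchange one halo node each.

    Assumes 1 <= split <= len(values) - 1 so both partitions are non-empty.
    """
    left, right = values[:split], values[split:]
    # Halo exchange: each partition receives the boundary value owned by the other.
    left_with_halo = np.append(left, right[0])
    right_with_halo = np.insert(right, 0, left[-1])
    # Compute locally, then discard the results on halo nodes (not owned here).
    left_out = smooth(left_with_halo)[:-1]
    right_out = smooth(right_with_halo)[1:]
    return np.concatenate([left_out, right_out])
```

With the halo values in place, each partition reproduces the result of the unpartitioned computation on the nodes it owns. Without them, the nodes at the cut would see only one of their two neighbors.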
Precise Energy Consumption Measurements of Heterogeneous Artificial Intelligence Workloads
Caspart, René, Ziegler, Sebastian, Weyrauch, Arvid, Obermaier, Holger, Raffeiner, Simon, Schuhmacher, Leon Pascal, Scholtyssek, Jan, Trofimova, Darya, Nolden, Marco, Reinartz, Ines, Isensee, Fabian, Götz, Markus, Debus, Charlotte
With the rise of artificial intelligence (AI) in recent years and the subsequent increase in complexity of the applied models, the growing demand for computational resources is starting to pose a significant challenge. The need for higher compute power is being met with increasingly potent accelerator hardware as well as the use of large and powerful compute clusters. However, the gain in prediction accuracy from large models trained on distributed and accelerated systems ultimately comes at the price of a substantial increase in energy demand, and researchers have started questioning the environmental friendliness of such AI methods at scale. Consequently, awareness of energy efficiency plays an important role for AI model developers and hardware infrastructure operators alike. The energy consumption of AI workloads depends both on the model implementation and the composition of the utilized hardware. Therefore, accurate measurements of the power draw of respective AI workflows on different types of compute nodes are key to algorithmic improvements and the design of future compute clusters and hardware. Towards this end, we present measurements of the energy consumption of two typical applications of deep learning models on different types of heterogeneous compute nodes. Our results indicate that (1) contrary to common approaches, deriving energy consumption directly from runtime is not accurate, and the composition of the compute node needs to be taken into account; (2) neglecting accelerator hardware on mixed nodes results in disproportionate inefficiency regarding energy consumption; (3) energy consumption of model training and inference should be considered separately: while training on GPUs outperforms all other node types regarding both runtime and energy consumption, inference on CPU nodes can be comparably efficient. One advantage of our approach is the fact that the information on energy consumption is available to all users of the supercomputer and not just those with administrator rights, enabling an easy transfer to other workloads alongside a rise in user awareness of energy consumption.
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Western Europe (0.04)
- Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
- Health & Medicine > Therapeutic Area (0.70)
- Energy > Power Industry (0.68)
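The first finding in the energy-measurement abstract above, that runtime alone is a poor proxy for energy, can be illustrated with a small sketch. It assumes power samples in watts at known timestamps in seconds and integrates them with the trapezoidal rule. The sample values and function names are hypothetical and are not measurements from the paper.

```python
import numpy as np

def energy_from_samples(timestamps_s, power_w):
    """Integrate sampled power (W) over time (s), trapezoidal rule; returns joules."""
    t = np.asarray(timestamps_s, dtype=float)
    p = np.asarray(power_w, dtype=float)
    return float(np.sum(0.5 * (p[1:] + p[:-1]) * np.diff(t)))

def energy_from_runtime(runtime_s, nominal_power_w):
    """Runtime-based estimate: assumes the node draws a constant nominal power."""
    return runtime_s * nominal_power_w

# Hypothetical 4-second job: ramp-up, steady load, ramp-down.
timestamps = [0, 1, 2, 3, 4]
power = [100, 300, 300, 300, 100]
measured = energy_from_samples(timestamps, power)  # 1000.0 J
estimated = energy_from_runtime(4, 300)            # 1200 J
```

The runtime-based figure overestimates here because it ignores the phases in which the node draws less than its nominal power. On a real node the error could go in either direction, depending on which components are active.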
Meet Colossal-AI Team at SC22 and Other 3 Renowned International Conferences
Recently, the Colossal-AI Team, which developed a unified deep learning system for the big model era, has been invited to deliver keynote speeches at a series of notable international conferences, including SuperComputing 2022 (SC22), Open Data Science Conference (ODSC), World Artificial Intelligence Conference (WAIC), and AWS Summit. At these events, the Colossal-AI Team will share up-to-date developments and technologies in High Performance Computing (HPC) and Artificial Intelligence (AI). Follow us and stay tuned! SC (formerly Supercomputing), the International Conference for High Performance Computing, Networking, Storage and Analysis, is an annual conference established in 1988 by the Association for Computing Machinery and the IEEE Computer Society. SC brings together the world's top research institutions and companies in the computer industry to share cutting-edge developments and innovations in HPC, networking, storage and analysis that will unlock new solutions and change our world.
- North America > United States > California (0.07)
- Asia > Singapore (0.06)
- Europe (0.05)
- Asia > China (0.05)