AITopics | Ganti, Raghu

Collaborating Authors

Ganti, Raghu

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

HadaCore: Tensor Core Accelerated Hadamard Transform Kernel

Agarwal, Krish, Astra, Rishi, Hoque, Adnan, Srivatsa, Mudhakar, Ganti, Raghu, Wright, Less, Chen, Sijia

arXiv.org Artificial IntelligenceDec-11-2024

We present HadaCore, a modified Fast Walsh-Hadamard Transform (FWHT) algorithm optimized for the Tensor Cores present in modern GPU hardware. HadaCore follows the recursive structure of the original FWHT algorithm, achieving the same asymptotic runtime complexity but leveraging a hardware-aware work decomposition that benefits from Tensor Core acceleration. This reduces bottlenecks from compute and data exchange. On Nvidia A100 and H100 GPUs, HadaCore achieves speedups of 1.1-1.4x and 1.0-1.3x, with a peak gain of 3.5x and 3.6x respectively, when compared to the existing state-of-the-art implementation of the original algorithm. We also show that when using FP16 or BF16, our implementation is numerically accurate, enabling comparable accuracy on MMLU benchmarks when used in an end-to-end Llama3 inference run with quantized (FP8) attention.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2412.08832

Country: North America > United States (0.29)

Genre: Research Report (0.51)

Industry: Information Technology (0.37)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.47)

Add feedback

Transforming the Hybrid Cloud for Emerging AI Workloads

Chen, Deming, Youssef, Alaa, Pendse, Ruchi, Schleife, André, Clark, Bryan K., Hamann, Hendrik, He, Jingrui, Laino, Teodoro, Varshney, Lav, Wang, Yuxiong, Sil, Avirup, Jabbarvand, Reyhaneh, Xu, Tianyin, Kindratenko, Volodymyr, Costa, Carlos, Adve, Sarita, Mendis, Charith, Zhang, Minjia, Núñez-Corrales, Santiago, Ganti, Raghu, Srivatsa, Mudhakar, Kim, Nam Sung, Torrellas, Josep, Huang, Jian, Seelam, Seetharami, Nahrstedt, Klara, Abdelzaher, Tarek, Eilam, Tamar, Zhao, Huimin, Manica, Matteo, Iyer, Ravishankar, Hirzel, Martin, Adve, Vikram, Marinov, Darko, Franke, Hubertus, Tong, Hanghang, Ainsworth, Elizabeth, Zhao, Han, Vasisht, Deepak, Do, Minh, Oliveira, Fabio, Pacifici, Giovanni, Puri, Ruchir, Nagpurkar, Priya

arXiv.org Artificial IntelligenceNov-20-2024

This white paper, developed through close collaboration between IBM Research and UIUC researchers within the IIDAI Institute, envisions transforming hybrid cloud systems to meet the growing complexity of AI workloads through innovative, full-stack co-design approaches, emphasizing usability, manageability, affordability, adaptability, efficiency, and scalability. By integrating cutting-edge technologies such as generative and agentic AI, cross-layer automation and optimization, unified control plane, and composable and adaptive system architecture, the proposed framework addresses critical challenges in energy efficiency, performance, and cost-effectiveness. Incorporating quantum computing as it matures will enable quantum-accelerated simulations for materials science, climate modeling, and other high-impact domains. Collaborative efforts between academia and industry are central to this vision, driving advancements in foundation models for material design and climate solutions, scalable multimodal data processing, and enhanced physics-based AI emulators for applications like weather forecasting and carbon sequestration. Research priorities include advancing AI agentic systems, LLM as an Abstraction (LLMaaA), AI model optimization and unified abstractions across heterogeneous infrastructure, end-to-end edge-cloud transformation, efficient programming model, middleware and platform, secure infrastructure, application-adaptive cloud systems, and new quantum-classical collaborative workflows. These ideas and solutions encompass both theoretical and practical research questions, requiring coordinated input and support from the research community. This joint initiative aims to establish hybrid clouds as secure, efficient, and sustainable platforms, fostering breakthroughs in AI-driven applications and scientific discovery across academia, industry, and society.

data mining, large language model, machine learning, (24 more...)

arXiv.org Artificial Intelligence

2411.13239

Country:

Asia (0.67)
North America > United States > California > San Diego County (0.14)

Genre:

Research Report > Promising Solution (1.00)
Research Report > New Finding (1.00)
Overview > Innovation (1.00)

Industry:

Information Technology > Services (1.00)
Energy > Oil & Gas > Upstream (0.65)
Government > Regional Government > North America Government > United States Government (0.45)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Cloud Computing (1.00)
(8 more...)

Add feedback

Accelerating Production LLMs with Combined Token/Embedding Speculators

Wertheimer, Davis, Rosenkranz, Joshua, Parnell, Thomas, Suneja, Sahil, Ranganathan, Pavithra, Ganti, Raghu, Srivatsa, Mudhakar

arXiv.org Artificial IntelligenceJun-6-2024

One approach to squaring this circle is speculative decoding, where a smaller draft model or speculator is trained This technical report describes the design and training to predict multiple tokens given a sequence of input. These of novel speculative decoding draft models, for accelerating speculative tokens are produced with low cost, and lower the inference speeds of large language models in a accuracy than the base LLM. However, we can leverage production environment. By conditioning draft predictions GPU parallelism during the LLM forward pass to evaluate on both context vectors and sampled tokens, we can train the output for each of these new tokens with minimal additional our speculators to efficiently predict high-quality n-grams, overhead. Then, by comparing the outputs to the which the base model then accepts or rejects. This allows us speculated inputs, we can accept all the predicted tokens to effectively predict multiple tokens per inference forward that match the output of the base model, while rejecting all pass, accelerating wall-clock inference speeds of highly optimized those that don't. In this way we can predict multiple tokens base model implementations by a factor of 2-3x. We per LLM forward pass at minimal extra cost. A deeper explore these initial results and describe next steps for further explanation of speculative decoding can be found in [3, 6].

large language model, natural language, speculator, (18 more...)

arXiv.org Artificial Intelligence

2404.19124

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

SudokuSens: Enhancing Deep Learning Robustness for IoT Sensing Applications using a Generative Approach

Wang, Tianshi, Li, Jinyang, Wang, Ruijie, Kara, Denizhan, Liu, Shengzhong, Wertheimer, Davis, Viros-i-Martin, Antoni, Ganti, Raghu, Srivatsa, Mudhakar, Abdelzaher, Tarek

arXiv.org Artificial IntelligenceFeb-8-2024

This paper introduces SudokuSens, a generative framework for automated generation of training data in machine-learning-based Internet-of-Things (IoT) applications, such that the generated synthetic data mimic experimental configurations not encountered during actual sensor data collection. The framework improves the robustness of resulting deep learning models, and is intended for IoT applications where data collection is expensive. The work is motivated by the fact that IoT time-series data entangle the signatures of observed objects with the confounding intrinsic properties of the surrounding environment and the dynamic environmental disturbances experienced. To incorporate sufficient diversity into the IoT training data, one therefore needs to consider a combinatorial explosion of training cases that are multiplicative in the number of objects considered and the possible environmental conditions in which such objects may be encountered. Our framework substantially reduces these multiplicative training needs. To decouple object signatures from environmental conditions, we employ a Conditional Variational Autoencoder (CVAE) that allows us to reduce data collection needs from multiplicative to (nearly) linear, while synthetically generating (data for) the missing conditions. To obtain robustness with respect to dynamic disturbances, a session-aware temporal contrastive learning approach is taken. Integrating the aforementioned two approaches, SudokuSens significantly improves the robustness of deep learning for IoT applications. We explore the degree to which SudokuSens benefits downstream inference tasks in different data sets and discuss conditions under which the approach is particularly effective.

artificial intelligence, machine learning, sudokusen, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3625687.3625785

2402.02275

Country:

Asia > Middle East > Republic of Türkiye (0.16)
North America > United States > Illinois (0.14)

Genre: Research Report (1.00)

Industry:

Transportation > Passenger (1.00)
Transportation > Ground > Road (1.00)
Information Technology (1.00)
Automobiles & Trucks > Manufacturer (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

TP-Aware Dequantization

Hoque, Adnan, Srivatsa, Mudhakar, Yang, Chih-Chieh, Ganti, Raghu

arXiv.org Artificial IntelligenceJan-15-2024

Given the recent advancement of LLMs, deployment optimizations are becoming more crucial as the size of state-ofthe-art LLMs increase in scale. As these these models continue to grow, so does the need to optimize the increasingly parallel and increasingly distributed workload requirements of modern-day deep learning inference. Strategies like GPTQ [1] and Tensor Parallel (TP) [4] are hence essential in achieving high-throughput performance. Our method is motivated by several key properties of GPTQ, TP and General Matrix Multiplication (GEMM). We build on these existing methods and present a key innovation that helps maximize memory throughput and reduce latency. Our method shows up to a 1.81x speedup on Llama-70B and up to a 1.78x speedup on Granite-20B MLP layer problem sizes. We achieve this by reducing global communication and enforcing data locality.

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2402.04925

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

Accelerating a Triton Fused Kernel for W4A16 Quantized Inference with SplitK work decomposition

Hoque, Adnan, Wright, Less, Yang, Jamie, Srivatsa, Mudhakar, Ganti, Raghu

arXiv.org Artificial IntelligenceJan-5-2024

We propose an implementation of an efficient fused matrix multiplication kernel for W4A16 quantized inference, where we perform dequantization and GEMM in a fused kernel using a SplitK work decomposition. Our implementation shows improvement for the type of skinny matrix-matrix multiplications found in foundation model inference workloads. In particular, this paper surveys the type of matrix multiplication between a skinny activation matrix and a square weight matrix. Our results show an average of 65% speed improvement on A100, and an average of 124% speed improvement on H100 (with a peak of 295%) for a range of matrix dimensions including those found in a llama-style model, where m < n = k.

artificial intelligence, kernel, splitk vs data, (14 more...)

arXiv.org Artificial Intelligence

2402.00025

Genre: Research Report > New Finding (0.54)

Technology: Information Technology > Artificial Intelligence (0.48)

Add feedback

Foundation Models for Generalist Geospatial Artificial Intelligence

Jakubik, Johannes, Roy, Sujit, Phillips, C. E., Fraccaro, Paolo, Godwin, Denys, Zadrozny, Bianca, Szwarcman, Daniela, Gomes, Carlos, Nyirjesy, Gabby, Edwards, Blair, Kimura, Daiki, Simumba, Naomi, Chu, Linsong, Mukkavilli, S. Karthik, Lambhate, Devyani, Das, Kamal, Bangalore, Ranjini, Oliveira, Dario, Muszynski, Michal, Ankur, Kumar, Ramasubramanian, Muthukumaran, Gurung, Iksha, Khallaghi, Sam, Hanxi, null, Li, null, Cecil, Michael, Ahmadi, Maryam, Kordi, Fatemeh, Alemohammad, Hamed, Maskey, Manil, Ganti, Raghu, Weldemariam, Kommy, Ramachandran, Rahul

arXiv.org Artificial IntelligenceNov-8-2023

Significant progress in the development of highly adaptable and reusable Artificial Intelligence (AI) models is expected to have a significant impact on Earth science and remote sensing. Foundation models are pre-trained on large unlabeled datasets through self-supervision, and then fine-tuned for various downstream tasks with small labeled datasets. This paper introduces a first-of-a-kind framework for the efficient pre-training and fine-tuning of foundational models on extensive geospatial data. We have utilized this framework to create Prithvi, a transformer-based geospatial foundational model pre-trained on more than 1TB of multispectral satellite imagery from the Harmonized Landsat-Sentinel 2 (HLS) dataset. Our study demonstrates the efficacy of our framework in successfully fine-tuning Prithvi to a range of Earth observation tasks that have not been tackled by previous work on foundation models involving multi-temporal cloud gap imputation, flood mapping, wildfire scar segmentation, and multi-temporal crop segmentation. Our experiments show that the pre-trained model accelerates the fine-tuning process compared to leveraging randomly initialized weights. In addition, pre-trained Prithvi compares well against the state-of-the-art, e.g., outperforming a conditional GAN model in multi-temporal cloud imputation by up to 5pp (or 5.7%) in the structural similarity index. Finally, due to the limited availability of labeled data in the field of Earth observation, we gradually reduce the quantity of available labeled data for refining the model to evaluate data efficiency and demonstrate that data can be decreased significantly without affecting the model's accuracy. The pre-trained 100 million parameter model and corresponding fine-tuning workflows have been released publicly as open source contributions to the global Earth sciences community through Hugging Face.

foundation model, generalist geospatial artificial intelligence, machine learning

arXiv.org Artificial Intelligence

2310.1866

Genre: Research Report (0.40)

Industry: Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.53)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

AI Foundation Models for Weather and Climate: Applications, Design, and Implementation

Mukkavilli, S. Karthik, Civitarese, Daniel Salles, Schmude, Johannes, Jakubik, Johannes, Jones, Anne, Nguyen, Nam, Phillips, Christopher, Roy, Sujit, Singh, Shraddha, Watson, Campbell, Ganti, Raghu, Hamann, Hendrik, Nair, Udaysankar, Ramachandran, Rahul, Weldemariam, Kommy

arXiv.org Artificial IntelligenceSep-19-2023

Machine learning and deep learning methods have been widely explored in understanding the chaotic behavior of the atmosphere and furthering weather forecasting. There has been increasing interest from technology companies, government institutions, and meteorological agencies in building digital twins of the Earth. Recent approaches using transformers, physics-informed machine learning, and graph neural networks have demonstrated state-of-the-art performance on relatively narrow spatiotemporal scales and specific tasks. With the recent success of generative artificial intelligence (AI) using pre-trained transformers for language modeling and vision with prompt engineering and fine-tuning, we are now moving towards generalizable AI. In particular, we are witnessing the rise of AI foundation models that can perform competitively on multiple domain-specific downstream tasks. Despite this progress, we are still in the nascent stages of a generalizable AI model for global Earth system models, regional climate models, and mesoscale weather models. Here, we review current state-of-the-art AI approaches, primarily from transformer and operator learning literature in the context of meteorology. We provide our perspective on criteria for success towards a family of foundation models for nowcasting and forecasting weather and climate predictions. We also discuss how such models can perform competitively on downstream tasks such as downscaling (super-resolution), identifying conditions conducive to the occurrence of wildfires, and predicting consequential meteorological phenomena across various spatiotemporal scales such as hurricanes and atmospheric rivers. In particular, we examine current AI methodologies and contend they have matured enough to design and implement a weather foundation model.

ai foundation model, artificial intelligence, machine learning, (5 more...)

arXiv.org Artificial Intelligence

2309.10808

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback