AITopics | Shivdikar, Kaustubh

Collaborating Authors

Shivdikar, Kaustubh

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

NeuraChip: Accelerating GNN Computations with a Hash-based Decoupled Spatial Accelerator

Shivdikar, Kaustubh, Agostini, Nicolas Bohm, Jayaweera, Malith, Jonatan, Gilbert, Abellan, Jose L., Joshi, Ajay, Kim, John, Kaeli, David

arXiv.org Artificial IntelligenceApr-26-2024

Graph Neural Networks (GNNs) are emerging as a formidable tool for processing non-euclidean data across various domains, ranging from social network analysis to bioinformatics. Despite their effectiveness, their adoption has not been pervasive because of scalability challenges associated with large-scale graph datasets, particularly when leveraging message passing. To tackle these challenges, we introduce NeuraChip, a novel GNN spatial accelerator based on Gustavson's algorithm. NeuraChip decouples the multiplication and addition computations in sparse matrix multiplication. This separation allows for independent exploitation of their unique data dependencies, facilitating efficient resource allocation. We introduce a rolling eviction strategy to mitigate data idling in on-chip memory as well as address the prevalent issue of memory bloat in sparse graph computations. Furthermore, the compute resource load balancing is achieved through a dynamic reseeding hash-based mapping, ensuring uniform utilization of computing resources agnostic of sparsity patterns. Finally, we present NeuraSim, an open-source, cycle-accurate, multi-threaded, modular simulator for comprehensive performance analysis. Overall, NeuraChip presents a significant improvement, yielding an average speedup of 22.1x over Intel's MKL, 17.1x over NVIDIA's cuSPARSE, 16.7x over AMD's hipSPARSE, and 1.5x over prior state-of-the-art SpGEMM accelerator and 1.3x over GNN accelerator. The source code for our open-sourced simulator and performance visualizer is publicly accessible on GitHub https://neurachip.us

artificial intelligence, machine learning, social media, (18 more...)

arXiv.org Artificial Intelligence

2404.1551

Country: North America > United States (0.46)

Genre: Research Report (0.81)

Industry:

Information Technology > Services (0.34)
Energy > Power Industry (0.34)

Technology:

Information Technology > Software (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Architecture (1.00)
(2 more...)

Add feedback

MaxK-GNN: Towards Theoretical Speed Limits for Accelerating Graph Neural Networks Training

Peng, Hongwu, Xie, Xi, Shivdikar, Kaustubh, Hasan, MD Amit, Zhao, Jiahui, Huang, Shaoyi, Khan, Omer, Kaeli, David, Ding, Caiwen

arXiv.org Artificial IntelligenceDec-18-2023

In the acceleration of deep neural network training, the GPU has become the mainstream platform. GPUs face substantial challenges on GNNs, such as workload imbalance and memory access irregularities, leading to underutilized hardware. Existing solutions such as PyG, DGL with cuSPARSE, and GNNAdvisor frameworks partially address these challenges but memory traffic is still significant. We argue that drastic performance improvements can only be achieved by the vertical optimization of algorithm and system innovations, rather than treating the speedup optimization as an "after-thought" (i.e., (i) given a GNN algorithm, designing an accelerator, or (ii) given hardware, mainly optimizing the GNN algorithm). In this paper, we present MaxK-GNN, an advanced high-performance GPU training system integrating algorithm and system innovation. (i) We introduce the MaxK nonlinearity and provide a theoretical analysis of MaxK nonlinearity as a universal approximator, and present the Compressed Balanced Sparse Row (CBSR) format, designed to store the data and index of the feature matrix after nonlinearity; (ii) We design a coalescing enhanced forward computation with row-wise product-based SpGEMM Kernel using CBSR for input feature matrix fetching and strategic placement of a sparse output accumulation buffer in shared memory; (iii) We develop an optimized backward computation with outer product-based and SSpMM Kernel. We conduct extensive evaluations of MaxK-GNN and report the end-to-end system run-time. Experiments show that MaxK-GNN system could approach the theoretical speedup limit according to Amdahl's law. We achieve comparable accuracy to SOTA GNNs, but at a significantly increased speed: 3.22/4.24 times speedup (vs. theoretical limits, 5.52/7.27 times) on Reddit compared to DGL and GNNAdvisor implementations.

artificial intelligence, machine learning, matrix, (17 more...)

arXiv.org Artificial Intelligence

2312.08656

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.67)

Industry: Information Technology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Add feedback

Enabling Accelerators for Graph Computing

Shivdikar, Kaustubh

arXiv.org Artificial IntelligenceDec-16-2023

The advent of Graph Neural Networks (GNNs) has revolutionized the field of machine learning, offering a novel paradigm for learning on graph-structured data. Unlike traditional neural networks, GNNs are capable of capturing complex relationships and dependencies inherent in graph data, making them particularly suited for a wide range of applications including social network analysis, molecular chemistry, and network security. The impact of GNNs in these domains is profound, enabling more accurate models and predictions, and thereby contributing significantly to advancements in these fields. GNNs, with their unique structure and operation, present new computational challenges compared to conventional neural networks. This requires comprehensive benchmarking and a thorough characterization of GNNs to obtain insight into their computational requirements and to identify potential performance bottlenecks. In this thesis, we aim to develop a better understanding of how GNNs interact with the underlying hardware and will leverage this knowledge as we design specialized accelerators and develop new optimizations, leading to more efficient and faster GNN computations. Synthesizing these insights and optimizations, we design a state-of-the-art hardware accelerator capable of efficiently handling various GNN workloads. Our accelerator architecture is built on our characterization of GNN computational demands, providing clear motivation for our approach. Furthermore, we extend our exploration to emerging GNN workloads in the domain of graph neural networks. This exploration into novel models underlines our comprehensive approach, as we strive to enable accelerators that are not just performant, but also versatile, able to adapt to the evolving landscape of graph computing.

artificial intelligence, machine learning, social media, (20 more...)

arXiv.org Artificial Intelligence

2312.10561

Country:

Asia (1.00)
North America > United States > Massachusetts (0.14)
Europe > United Kingdom > England (0.14)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.47)

Industry:

Information Technology (1.00)
Transportation > Infrastructure & Services (0.92)
Health & Medicine > Pharmaceuticals & Biotechnology (0.92)
Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback