
Collaborating Authors

 Feng, Boyuan


Accelerating Neural Network Training: An Analysis of the AlgoPerf Competition

arXiv.org Machine Learning

The goal of the AlgoPerf: Training Algorithms competition is to evaluate practical speed-ups in neural network training achieved solely by improving the underlying training algorithms. In the external tuning ruleset, submissions must provide workload-agnostic hyperparameter search spaces, while in the self-tuning ruleset they must be completely hyperparameter-free. In both rulesets, submissions are compared on time-to-result across multiple deep learning workloads, training on fixed hardware. This paper presents the inaugural AlgoPerf competition's results, which drew 18 diverse submissions from 10 teams. Our investigation reveals several key findings: (1) The winning submission in the external tuning ruleset, using Distributed Shampoo, demonstrates the effectiveness of non-diagonal preconditioning over popular methods like Adam, even when compared on wall-clock runtime. (2) The winning submission in the self-tuning ruleset, based on the Schedule-Free AdamW algorithm, demonstrates a new level of effectiveness for completely hyperparameter-free training algorithms. (3) The top-scoring submissions were surprisingly robust to workload changes. We also discuss the engineering challenges encountered in ensuring a fair comparison between different training algorithms. These results highlight both the significant progress made so far and the considerable room for further improvement.
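For reference, the Schedule-Free update rule (Defazio et al.) can be sketched in a few lines. The sketch below uses plain SGD as the base optimizer, whereas the winning submission wrapped AdamW; the step size, the interpolation parameter beta, and the toy quadratic objective are illustrative choices, not the submission's settings.

```python
# A minimal sketch of the Schedule-Free update rule, shown for plain SGD.
import numpy as np

def schedule_free_sgd(grad_fn, x0, lr=0.1, beta=0.9, steps=100):
    """Schedule-Free SGD: no learning-rate schedule; returns the average x."""
    z = x0.copy()   # base SGD iterate
    x = x0.copy()   # running average, used for evaluation
    for t in range(1, steps + 1):
        y = (1 - beta) * z + beta * x   # gradient is taken at the interpolation y
        z = z - lr * grad_fn(y)         # SGD step on the base iterate
        c = 1.0 / t                     # uniform averaging weight
        x = (1 - c) * x + c * z         # online Polyak-style average
    return x

# Toy usage: minimize f(x) = 0.5 * ||x||^2, whose gradient is x.
x_star = schedule_free_sgd(lambda v: v, np.ones(3))
print(x_star)  # close to the optimum at 0
```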


Flex Attention: A Programming Model for Generating Optimized Attention Kernels

arXiv.org Artificial Intelligence

Over the past 7 years, attention has become one of the most important primitives in deep learning. The primary approach to optimize attention is FlashAttention, which fuses the operation together, drastically improving both the runtime and the memory consumption. However, the importance of FlashAttention combined with its monolithic nature poses a problem for researchers aiming to try new attention variants -- a "software lottery". This problem is exacerbated by the difficulty of writing efficient fused attention kernels, resisting traditional compiler-based approaches. We introduce FlexAttention, a novel compiler-driven programming model that allows implementing the majority of attention variants in a few lines of idiomatic PyTorch code. We demonstrate that many existing attention variants (e.g. Alibi, Document Masking, PagedAttention, etc.) can be implemented via FlexAttention, and that we achieve competitive performance compared to these handwritten kernels. Finally, we demonstrate how FlexAttention allows for easy composition of attention variants, solving the combinatorial explosion of attention variants.
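To make the programming model concrete, here is a short sketch of the FlexAttention API as exposed in recent PyTorch releases (torch >= 2.5; details may differ across versions). The score_mod callable implements an ALiBi-style relative bias; the slope schedule is an illustrative choice.

```python
# FlexAttention sketch: an ALiBi-style bias in a few lines of PyTorch.
# Assumes a CUDA device; for best performance the call is usually wrapped
# in torch.compile.
import torch
from torch.nn.attention.flex_attention import flex_attention

B, H, S, D = 2, 4, 128, 64
q = torch.randn(B, H, S, D, device="cuda")
k = torch.randn(B, H, S, D, device="cuda")
v = torch.randn(B, H, S, D, device="cuda")

# Per-head ALiBi slopes (illustrative schedule).
alibi_slopes = torch.exp2(-torch.arange(1, H + 1, device="cuda").float())

def alibi_score_mod(score, b, h, q_idx, kv_idx):
    # Add a linear penalty proportional to the query/key distance.
    return score + alibi_slopes[h] * (kv_idx - q_idx)

out = flex_attention(q, k, v, score_mod=alibi_score_mod)
```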


MGG: Accelerating Graph Neural Networks with Fine-grained Intra-kernel Communication-Computation Pipelining on Multi-GPU Platforms

arXiv.org Artificial Intelligence

The increasing size of input graphs for graph neural networks (GNNs) highlights the demand for multi-GPU platforms. However, existing multi-GPU GNN systems optimize computation and communication individually, following the conventional practice of scaling dense DNNs. For irregularly sparse and fine-grained GNN workloads, such solutions miss the opportunity to jointly schedule and optimize the computation and communication operations for high performance. To this end, we propose MGG, a novel system design to accelerate full-graph GNNs on multi-GPU platforms. The core of MGG is its novel dynamic software pipeline, which facilitates fine-grained computation-communication overlapping within a GPU kernel. Specifically, MGG introduces GNN-tailored pipeline construction and GPU-aware pipeline mapping to facilitate workload balancing and operation overlapping. MGG also incorporates an intelligent runtime design with analytical modeling and optimization heuristics to dynamically improve execution performance. Extensive evaluation reveals that MGG outperforms state-of-the-art full-graph GNN systems across various settings: on average 4.41X, 4.81X, and 10.83X faster than DGL, MGG-UVM, and ROC, respectively.
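MGG performs this overlap inside a single kernel; standard frameworks can only approximate it at a coarser granularity. The hypothetical sketch below illustrates that coarser, stream-level analogue in PyTorch: remote neighbor features are fetched on a side stream while local aggregation proceeds. The fetch_remote, local_agg, and remote_agg callables are assumed placeholders, and a CUDA device is required.

```python
# Coarse-grained communication-computation overlap with CUDA streams
# (a stream-level analogue of MGG's intra-kernel pipeline, not MGG itself).
import torch

comm_stream = torch.cuda.Stream()

def aggregate_with_overlap(local_feats, fetch_remote, local_agg, remote_agg):
    """Overlap fetching remote neighbor features with local aggregation."""
    with torch.cuda.stream(comm_stream):
        remote_feats = fetch_remote()       # e.g., peer-to-peer copy from another GPU
    partial = local_agg(local_feats)        # runs concurrently on the default stream
    torch.cuda.current_stream().wait_stream(comm_stream)  # join before combining
    return partial + remote_agg(remote_feats)
```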


TC-GNN: Bridging Sparse GNN Computation and Dense Tensor Cores on GPUs

arXiv.org Artificial Intelligence

Recently, graph neural networks (GNNs), the backbone of graph-based machine learning, have demonstrated great success in various domains (e.g., e-commerce). However, the performance of GNNs is usually unsatisfactory due to highly sparse and irregular graph-based operations. To this end, we propose TC-GNN, the first GNN acceleration framework based on GPU Tensor Core Units (TCUs). The core idea is to reconcile the "Sparse" GNN computation with the high-performance "Dense" TCUs. Specifically, we conduct an in-depth analysis of the sparse operations in mainstream GNN computing frameworks. We introduce a novel sparse graph translation technique to make the sparse GNN workload amenable to TCU processing. We implement an effective CUDA core and TCU collaboration design to fully utilize GPU resources. We integrate TC-GNN with the PyTorch framework for high programmability. Rigorous experiments show an average 1.70X speedup over the state-of-the-art DGL framework across various models and datasets.
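A hypothetical NumPy sketch of the sparse-graph-translation idea: within a window of rows, take the union of nonzero column indices and pack the corresponding entries into a small dense tile that a Tensor Core MMA could consume. The window size and names are illustrative, not TC-GNN's actual layout.

```python
# Condense the nonzero columns of a row window into a dense tile.
import numpy as np

def condense_row_window(adj, row_start, rows=16):
    """Pack the nonzero columns of a row window into a dense tile."""
    window = adj[row_start:row_start + rows]      # (rows, N) slice of the adjacency
    cols = np.nonzero(window.any(axis=0))[0]      # union of nonzero columns
    tile = window[:, cols]                        # dense (rows, len(cols)) tile
    return tile, cols                             # cols maps tile columns back

# Usage: multiply the condensed tile against the matching rows of the
# dense feature matrix X, instead of the full sparse (rows, N) @ (N, D) product.
adj = (np.random.rand(64, 64) < 0.05).astype(np.float32)
X = np.random.randn(64, 8).astype(np.float32)
tile, cols = condense_row_window(adj, 0)
out = tile @ X[cols]                              # equals adj[0:16] @ X
assert np.allclose(out, adj[0:16] @ X, atol=1e-5)
```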


APNN-TC: Accelerating Arbitrary Precision Neural Networks on Ampere GPU Tensor Cores

arXiv.org Artificial Intelligence

Accelerating neural networks with quantization has been widely studied over the years. Unfortunately, prior efforts with diverse precisions (e.g., 1-bit weights and 2-bit activations) are usually restricted by the limited precision support on GPUs (e.g., int1 and int4). To break these restrictions, we introduce APNN-TC, the first Arbitrary Precision Neural Network framework, to fully exploit the benefits of quantization on Ampere GPU Tensor Cores. Specifically, APNN-TC first incorporates a novel emulation algorithm to support arbitrary short-bit-width computation with int1 compute primitives and XOR/AND Boolean operations. Second, APNN-TC integrates arbitrary-precision layer designs to efficiently map the emulation algorithm to Tensor Cores with novel batching strategies and specialized memory organization. Third, APNN-TC embodies a novel arbitrary-precision NN design to minimize memory access across layers and further improve performance. Extensive evaluations show that APNN-TC achieves significant speedups over CUTLASS kernels and across various NN models, such as ResNet and VGG.
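The bit-plane emulation idea can be checked in a few lines: a w-bit-by-a-bit dot product decomposes into 1-bit dot products (popcount over AND of binary planes) combined with power-of-two shifts. APNN-TC maps those 1-bit products onto int1 Tensor Core primitives; the NumPy reference below only verifies the arithmetic, and the function name is illustrative.

```python
# Reference check of arbitrary-precision emulation via 1-bit planes.
import numpy as np

def bitplane_dot(w, a, w_bits=2, a_bits=2):
    """Emulate an arbitrary-precision dot product with 1-bit operations."""
    acc = 0
    for i in range(w_bits):
        wi = (w >> i) & 1                        # i-th bit plane of the weights
        for j in range(a_bits):
            aj = (a >> j) & 1                    # j-th bit plane of the activations
            acc += (wi & aj).sum() << (i + j)    # popcount(AND), shifted by 2^(i+j)
    return acc

w = np.random.randint(0, 4, size=1024)           # 2-bit unsigned weights
a = np.random.randint(0, 4, size=1024)           # 2-bit unsigned activations
assert bitplane_dot(w, a) == int(np.dot(w, a))   # matches the exact dot product
```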


Uncertainty-aware Attention Graph Neural Network for Defending Adversarial Attacks

arXiv.org Machine Learning

With the increasing popularity of graph-based learning, graph neural networks (GNNs) have emerged as an essential tool for gaining insights from graphs. However, unlike conventional CNNs, which have been extensively explored and exhaustively tested, concerns remain about GNNs' robustness in critical settings such as financial services. The main reason is that existing GNNs usually serve as black boxes in prediction and do not provide uncertainty estimates for their predictions. On the other hand, recent advances in Bayesian deep learning on CNNs have demonstrated success in quantifying and explaining such uncertainties to fortify CNN models. Motivated by these observations, we propose UAG, the first systematic solution for defending against adversarial attacks on GNNs by identifying and exploiting hierarchical uncertainties in GNNs. UAG develops a Bayesian Uncertainty Technique (BUT) to explicitly capture uncertainties in GNNs and further employs an Uncertainty-aware Attention Technique (UAT) to defend against adversarial attacks on GNNs. Extensive experiments show that our proposed defense approach outperforms state-of-the-art solutions by a significant margin.
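As a minimal illustration of the kind of Bayesian uncertainty signal such a defense can build on, the sketch below estimates per-node predictive variance with Monte Carlo dropout. This is only a common stand-in: UAG's actual BUT/UAT components are more involved, and the model(x, edge_index) signature is an assumed PyG-style interface.

```python
# Per-node uncertainty via Monte Carlo dropout (illustrative, not UAG's BUT).
import torch

def mc_dropout_uncertainty(model, x, edge_index, n_samples=20):
    """Predictive mean and variance per node under MC dropout."""
    model.train()                      # keep dropout active at inference time
    with torch.no_grad():
        probs = torch.stack([
            torch.softmax(model(x, edge_index), dim=-1)
            for _ in range(n_samples)
        ])                             # (n_samples, num_nodes, num_classes)
    return probs.mean(0), probs.var(0)  # high variance => uncertain node
```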


Scalable Adversarial Attack on Graph Neural Networks with Alternating Direction Method of Multipliers

arXiv.org Artificial Intelligence

Graph neural networks (GNNs) have achieved high performance in analyzing graph-structured data and have been widely deployed in safety-critical areas, such as finance and autonomous driving. However, only a few works have explored GNNs' robustness to adversarial attacks, and their designs are usually limited by the scale of input datasets (i.e., they focus on small graphs with only thousands of nodes). In this work, we propose SAG, the first scalable adversarial attack method based on the Alternating Direction Method of Multipliers (ADMM). We first decouple the large-scale graph into several smaller graph partitions and cast the original problem into several subproblems. Then, we solve these subproblems using projected gradient descent on both the graph topology and the node features, leading to considerably lower memory consumption than conventional attack methods. Rigorous experiments further demonstrate that SAG significantly reduces the computation and memory overhead compared with the state-of-the-art approach, making SAG applicable to graphs with large numbers of nodes and edges.
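A hypothetical sketch of one such subproblem: projected gradient descent on the node features of a single partition under an L-infinity budget. The ADMM coordination across partitions and the topology attack are omitted, and the names, loss, and model interface are illustrative assumptions.

```python
# PGD on the node features of one graph partition (one SAG-style subproblem).
import torch

def pgd_feature_attack(model, x, edge_index, labels, eps=0.1, lr=0.01, steps=40):
    """Maximize the loss on one partition via PGD on its node features."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = torch.nn.functional.cross_entropy(model(x + delta, edge_index), labels)
        loss.backward()
        with torch.no_grad():
            delta += lr * delta.grad.sign()    # gradient ascent step
            delta.clamp_(-eps, eps)            # project back into the L-inf budget
            delta.grad.zero_()
    return (x + delta).detach()
```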


SGQuant: Squeezing the Last Bit on Graph Neural Networks with Specialized Quantization

arXiv.org Machine Learning

With the increasing popularity of graph-based learning, Graph Neural Networks (GNNs) have attracted considerable attention from both research and industry because of their high accuracy. However, existing GNNs suffer from high memory footprints (e.g., for node embedding features), which hinder potential applications on memory-constrained devices, such as widely deployed IoT devices. To this end, we propose a specialized GNN quantization scheme, SGQuant, to systematically reduce GNN memory consumption. Specifically, we first propose a GNN-tailored quantization algorithm design and a GNN quantization fine-tuning scheme to reduce memory consumption while maintaining accuracy. Then, we investigate a multi-granularity quantization strategy that operates at different levels (components, graph topology, and layers) of the GNN computation. Moreover, we offer an automatic bit-selecting (ABS) mechanism to pinpoint the most appropriate quantization bits for the above multi-granularity quantizations. Intensive experiments show that SGQuant can effectively reduce the memory footprint by 4.25x to 31.9x compared with the original full-precision GNNs while limiting the accuracy drop to 0.4% on average.
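For intuition, the core quantize/dequantize step underlying a scheme like SGQuant can be sketched as uniform symmetric fake-quantization; the multi-granularity part would then assign a different bits value to, e.g., each layer or feature group. The function name and per-tensor scale are illustrative simplifications.

```python
# Uniform symmetric fake-quantization of a feature tensor (illustrative).
import torch

def fake_quantize(x, bits):
    """Quantize to `bits` levels and dequantize back, simulating the error."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().max() / qmax
    return torch.round(x / scale).clamp(-qmax, qmax) * scale

feats = torch.randn(1000, 64)               # toy node embedding matrix
q8, q4 = fake_quantize(feats, 8), fake_quantize(feats, 4)
print((feats - q8).abs().mean(), (feats - q4).abs().mean())  # error grows as bits shrink
```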


Domain-Adversarial Multi-Task Framework for Novel Therapeutic Property Prediction of Compounds

arXiv.org Machine Learning

With the rapid development of high-throughput technologies, the parallel acquisition of large-scale drug-informatics data provides major opportunities to improve pharmaceutical research and development. One significant application is purpose prediction for small-molecule compounds: specifying the therapeutic properties of extensive purpose-unknown compounds and repurposing FDA-approved drugs for novel therapeutic uses. This problem is challenging because compound attributes comprise heterogeneous data with diverse feature patterns, such as drug fingerprints, physicochemical properties, and perturbation gene-expression profiles. Moreover, there are complex nonlinear dependencies among these heterogeneous data. In this paper, we propose a novel domain-adversarial multi-task framework for integrating shared knowledge from multiple domains. The framework uses an adversarial strategy to effectively learn target representations and model their nonlinear dependencies. Experiments on two real-world datasets show that our approach achieves a clear improvement over competitive baselines. Most of the novel therapeutic properties we predicted for purpose-unknown compounds have been reported or brought to the clinic. Furthermore, our framework can integrate attributes beyond the three domains examined here and can be applied in industry to screen large numbers of as-yet-unidentified compounds. Source code is available on GitHub.
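Domain-adversarial training is commonly implemented with a gradient reversal layer (GRL): features flow forward unchanged, while gradients from the domain classifier are negated, pushing the shared encoder toward domain-invariant representations. Below is the standard PyTorch sketch of a GRL; the paper's exact architecture may differ.

```python
# Gradient reversal layer: identity forward, negated gradient backward.
import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None   # flip the gradient sign

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# Usage: domain_logits = domain_classifier(grad_reverse(shared_features))
# so the encoder is trained to *confuse* the domain classifier.
```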


SECS: Efficient Deep Stream Processing via Class Skew Dichotomy

arXiv.org Artificial Intelligence

Although accelerating convolutional neural networks (CNNs) has received increasing research attention, the savings in resource consumption usually come with a decrease in accuracy. To increase accuracy while decreasing resource consumption, we exploit a piece of environmental information, called class skew, that is easily available and widespread in daily life. Since the class skew may shift over time, we introduce a probability layer that exploits class skew without any runtime overhead. Further, we observe a class-skew dichotomy: some class skews appear frequently in the future (hot class skews), while others never reappear or appear only seldom (cold class skews). Inspired by techniques from source-code optimization, we propose two modes: interpretation and compilation. The interpretation mode pursues efficient adaptation at runtime for cold class skews, while the compilation mode aggressively optimizes hot ones for more efficient future deployment. Aggressive optimization is carried out via class-specific pruning and provides extra benefit. Finally, we design a systematic framework, SECS, that dynamically detects class skews, performs interpretation and compilation, and selects the most accurate architecture under the runtime resource budget. Extensive evaluations show that SECS realizes end-to-end classification speedups of 3x to 11x relative to state-of-the-art convolutional neural networks, at higher accuracy.
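One plausible reading of such a probability layer is a prior-shift correction: rescale the classifier's posterior by the ratio of the observed class skew to the training-time class prior, with no retraining. The sketch below shows this standard correction; the name, interface, and the use of this exact formula are assumptions, not SECS's documented design.

```python
# Prior-shift correction of softmax outputs for a known class skew.
import torch

def probability_layer(logits, train_prior, runtime_skew, eps=1e-8):
    """Adjust softmax outputs for a shift in class frequencies."""
    probs = torch.softmax(logits, dim=-1)
    adjusted = probs * (runtime_skew + eps) / (train_prior + eps)
    return adjusted / adjusted.sum(dim=-1, keepdim=True)   # renormalize

# Usage: if only 3 of 10 classes appear in the current scene, runtime_skew
# concentrates mass on those classes and the remaining classes are suppressed.
```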