AITopics | Kao, Sheng-Chun

Collaborating Authors

Kao, Sheng-Chun

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

NonGEMM Bench: Understanding the Performance Horizon of the Latest ML Workloads with NonGEMM Workloads

Karami, Rachid, Kota, Hemanth, Kao, Sheng-Chun, Kwon, Hyoukjun

arXiv.org Artificial IntelligenceApr-24-2024

Machine Learning (ML) operators are the building blocks to design ML models with various target applications. GEneral Matrix Multiplication (GEMM) operators are the backbone of ML models. They are notorious for being computationally expensive requiring billions of multiply-and-accumulate. Therefore, significant effort has been put to study and optimize the GEMM operators in order to speed up the execution of ML models. GPUs and accelerators are widely deployed to accelerate ML workloads by optimizing the execution of GEMM operators. Nonetheless, the performance of NonGEMM operators have not been studied as thoroughly as GEMMs. Therefore, this paper describes \bench, a benchmark to study NonGEMM operators. We first construct \bench using popular ML workloads from different domains, then perform case studies on various grade GPU platforms to analyze the behavior of NonGEMM operators in GPU accelerated systems. Finally, we present some key takeaways to bridge the gap between GEMM and NonGEMM operators and to offer the community with potential new optimization directions.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2404.11788

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.74)

Add feedback

Progressive Gradient Flow for Robust N:M Sparsity Training in Transformers

Bambhaniya, Abhimanyu Rajeshkumar, Yazdanbakhsh, Amir, Subramanian, Suvinay, Kao, Sheng-Chun, Agrawal, Shivani, Evci, Utku, Krishna, Tushar

arXiv.org Artificial IntelligenceFeb-7-2024

N:M Structured sparsity has garnered significant interest as a result of relatively modest overhead and improved efficiency. Additionally, this form of sparsity holds considerable appeal for reducing the memory footprint owing to their modest representation overhead. There have been efforts to develop training recipes for N:M structured sparsity, they primarily focus on low-sparsity regions ($\sim$50\%). Nonetheless, performance of models trained using these approaches tends to decline when confronted with high-sparsity regions ($>$80\%). In this work, we study the effectiveness of existing sparse training recipes at \textit{high-sparsity regions} and argue that these methods fail to sustain the model quality on par with low-sparsity regions. We demonstrate that the significant factor contributing to this disparity is the presence of elevated levels of induced noise in the gradient magnitudes. To mitigate this undesirable effect, we employ decay mechanisms to progressively restrict the flow of gradients towards pruned elements. Our approach improves the model quality by up to 2$\%$ and 5$\%$ in vision and language models at high sparsity regime, respectively. We also evaluate the trade-off between model accuracy and training compute cost in terms of FLOPs. At iso-training FLOPs, our method yields better performance compared to conventional sparse training recipes, exhibiting an accuracy improvement of up to 2$\%$. The source code is available at https://github.com/abhibambhaniya/progressive_gradient_flow_nm_sparsity.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2402.04744

Genre: Research Report (1.00)

Industry: Information Technology (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)

Add feedback

JaxPruner: A concise library for sparsity research

Lee, Joo Hyung, Park, Wonpyo, Mitchell, Nicole, Pilault, Jonathan, Obando-Ceron, Johan, Kim, Han-Byul, Lee, Namhoon, Frantar, Elias, Long, Yun, Yazdanbakhsh, Amir, Agrawal, Shivani, Subramanian, Suvinay, Wang, Xin, Kao, Sheng-Chun, Zhang, Xingyao, Gale, Trevor, Bik, Aart, Han, Woohyun, Ferev, Milen, Han, Zhonglin, Kim, Hong-Seok, Dauphin, Yann, Dziugaite, Gintare Karolina, Castro, Pablo Samuel, Evci, Utku

arXiv.org Artificial IntelligenceDec-18-2023

This paper introduces JaxPruner, an open-source JAX-based pruning and sparse training library for machine learning research. JaxPruner aims to accelerate research on sparse neural networks by providing concise implementations of popular pruning and sparse training algorithms with minimal memory and latency overhead. Algorithms implemented in JaxPruner use a common API and work seamlessly with the popular optimization library Optax, which, in turn, enables easy integration with existing JAX based libraries. We demonstrate this ease of integration by providing examples in four different codebases: Scenic, t5x, Dopamine and FedJAX and provide baseline experiments on popular benchmarks.

algorithm, artificial intelligence, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2304.14082

Country: North America > United States (0.14)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Demystifying Map Space Exploration for NPUs

Kao, Sheng-Chun, Parashar, Angshuman, Tsai, Po-An, Krishna, Tushar

arXiv.org Artificial IntelligenceOct-7-2022

Map Space Exploration is the problem of finding optimized mappings of a Deep Neural Network (DNN) model on an accelerator. It is known to be extremely computationally expensive, and there has been active research looking at both heuristics and learning-based methods to make the problem computationally tractable. However, while there are dozens of mappers out there (all empirically claiming to find better mappings than others), the research community lacks systematic insights on how different search techniques navigate the map-space and how different mapping axes contribute to the accelerator's performance and efficiency. Such insights are crucial to developing mapping frameworks for emerging DNNs that are increasingly irregular (due to neural architecture search) and sparse, making the corresponding map spaces much more complex. In this work, rather than proposing yet another mapper, we do a first-of-its-kind apples-to-apples comparison of search techniques leveraged by different mappers. Next, we extract the learnings from our study and propose two new techniques that can augment existing mappers -- warm-start and sparsity-aware -- that demonstrate speedups, scalability, and robustness across diverse DNN models.

evolutionary algorithm, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2210.03731

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback

DNNFuser: Generative Pre-Trained Transformer as a Generalized Mapper for Layer Fusion in DNN Accelerators

Kao, Sheng-Chun, Huang, Xiaoyu, Krishna, Tushar

arXiv.org Artificial IntelligenceJan-26-2022

Dataflow/mapping decides the compute and energy efficiency of DNN accelerators. Many mappers have been proposed to tackle the intra-layer map-space. However, mappers for inter-layer map-space (aka layer-fusion map-space), have been rarely discussed. In this work, we propose a mapper, DNNFuser, specifically focusing on this layer-fusion map-space. While existing SOTA DNN mapping explorations rely on search-based mappers, this is the first work, to the best of our knowledge, to propose a one-shot inference-based mapper. We leverage a famous language model GPT as our DNN architecture to learn layer-fusion optimization as a sequence modeling problem. Further, the trained DNNFuser can generalize its knowledge and infer new solutions for unseen conditions. Within one inference pass, DNNFuser can infer solutions with compatible performance to the ones found by a highly optimized search-based mapper while being 66x-127x faster.

artificial intelligence, evolutionary algorithm, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2201.11218

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)
(2 more...)

Add feedback

DiGamma: Domain-aware Genetic Algorithm for HW-Mapping Co-optimization for DNN Accelerators

Kao, Sheng-Chun, Pellauer, Michael, Parashar, Angshuman, Krishna, Tushar

arXiv.org Artificial IntelligenceJan-26-2022

The design of DNN accelerators includes two key parts: HW resource configuration and mapping strategy. Intensive research has been conducted to optimize each of them independently. Unfortunately, optimizing for both together is extremely challenging due to the extremely large cross-coupled search space. To address this, in this paper, we propose a HW-Mapping co-optimization framework, an efficient encoding of the immense design space constructed by HW and Mapping, and a domain-aware genetic algorithm, named DiGamma, with specialized operators for improving search efficiency. We evaluate DiGamma with seven popular DNNs models with different properties. Our evaluations show DiGamma can achieve (geomean) 3.0x and 10.0x speedup, comparing to the best-performing baseline optimization algorithms, in edge and cloud settings.

artificial intelligence, evolutionary algorithm, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2201.1122

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)

Add feedback

Domain-specific Genetic Algorithm for Multi-tenant DNNAccelerator Scheduling

Kao, Sheng-Chun, Krishna, Tushar

arXiv.org Artificial IntelligenceApr-30-2021

As Deep Learning continues to drive a variety of applications in datacenters and HPC, there is a growing trend towards building large accelerators with several sub-accelerator cores/chiplets. This work looks at the problem of supporting multi-tenancy on such accelerators. In particular, we focus on the problem of mapping layers from several DNNs simultaneously on an accelerator. Given the extremely large search space, we formulate the search as an optimization problem and develop a specialized genetic algorithm called G# withcustom operators to enable structured sample-efficient exploration. We quantitatively compare G# with several common heuristics, state-of-the-art optimization methods, and reinforcement learning methods across different accelerator set-tings (large/small accelerators) and different sub-accelerator configurations (homogeneous/heterogeneous), and observeG# can consistently find better solutions. Further, to enable real-time scheduling, we also demonstrate a method to generalize the learnt schedules and transfer them to the next batch of jobs, reducing schedule compute time to near zero.

accelerator, deep learning, neural network, (22 more...)

arXiv.org Artificial Intelligence

2104.13997

Country:

Oceania > Australia (0.14)
North America > United States (0.14)

Genre: Research Report (0.63)

Industry: Information Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

Add feedback

Generative Design of Hardware-aware DNNs

Kao, Sheng-Chun, Ramamurthy, Arun, Krishna, Tushar

arXiv.org Machine LearningJul-12-2020

To efficiently run DNNs on the edge/cloud, many new DNN inference accelerators are being designed and deployed frequently. To enhance the resource efficiency of DNNs, model quantization is a widely-used approach. However, different accelerator/HW has different resources leading to the need for specialized quantization strategy of each HW. Moreover, using the same quantization for every layer may be sub-optimal, increasing the designspace of possible quantization choices. This makes manual-tuning infeasible. Recent work in automatically determining quantization for each layer is driven by optimization methods such as reinforcement learning. However, these approaches need re-training the RL for every new HW platform. We propose a new way for autonomous quantization and HW-aware tuning. We propose a generative model, AQGAN, which takes a target accuracy as the condition and generates a suite of quantization configurations. With the conditional generative model, the user can autonomously generate different configurations with different targets in inference time. Moreover, we propose a simplified HW-tuning flow, which uses the generative model to generate proposals and execute simple selection based on the HW resource budget, whose process is fast and interactive. We evaluate our model on five of the widely-used efficient models on the ImageNet dataset. We compare with existing uniform quantization and state-of-the-art autonomous quantization methods. Our generative model shows competitive achieved accuracy, however, with around two degrees less search cost for each design point. Our generative model shows the generated quantization configuration can lead to less than 3.5% error across all experiments.

accuracy, deep learning, neural network, (17 more...)

arXiv.org Machine Learning

2006.03968

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Conditional Neural Architecture Search

Kao, Sheng-Chun, Ramamurthy, Arun, Williams, Reed, Krishna, Tushar

arXiv.org Machine LearningJun-6-2020

Designing resource-efficient Deep Neural Networks (DNNs) is critical to deploy deep learning solutions over edge platforms due to diverse performance, power, and memory budgets. Unfortunately, it is often the case a well-trained ML model does not fit to the constraint of deploying edge platforms, causing a long iteration of model reduction and retraining process. Moreover, a ML model optimized for platform-A often may not be suitable when we deploy it on another platform-B, causing another iteration of model retraining. We propose a conditional neural architecture search method using GAN, which produces feasible ML models for different platforms. We present a new workflow to generate constraint-optimized DNN models. This is the first work of bringing in condition and adversarial technique into Neural Architecture Search domain. We verify the method with regression problems and classification on CIFAR-10. The proposed workflow can successfully generate resource-optimized MLP or CNN-based networks.

constraint, deep learning, neural network, (15 more...)

arXiv.org Machine Learning

2006.03969

Country: North America > United States (0.14)

Genre: Research Report (0.64)

Industry: Education (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Reinforcement Learning based Interconnection Routing for Adaptive Traffic Optimization

Kao, Sheng-Chun, Yang, Chao-Han Huck, Chen, Pin-Yu, Ma, Xiaoli, Krishna, Tushar

arXiv.org Artificial IntelligenceAug-13-2019

Applying Machine Learning (ML) techniques to design and optimize computer architectures is a promising research direction. Optimizing the runtime performance of a Network-on-Chip (NoC) necessitates a continuous learning framework. In this work, we demonstrate the promise of applying reinforcement learning (RL) to optimize NoC runtime performance. We present three RL-based methods for learning optimal routing algorithms. The experimental results show the algorithms can successfully learn a near-optimal solution across different environment states. Reproducible Code: github.com/huckiyang/interconnect-routing-gym

artificial intelligence, reinforcement learning, télécommunications, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3313231.335236

1908.04484

Genre: Research Report > New Finding (0.35)

Industry: Telecommunications (0.43)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback