Goto

Collaborating Authors

 Yang, Chih-Chieh


TP-Aware Dequantization

arXiv.org Artificial Intelligence

Given the recent advancement of LLMs, deployment optimizations are becoming more crucial as the size of state-of-the-art LLMs increases. As these models continue to grow, so does the need to optimize the increasingly parallel and distributed workload requirements of modern-day deep learning inference. Strategies like GPTQ [1] and Tensor Parallel (TP) [4] are hence essential to achieving high-throughput performance. Our method is motivated by several key properties of GPTQ, TP, and General Matrix Multiplication (GEMM). We build on these existing methods and present a key innovation that helps maximize memory throughput and reduce latency. Our method shows up to a 1.81x speedup on Llama-70B and up to a 1.78x speedup on Granite-20B MLP layer problem sizes. We achieve this by reducing global communication and enforcing data locality.
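The abstract's core idea of "enforcing data locality" can be illustrated with a toy example: if a group-quantized weight matrix is sharded column-wise across TP ranks together with its quantization metadata (scales), every rank can dequantize and run its local GEMM without any cross-rank communication. The sketch below is an illustration under these simplified assumptions, not the paper's actual kernels or layout; names such as `dequantize_shard` and the symmetric 4-bit toy quantizer are invented for this example.

```python
import numpy as np

GROUP = 8      # quantization group size along the input dimension (assumed)
TP_RANKS = 2   # number of tensor-parallel shards (column parallel)

def quantize(w, group=GROUP):
    """Toy symmetric 4-bit group quantization (not GPTQ itself)."""
    k, n = w.shape
    w_g = w.reshape(k // group, group, n)
    scale = np.abs(w_g).max(axis=1, keepdims=True) / 7.0   # int4 range [-7, 7]
    q = np.clip(np.round(w_g / scale), -7, 7).astype(np.int8)
    return q.reshape(k, n), np.repeat(scale, group, axis=1).reshape(k, n)

def dequantize_shard(q_shard, scale_shard):
    """Each rank only touches the quantized columns and scales it owns."""
    return q_shard.astype(np.float32) * scale_shard

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16)).astype(np.float32)   # activations
w = rng.standard_normal((16, 8)).astype(np.float32)   # dense weight
q, scale = quantize(w)

# Shard columns and their scales identically, keeping all data rank-local.
outs = []
for rank in range(TP_RANKS):
    cols = slice(rank * q.shape[1] // TP_RANKS, (rank + 1) * q.shape[1] // TP_RANKS)
    w_local = dequantize_shard(q[:, cols], scale[:, cols])
    outs.append(x @ w_local)                           # local GEMM, no communication

y = np.concatenate(outs, axis=1)                       # column-parallel result
print(np.allclose(y, x @ dequantize_shard(q, scale), atol=1e-5))  # True
```

Because the scales are partitioned along the same column boundaries as the quantized weights, the sharded computation reproduces the unsharded result exactly, which is the locality property the paper exploits to avoid global communication.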


Accelerating Data Loading in Deep Neural Network Training

arXiv.org Machine Learning

Data loading can dominate deep neural network training time on large-scale systems. We present a comprehensive study on accelerating data loading performance in large-scale distributed training. We first identify performance and scalability issues in current data loading implementations. We then propose optimizations that utilize CPU resources to improve the data loader design. We use an analytical model to characterize the impact of data loading on the overall training time and to establish the performance trend as we scale up distributed training. Our model suggests that the I/O rate limits the scalability of distributed training, which inspires us to design a locality-aware data loading method. By utilizing software caches, our method can drastically reduce the data loading communication volume compared with the original data loading implementation. Finally, we evaluate the proposed optimizations with various experiments and achieve more than a 30x speedup in data loading using 256 nodes with 1,024 learners.
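The "software cache" idea in this abstract can be sketched as a node-local cache keyed by sample index: once a learner has read a sample from shared storage, subsequent epochs serve it from local memory, shrinking shared-filesystem traffic. The class below is a minimal illustration under assumed names (`CachingLoader`, a directory of `.npy` samples, a simple bounded cache with no eviction); it is not the paper's implementation.

```python
import os
import numpy as np

class CachingLoader:
    """Node-local software cache for training samples (illustrative only)."""

    def __init__(self, dataset_dir, capacity=10_000):
        self.dataset_dir = dataset_dir
        self.capacity = capacity
        self.cache = {}        # sample index -> array held in node-local RAM
        self.hits = 0
        self.misses = 0

    def _read_from_storage(self, idx):
        # Stand-in for the expensive read from the shared parallel file system.
        path = os.path.join(self.dataset_dir, f"sample_{idx}.npy")
        return np.load(path)

    def __call__(self, idx):
        if idx in self.cache:
            self.hits += 1
            return self.cache[idx]
        self.misses += 1
        sample = self._read_from_storage(idx)
        if len(self.cache) < self.capacity:   # bounded cache, no eviction policy
            self.cache[idx] = sample
        return sample
```

With a locality-aware shard assignment that hands each learner the same subset of indices every epoch, nearly every access after the first epoch is a cache hit, which is how a software cache can cut the data loading communication volume the abstract describes.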