A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models
Guo, Cong, Cheng, Feng, Du, Zhixu, Kiessling, James, Ku, Jonathan, Li, Shiyu, Li, Ziru, Ma, Mingyuan, Molom-Ochir, Tergel, Morris, Benjamin, Shan, Haoxuan, Sun, Jingwei, Wang, Yitu, Wei, Chiyue, Wu, Xueying, Wu, Yuhao, Yang, Hao Frank, Zhang, Jingyang, Zhang, Junyao, Zheng, Qilin, Zhou, Guanglei, Li, Hai, Chen, Yiran
The rapid development of large language models (LLMs) has significantly transformed the field of artificial intelligence, demonstrating remarkable capabilities in natural language processing and moving toward multi-modal functionality. These models are increasingly integrated into diverse applications, impacting both research and industry. However, their development and deployment present substantial challenges, including the need for extensive computational resources, high energy consumption, and complex software optimizations. Unlike traditional deep learning systems, LLMs require unique optimization strategies for training and inference that focus on system-level efficiency. This paper surveys hardware and software co-design approaches specifically tailored to the unique characteristics and constraints of large language models, analyzing their challenges and impacts on hardware and algorithm research and exploring algorithm optimization, hardware design, and system-level innovations. It aims to provide a comprehensive understanding of the trade-offs and considerations in LLM-centric computing systems, guiding future advancements in AI. Finally, we summarize existing efforts in this space and outline future directions toward realizing production-grade co-design methodologies for the next generation of large language models and AI systems.
Exploring Bit-Slice Sparsity in Deep Neural Networks for Efficient ReRAM-Based Deployment
Zhang, Jingyang, Yang, Huanrui, Chen, Fan, Wang, Yitu, Li, Hai
Emerging resistive random-access memory (ReRAM) has recently been intensively investigated to accelerate the processing of deep neural networks (DNNs). Due to their in-situ computation capability, analog ReRAM crossbars yield significant throughput improvement and energy reduction compared to traditional digital methods. However, the power-hungry analog-to-digital converters (ADCs) prevent the practical deployment of ReRAM-based DNN accelerators on end devices with limited chip area and power budgets. We observe that, due to the limited bit density of ReRAM cells, DNN weights are bit-sliced and stored across multiple ReRAM bitlines. The accumulated current on the bitlines produced by these weight slices directly dictates the overhead of the ADCs. As such, bit-wise weight sparsity, rather than sparsity of the full weights, is desirable for efficient ReRAM deployment. In this work, we propose bit-slice L1, the first algorithm to induce bit-slice sparsity during the training of dynamic fixed-point DNNs. Experimental results show that our approach achieves a 2x sparsity improvement over previous algorithms. The resulting sparsity allows the ADC resolution to be reduced to 1 bit for the most significant bit-slice and down to 3 bits for the remaining slices, which significantly speeds up processing and reduces power and area overhead.
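To make the bit-slicing idea concrete, the following is a minimal PyTorch sketch of how a bit-slice L1 penalty could be measured for a single weight tensor. The function name bit_slice_l1, the 8-bit dynamic fixed-point / 2-bit-per-slice configuration, and the per-tensor scaling are illustrative assumptions, not the paper's exact formulation.

    import torch

    def bit_slice_l1(weight, total_bits=8, slice_bits=2):
        # Hypothetical sketch (not the authors' exact method): quantize the
        # weight magnitudes to dynamic fixed point, split the integers into
        # slice_bits-wide slices as they would map onto ReRAM bitlines, and
        # sum the slice values (L1 over non-negative slices).
        max_int = 2 ** total_bits - 1
        # Dynamic fixed point: scale so the largest magnitude fills the range.
        scale = weight.abs().max().clamp(min=1e-12) / max_int
        q = torch.round(weight.abs() / scale).to(torch.int64)  # integer magnitudes

        penalty = torch.zeros((), dtype=torch.float64)
        for s in range(total_bits // slice_bits):
            # Extract the s-th slice_bits-wide slice, i.e. the value that
            # would be programmed onto one group of bitline cells.
            slice_vals = (q >> (s * slice_bits)) & ((1 << slice_bits) - 1)
            penalty = penalty + slice_vals.to(torch.float64).sum()
        return penalty

As written, the rounding step blocks gradients, so this sketch only measures bit-slice density; using such a penalty as a training regularizer, as the paper does, would additionally require a differentiable treatment of the quantization (for example, a straight-through estimator), which is omitted here.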