AITopics | Huang, Yuyang

Collaborating Authors

Huang, Yuyang

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Alchemist: Towards the Design of Efficient Online Continual Learning System

Huang, Yuyang, Liu, Yuhan, Gunawi, Haryadi S., Li, Beibin, Hwang, Changho

arXiv.org Artificial IntelligenceMar-14-2025

Continual learning has become a promising solution to refine large language models incrementally by leveraging user feedback. In particular, online continual learning - iteratively training the model with small batches of user feedback - has demonstrated notable performance improvements. However, the existing practice of separating training and serving processes forces the online trainer to recompute the intermediate results already done during serving. Such redundant computations can account for 30%-42% of total training time. In this paper, we propose Alchemist, to the best of our knowledge, the first online continual learning system that efficiently reuses serving activations to increase training throughput. Alchemist introduces two key techniques: (1) recording and storing activations and KV cache only during the prefill phase to minimize latency and memory overhead; and (2) smart activation offloading and hedging. Evaluations with inputs of varied token length sampled from ShareGPT dataset show that compared with a separate training cluster, Alchemist significantly increases training throughput by up to 1.72x, reduces up to 47% memory usage during training, and supports up to 2x more training tokens - all while maintaining negligible impact on serving latency.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2503.01066

Country: North America > United States (0.68)

Genre:

Research Report (1.00)
Instructional Material > Online (0.82)

Industry: Education > Educational Setting > Online (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

DroidSpeak: KV Cache Sharing for Cross-LLM Communication and Multi-LLM Serving

Liu, Yuhan, Huang, Yuyang, Yao, Jiayi, Gu, Zhuohan, Du, Kuntai, Li, Hanchen, Cheng, Yihua, Jiang, Junchen, Lu, Shan, Musuvathi, Madan, Choukse, Esha

arXiv.org Artificial IntelligenceDec-19-2024

Large Language Models (LLMs) are increasingly employed in complex workflows, where different LLMs and fine-tuned variants collaboratively address complex tasks. However, these systems face significant inefficiencies due to redundant context processing of the shared context. We propose DroidSpeak, a framework that optimizes context sharing between fine-tuned LLMs derived from the same foundational model. DroidSpeak identifies critical layers in the KV cache and selectively recomputes them, enabling effective reuse of intermediate data while maintaining high accuracy. Our approach balances computational efficiency and task fidelity, significantly reducing inference latency and throughput bottlenecks. Experiments on diverse datasets and model pairs demonstrate that DroidSpeak achieves up to 3x higher throughputs and 2.6x faster prefill times with negligible accuracy loss compared to full recomputation.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2411.0282

Country: North America > United States (0.46)

Genre: Research Report (0.50)

Industry: Information Technology > Services (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Data/moment-driven approaches for fast predictive control of collective dynamics

Albi, Giacomo, Bicego, Sara, Herty, Michael, Huang, Yuyang, Kalise, Dante, Segala, Chiara

arXiv.org Artificial IntelligenceFeb-23-2024

Feedback control synthesis for large-scale particle systems is reviewed in the framework of model predictive control (MPC). The high-dimensional character of collective dynamics hampers the performance of traditional MPC algorithms based on fast online dynamic optimization at every time step. Two alternatives to MPC are proposed. First, the use of supervised learning techniques for the offline approximation of optimal feedback laws is discussed. Then, a procedure based on sequential linearization of the dynamics based on macroscopic quantities of the particle ensemble is reviewed. Both approaches circumvent the online solution of optimal control problems enabling fast, real-time, feedback synthesis for large-scale particle systems. Numerical experiments assess the performance of the proposed algorithms.

artificial intelligence, inductive learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2402.15611

Country:

North America > United States (0.14)
Europe > United Kingdom (0.14)
Europe > Germany (0.14)

Genre: Research Report (0.50)

Industry: Energy > Oil & Gas > Upstream (0.63)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)

Add feedback

Can Language Model Moderators Improve the Health of Online Discourse?

Cho, Hyundong, Liu, Shuai, Shi, Taiwei, Jain, Darpan, Rizk, Basem, Huang, Yuyang, Lu, Zixun, Wen, Nuan, Gratch, Jonathan, Ferrara, Emilio, May, Jonathan

arXiv.org Artificial IntelligenceNov-16-2023

Human moderation of online conversation is essential to maintaining civility and focus in a dialogue, but is challenging to scale and harmful to moderators. The inclusion of sophisticated natural language generation modules as a force multiplier aid moderators is a tantalizing prospect, but adequate evaluation approaches have so far been elusive. In this paper, we establish a systematic definition of conversational moderation effectiveness through a multidisciplinary lens that incorporates insights from social science. We then propose a comprehensive evaluation framework that uses this definition to asses models' moderation capabilities independently of human intervention. With our framework, we conduct the first known study Figure 1: While banning users or deleting their comments of conversational dialogue models as moderators, may push them towards echo chambers (left), conversational finding that appropriately prompted models moderation can guide users towards more can provide specific and fair feedback on constructive behavior (right). Recent developments in toxic behavior but struggle to influence users to conversational AI present an opportunity to perform this increase their levels of respect and cooperation.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2311.10781

Country: North America > United States > California (0.14)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.68)

Industry:

Law (0.68)
Government (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.70)

Add feedback

CacheGen: Fast Context Loading for Language Model Applications

Liu, Yuhan, Li, Hanchen, Du, Kuntai, Yao, Jiayi, Cheng, Yihua, Huang, Yuyang, Lu, Shan, Maire, Michael, Hoffmann, Henry, Holtzman, Ari, Ananthanarayanan, Ganesh, Jiang, Junchen

arXiv.org Artificial IntelligenceOct-11-2023

As large language models (LLMs) take on more complex tasks, their inputs incorporate longer contexts to respond to questions that require domain knowledge or user-specific conversational histories. Yet, using long contexts poses a challenge for responsive LLM systems, as nothing can be generated until all the contexts are fetched to and processed by the LLM. Existing systems optimize only the computation delay in context processing (e.g., by caching intermediate key-value features of the text context) but often cause longer network delays in context fetching (e.g., key-value features consume orders of magnitude larger bandwidth than the text context). This paper presents CacheGen to minimize the delays in fetching and processing contexts for LLMs. CacheGen reduces the bandwidth needed for transmitting long contexts' key-value (KV) features through a novel encoder that compresses KV features into more compact bitstream representations. The encoder combines adaptive quantization with a tailored arithmetic coder, taking advantage of the KV features' distributional properties, such as locality across tokens. Furthermore, CacheGen minimizes the total delay in fetching and processing a context by using a controller that determines when to load the context as compressed KV features or raw text and picks the appropriate compression level if loaded as KV features. We test CacheGen on three models of various sizes and three datasets of different context lengths. Compared to recent methods that handle long contexts, CacheGen reduces bandwidth usage by 3.7-4.3x and the total delay in fetching and processing contexts by 2.7-3x while maintaining similar LLM performance on various tasks as loading the text contexts.

language model application, large language model, natural language, (3 more...)

arXiv.org Artificial Intelligence

2310.0724

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

OneAdapt: Fast Adaptation for Deep Learning Applications via Backpropagation

Du, Kuntai, Liu, Yuhan, Hao, Yitian, Zhang, Qizheng, Wang, Haodong, Huang, Yuyang, Ananthanarayanan, Ganesh, Jiang, Junchen

arXiv.org Artificial IntelligenceOct-3-2023

Deep learning inference on streaming media data, such as object detection in video or LiDAR feeds and text extraction from audio waves, is now ubiquitous. To achieve high inference accuracy, these applications typically require significant network bandwidth to gather high-fidelity data and extensive GPU resources to run deep neural networks (DNNs). While the high demand for network bandwidth and GPU resources could be substantially reduced by optimally adapting the configuration knobs, such as video resolution and frame rate, current adaptation techniques fail to meet three requirements simultaneously: adapt configurations (i) with minimum extra GPU or bandwidth overhead; (ii) to reach near-optimal decisions based on how the data affects the final DNN's accuracy, and (iii) do so for a range of configuration knobs. This paper presents OneAdapt, which meets these requirements by leveraging a gradient-ascent strategy to adapt configuration knobs. The key idea is to embrace DNNs' differentiability to quickly estimate the accuracy's gradient to each configuration knob, called AccGrad. Specifically, OneAdapt estimates AccGrad by multiplying two gradients: InputGrad (i.e. how each configuration knob affects the input to the DNN) and DNNGrad (i.e. how the DNN input affects the DNN inference output). We evaluate OneAdapt across five types of configurations, four analytic tasks, and five types of input data. Compared to state-of-the-art adaptation schemes, OneAdapt cuts bandwidth usage and GPU usage by 15-59% while maintaining comparable accuracy or improves accuracy by 1-5% while using equal or fewer resources.

artificial intelligence, deep learning application, machine learning, (3 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3620678.3624653

2310.02422

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback