AITopics | Liu, Liyuan

Collaborating Authors

Liu, Liyuan

Information about AI from the News, Publications, and Conferences

Vector-ICL: In-context Learning with Continuous Vector Representations

Zhuang, Yufan, Singh, Chandan, Liu, Liyuan, Shang, Jingbo, Gao, Jianfeng

arXiv.org Artificial Intelligence, Oct-7-2024

Large language models (LLMs) have shown remarkable in-context learning (ICL) capabilities on textual data. We explore whether these capabilities can be extended to continuous vectors from diverse domains, obtained from black-box pretrained encoders. By aligning input data with an LLM's embedding space through lightweight projectors, we observe that LLMs can effectively process and learn from these projected vectors, which we term Vector-ICL. In particular, we find that pretraining projectors with general language modeling objectives enables Vector-ICL, while task-specific finetuning further enhances performance. In our experiments across various tasks and modalities, including text reconstruction, numerical function regression, text classification, summarization, molecule captioning, time-series classification, graph classification, and fMRI decoding, Vector-ICL often surpasses both few-shot ICL and domain-specific models or tuning. We further conduct analyses and case studies that indicate the potential of LLMs to process vector representations beyond traditional token-based paradigms.
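To make the input construction concrete, here is a minimal sketch of how projected vectors might be spliced into an LLM's embedding sequence. It assumes a single linear projector with hand-set weights, a toy token-embedding table, and an "Input:/Output:" prompt template; all of these names and values are hypothetical, and in Vector-ICL the projectors are trained rather than fixed.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def project(x: Sequence[float], weight: Sequence[Sequence[float]]) -> Vector:
    """Linear projector: returns x @ weight, mapping len(x) dims to len(weight[0]) dims."""
    d_out = len(weight[0])
    return [sum(x[i] * weight[i][j] for i in range(len(x))) for j in range(d_out)]


def build_icl_sequence(
    examples: Sequence[Tuple[Vector, str]],
    query: Vector,
    weight: Sequence[Sequence[float]],
    token_emb: Dict[str, Vector],
) -> List[Vector]:
    """Interleave projected encoder vectors with ordinary token embeddings.

    Hypothetical template per example: "Input:" <projected vector> "Output:" <label>.
    """
    seq: List[Vector] = []
    for vec, label in examples:
        seq += [token_emb["Input:"], project(vec, weight), token_emb["Output:"], token_emb[label]]
    # The query follows the same template; the LLM would predict the label next.
    seq += [token_emb["Input:"], project(query, weight), token_emb["Output:"]]
    return seq


# Toy usage: 3-dim encoder vectors projected into a 2-dim "LLM" embedding space.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
EMB = {"Input:": [0.1, 0.1], "Output:": [0.2, 0.2], "pos": [1.0, 0.0], "neg": [0.0, 1.0]}
demo = build_icl_sequence([([1.0, 0.0, 0.0], "pos"), ([0.0, 1.0, 0.0], "neg")], [0.0, 0.0, 1.0], W, EMB)
print(len(demo), demo[-2])  # 11 [1.0, 1.0]
```

Each example contributes four embeddings and the query three, so the LLM sees ordinary tokens and projected vectors in one sequence.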

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2410.05629

Country:

Europe (0.67)
North America > United States > Pennsylvania (0.14)
North America > United States > Oregon (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

LoRC: Low-Rank Compression for LLMs KV Cache with a Progressive Compression Strategy

Zhang, Rongzhi, Wang, Kuang, Liu, Liyuan, Wang, Shuohang, Cheng, Hao, Zhang, Chao, Shen, Yelong

arXiv.org Artificial Intelligence, Oct-3-2024

The Key-Value (KV) cache is a crucial component in serving transformer-based autoregressive large language models (LLMs), enabling faster inference by storing previously computed KV vectors. However, its memory consumption scales linearly with sequence length and batch size, posing a significant bottleneck in LLM deployment. Existing approaches to mitigate this issue include: (1) efficient attention variants integrated in upcycling stages, which require extensive parameter tuning and are thus unsuitable for pre-trained LLMs; and (2) KV cache compression at test time, primarily through token eviction policies, which often overlook inter-layer dependencies and can be task-specific. This paper introduces an orthogonal approach to KV cache compression. We propose a low-rank approximation of KV weight matrices, allowing for plug-in integration with existing transformer-based LLMs without model retraining. To effectively compress the KV cache at the weight level, we adjust for layerwise sensitivity and introduce a progressive compression strategy, which is supported by our theoretical analysis of how compression errors accumulate in deep networks. Our method is designed to function without model tuning in upcycling stages or task-specific profiling in test stages. Extensive experiments with LLaMA models ranging from 8B to 70B parameters across various tasks show that our approach significantly reduces the GPU memory footprint while maintaining performance.
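The memory saving from a low-rank KV weight can be sketched as follows, assuming the key projection W_k has already been factored as A @ B with inner rank r. The cache stores the r-dimensional latent x @ A per token and reconstructs keys on demand, since (x @ A) @ B equals x @ (A @ B). The class name, the factors, and `progressive_ranks` (a made-up linear schedule that keeps more rank in shallow layers, reflecting the intuition that their errors pass through more of the network) are illustrative assumptions, not LoRC's actual procedure; values, attention heads, and positional encodings are omitted.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, w: Matrix) -> Matrix:
    """Plain (n x k) @ (k x m) matrix product."""
    return [[sum(row[t] * w[t][j] for t in range(len(w))) for j in range(len(w[0]))] for row in x]


class LowRankKeyCache:
    """Caches the rank-r latent x @ A per token instead of the d_k-dim key x @ W_k.

    Assumes W_k has already been factored as W_k ~= A @ B (A: d_model x r, B: r x d_k).
    """

    def __init__(self, a: Matrix, b: Matrix) -> None:
        self.a, self.b = a, b
        self.latents: Matrix = []  # one r-dim row per cached token

    def append(self, hidden: Matrix) -> None:
        self.latents.extend(matmul(hidden, self.a))

    def keys(self) -> Matrix:
        # Reconstruct keys on demand: (x @ A) @ B == x @ (A @ B).
        return matmul(self.latents, self.b)


def progressive_ranks(num_layers: int, max_rank: int, min_rank: int) -> List[int]:
    """Hypothetical schedule: larger ranks in shallow layers, smaller in deep ones."""
    if num_layers == 1:
        return [max_rank]
    step = (max_rank - min_rank) / (num_layers - 1)
    return [round(max_rank - i * step) for i in range(num_layers)]


A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # d_model = 3, r = 2
B = [[1.0, 0.0, 2.0], [0.0, 1.0, 3.0]]    # r = 2, d_k = 3
cache = LowRankKeyCache(A, B)
cache.append([[1.0, 1.0, 1.0]])
print(cache.latents, cache.keys())               # [[2.0, 2.0]] [[2.0, 2.0, 10.0]]
print(matmul([[1.0, 1.0, 1.0]], matmul(A, B)))   # [[2.0, 2.0, 10.0]]
print(progressive_ranks(4, 64, 16))              # [64, 48, 32, 16]
```

Here each token costs r = 2 cached floats instead of d_k = 3; the saving grows as r shrinks relative to d_k, at the price of approximation error when A @ B only approximates W_k.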

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2410.03111

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Toward Student-Oriented Teacher Network Training For Knowledge Distillation

Dong, Chengyu, Liu, Liyuan, Shang, Jingbo

arXiv.org Artificial Intelligence, May-9-2024

How to conduct teacher training for knowledge distillation is still an open problem. It has been widely observed that a best-performing teacher does not necessarily yield the best-performing student, suggesting a fundamental discrepancy between current teacher training practice and the ideal teacher training strategy. To fill this gap, we explore the feasibility of training a teacher that is oriented toward student performance with empirical risk minimization (ERM). Our analyses are inspired by recent findings that the effectiveness of knowledge distillation hinges on the teacher's capability to approximate the true label distribution of training inputs. We theoretically establish that the ERM minimizer can approximate the true label distribution of training data as long as the feature extractor of the learner network is Lipschitz continuous and is robust to feature transformations. In light of our theory, we propose a teacher training method, SoTeacher, which incorporates Lipschitz regularization and consistency regularization into ERM. Experiments on benchmark datasets using various knowledge distillation algorithms and teacher-student pairs confirm that SoTeacher consistently improves student accuracy.
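The overall shape of such an objective can be sketched as ERM (cross-entropy) plus a Lipschitz penalty plus a consistency term between predictions on two augmented views of the same input. This is a minimal sketch under assumptions: the KL-based consistency term, the weights `lam_lip` and `lam_cons`, and a precomputed `lipschitz_penalty` argument are hypothetical stand-ins, and the paper's exact regularizers may differ.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """ERM term: negative log-likelihood of the true label."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def regularized_teacher_loss(
    logits_view1: Sequence[float],
    logits_view2: Sequence[float],
    label: int,
    lipschitz_penalty: float,
    lam_lip: float = 0.1,
    lam_cons: float = 1.0,
) -> float:
    """ERM + Lipschitz regularization + consistency regularization.

    logits_view1/2 are the teacher's outputs on two augmented views of the same input;
    lipschitz_penalty is assumed to be computed elsewhere. lam_* are hypothetical weights.
    """
    erm = cross_entropy(logits_view1, label)
    consistency = kl_divergence(softmax(logits_view1), softmax(logits_view2))
    return erm + lam_lip * lipschitz_penalty + lam_cons * consistency


print(regularized_teacher_loss([0.0, 0.0], [0.0, 0.0], 0, 0.0))  # ln 2, about 0.6931
print(regularized_teacher_loss([2.0, 0.0], [0.0, 2.0], 0, 1.5))
```

When both views agree and the penalty is zero, the loss reduces to plain ERM; disagreement between views or a larger Lipschitz estimate raises it.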

artificial intelligence, deep learning, machine learning

arXiv.org Artificial Intelligence

2206.06661

Country: North America > United States > California (0.14)

Genre: Research Report > New Finding (0.67)

Industry:

Education > Teacher Education (0.87)
Education > Assessment & Standards > Student Performance (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Learning a Decision Tree Algorithm with Transformers

Zhuang, Yufan, Liu, Liyuan, Singh, Chandan, Shang, Jingbo, Gao, Jianfeng

arXiv.org Artificial Intelligence, Feb-6-2024

Decision trees are renowned for their interpretability and their capability to achieve high predictive performance, especially on tabular data. Traditionally, they are constructed through recursive algorithms that partition the data at every node of the tree. However, identifying the best partition is challenging, as decision trees optimized for local segments may not generalize well globally. To address this, we introduce MetaTree, which trains a transformer-based model on filtered outputs from classical algorithms to produce strong decision trees for classification. Specifically, we fit both greedy decision trees and optimized decision trees on a large number of datasets. We then train MetaTree to produce the trees that achieve strong generalization performance. This training enables MetaTree not only to emulate these algorithms but also to intelligently adapt its strategy according to the context, thereby achieving superior generalization performance.
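The classical recursive construction that MetaTree learns from can be illustrated with a minimal greedy builder (CART-style, Gini impurity). This is a sketch of the greedy baseline only, not MetaTree; the function names and the `max_depth` default are hypothetical. The XOR case at the end shows the locality problem mentioned above: no single split lowers impurity, so the greedy builder stops at one leaf even though a depth-2 tree fits the data perfectly.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows: Sequence[Sequence[float]], labels: Sequence[int]) -> Optional[Tuple[int, float]]:
    """Greedy step: the (feature, threshold) pair that most reduces weighted Gini impurity."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best  # None when no split strictly lowers impurity


def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively partition the data; leaves hold the majority label."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return (
        f,
        t,
        build_tree([rows[i] for i in left], [labels[i] for i in left], depth + 1, max_depth),
        build_tree([rows[i] for i in right], [labels[i] for i in right], depth + 1, max_depth),
    )


def predict(tree, row):
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


simple = build_tree([[1.0], [2.0], [3.0], [4.0]], [0, 0, 1, 1])
print(simple, predict(simple, [1.5]), predict(simple, [3.5]))  # (0, 2.0, 0, 1) 0 1

# XOR: no single axis-aligned split lowers impurity, so the greedy builder stops at a leaf.
xor = build_tree([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]], [0, 1, 1, 0])
print(isinstance(xor, tuple))  # False
```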

artificial intelligence, decision tree learning, machine learning

arXiv.org Artificial Intelligence

2402.03774

Country: North America > United States > California (0.14)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs

Ge, Suyu, Zhang, Yunan, Liu, Liyuan, Zhang, Minjia, Han, Jiawei, Gao, Jianfeng

arXiv.org Artificial Intelligence, Jan-29-2024

In this study, we introduce adaptive KV cache compression (FastGen), a plug-and-play method that reduces the memory footprint of generative inference for Large Language Models (LLMs). Unlike the conventional KV cache, which retains key and value vectors for all context tokens, our approach first conducts targeted profiling to discern the intrinsic structure of attention modules. Based on the recognized structure, we then construct the KV cache adaptively: evicting long-range contexts on attention heads that emphasize local contexts, discarding non-special tokens on attention heads centered on special tokens, and employing the standard KV cache only for attention heads that broadly attend to all tokens. Moreover, because only lightweight attention profiling is used to guide the construction of the adaptive KV cache, FastGen can be deployed without resource-intensive fine-tuning or re-training. In our experiments across various tasks, FastGen demonstrates a substantial reduction in GPU memory consumption with negligible loss in generation quality. We will release our code and the compatible CUDA kernel for reproducibility.
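Per-head policy selection can be sketched as picking the cheapest policy whose kept positions still recover most of a head's profiled attention mass. The sketch covers only the three policies named above; the recovery criterion, the 0.95 threshold, the window size, and the function name are assumptions for illustration and may differ from FastGen's actual profiling.

```python
from typing import List, Sequence, Set, Tuple


def choose_policy(
    attn: Sequence[float],
    special: Set[int],
    window: int,
    threshold: float = 0.95,
) -> Tuple[str, List[int]]:
    """Return (policy name, kept positions) for one attention head.

    attn is the head's profiled attention distribution over the n context positions.
    Policies are tried from smallest to largest cache, and the first one whose kept
    positions recover at least `threshold` of the attention mass wins.
    """
    n = len(attn)
    policies = {
        "special": sorted(p for p in special if 0 <= p < n),  # keep special tokens only
        "local": list(range(max(0, n - window), n)),           # keep the recent window only
        "full": list(range(n)),                                # standard KV cache
    }
    for name, kept in sorted(policies.items(), key=lambda kv: len(kv[1])):
        if sum(attn[i] for i in kept) >= threshold:
            return name, kept
    return "full", policies["full"]  # fallback, e.g. if rounding leaves the sum just under 1


print(choose_policy([0.97, 0.01, 0.01, 0.01], special={0}, window=2))                # ('special', [0])
print(choose_policy([0.01, 0.01, 0.01, 0.01, 0.46, 0.5], special={0}, window=2))     # ('local', [4, 5])
print(choose_policy([0.5, 0.02, 0.02, 0.02, 0.04, 0.4], special={0}, window=2)[0])   # full
```

Trying policies in order of cache size means a head only falls back to the full cache when neither cheaper policy captures enough of its attention.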

kv cache, large language model, machine learning

arXiv.org Artificial Intelligence

2310.01801

Country:

North America > United States > Illinois (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Hawaii (0.14)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs

Zhang, Qingru, Singh, Chandan, Liu, Liyuan, Liu, Xiaodong, Yu, Bin, Gao, Jianfeng, Zhao, Tuo

arXiv.org Artificial Intelligence, Nov-3-2023

In human-written articles, we often leverage the subtleties of text style, such as bold and italics, to guide the attention of readers. These textual emphases are vital for readers to grasp the conveyed information. When interacting with large language models (LLMs), we have a similar need: steering the model to pay closer attention to user-specified information, e.g., an instruction. Existing methods, however, are constrained to processing plain text and do not support such a mechanism. This motivates us to introduce PASTA (Post-hoc Attention STeering Approach), a method that allows LLMs to read text with user-specified emphasis marks. To this end, PASTA identifies a small subset of attention heads and applies precise attention reweighting on them, directing the model's attention to user-specified parts. Like prompting, PASTA is applied at inference time and does not require changing any model parameters. Experiments demonstrate that PASTA can substantially enhance an LLM's ability to follow user instructions or integrate new knowledge from user inputs, leading to significant performance improvements on a variety of tasks, e.g., an average accuracy improvement of 22% for LLaMA-7B. Our code is publicly available at https://github.com/QingruZhang/PASTA .
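A minimal sketch of the reweighting step, assuming it works by scaling the attention paid to non-emphasized positions by a small coefficient `alpha` and renormalizing each row, applied only on a chosen set of heads. How PASTA selects those heads is not shown; the head keys, `alpha` values, and function names are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Set


def steer_attention(attn_row: Sequence[float], emphasized: Set[int], alpha: float = 0.01) -> List[float]:
    """Scale attention to non-emphasized positions by alpha, then renormalize to sum to 1."""
    scaled = [w if j in emphasized else alpha * w for j, w in enumerate(attn_row)]
    total = sum(scaled)
    return [w / total for w in scaled]


def steer_heads(
    attn_by_head: Dict[Hashable, Sequence[float]],
    selected_heads: Set[Hashable],
    emphasized: Set[int],
    alpha: float = 0.01,
) -> Dict[Hashable, List[float]]:
    """Apply the reweighting only on the selected heads; other heads are left unchanged."""
    return {
        head: steer_attention(row, emphasized, alpha) if head in selected_heads else list(row)
        for head, row in attn_by_head.items()
    }


heads = {(0, 0): [0.25, 0.25, 0.25, 0.25], (0, 1): [0.1, 0.2, 0.3, 0.4]}
steered = steer_heads(heads, selected_heads={(0, 0)}, emphasized={0}, alpha=0.5)
print(steered[(0, 0)])  # approximately [0.4, 0.2, 0.2, 0.2]
print(steered[(0, 1)])  # [0.1, 0.2, 0.3, 0.4] (not selected, so unchanged)
```

Because only attention weights are rescaled at inference time, no model parameters change, which matches the post-hoc character described above.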

arxiv preprint arxiv, large language model, machine learning

arXiv.org Artificial Intelligence

2311.02262

Country: North America > United States > California (0.14)

Genre: Research Report > New Finding (0.93)

Industry:

Leisure & Entertainment (0.46)
Media > Photography (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Bridging Discrete and Backpropagation: Straight-Through and Beyond

Liu, Liyuan, Dong, Chengyu, Liu, Xiaodong, Yu, Bin, Gao, Jianfeng

arXiv.org Artificial Intelligence, Oct-16-2023

Backpropagation, the cornerstone of deep learning, is limited to computing gradients for continuous variables. This limitation poses challenges for problems involving discrete latent variables. To address this issue, we propose a novel approach to approximate the gradient of parameters involved in generating discrete latent variables. First, we examine the widely used Straight-Through (ST) heuristic and demonstrate that it works as a first-order approximation of the gradient. Guided by our findings, we propose ReinMax, which achieves second-order accuracy by integrating Heun's method, a second-order numerical method for solving ODEs. ReinMax does not require the Hessian or other second-order derivatives and thus has negligible computational overhead. Extensive experimental results on various tasks demonstrate the superiority of ReinMax over the state of the art.
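The Straight-Through heuristic discussed above can be sketched for a single categorical variable: the forward pass uses a one-hot sample D, and the backward pass treats D as if it were softmax(θ), so the gradient flows through the softmax Jacobian. This illustrates only the first-order ST baseline, not ReinMax; the toy linear loss and the function names are assumptions.

```python
import math
import random
from typing import List, Sequence


def softmax(theta: Sequence[float]) -> List[float]:
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    s = sum(exps)
    return [e / s for e in exps]


def sample_one_hot(probs: Sequence[float], rng: random.Random) -> List[float]:
    """Forward pass: draw a discrete (one-hot) sample D ~ Categorical(probs)."""
    idx = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return [1.0 if i == idx else 0.0 for i in range(len(probs))]


def straight_through_grad(theta: Sequence[float], grad_d: Sequence[float]) -> List[float]:
    """Backward pass: use dL/dD as if D were softmax(theta).

    With pi = softmax(theta) and d pi_i / d theta_j = pi_i * (delta_ij - pi_j),
    the ST estimate is grad_j = pi_j * (g_j - sum_i g_i * pi_i).
    """
    pi = softmax(theta)
    mean_g = sum(g * p for g, p in zip(grad_d, pi))
    return [p * (g - mean_g) for p, g in zip(pi, grad_d)]


rng = random.Random(0)
theta = [0.0, 0.0]
w = [1.0, 0.0]  # toy loss L(D) = w . D, so dL/dD = w regardless of which D was sampled
D = sample_one_hot(softmax(theta), rng)
print(D, straight_through_grad(theta, w))  # gradient is [0.25, -0.25]; D depends on the seed
```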

approximation, artificial intelligence, machine learning

arXiv.org Artificial Intelligence

2304.08612

Genre:

Research Report > New Finding (0.48)
Research Report > Promising Solution (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Backpropagation (0.61)

Fast-ELECTRA for Efficient Pre-training

Dong, Chengyu, Liu, Liyuan, Cheng, Hao, Shang, Jingbo, Gao, Jianfeng, Liu, Xiaodong

arXiv.org Artificial Intelligence, Oct-11-2023

ELECTRA pre-trains language models by detecting tokens in a sequence that have been replaced by an auxiliary model. Although ELECTRA offers a significant boost in efficiency, its potential is constrained by the training cost incurred by the auxiliary model. Notably, this model, which is jointly trained with the main model, serves only to assist the training of the main model and is discarded after training. As a result, a substantial amount of training cost is expended in vain. To mitigate this issue, we propose Fast-ELECTRA, which leverages an existing language model as the auxiliary model. To construct a learning curriculum for the main model, we smooth the auxiliary model's output distribution via temperature scaling following a descending schedule. Our approach rivals the performance of state-of-the-art ELECTRA-style pre-training methods while largely eliminating the computation and memory cost incurred by joint training of the auxiliary model. Our method also reduces sensitivity to hyper-parameters and enhances pre-training stability.
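The temperature-based curriculum can be sketched as a schedule that starts with a smoothed (high-temperature) auxiliary distribution and anneals toward the model's original distribution at T = 1. The linear shape, the start temperature of 2.0, and the toy logits are assumptions, not values from the paper; sampling replacement tokens and training the main model are omitted.

```python
import math
from typing import List, Sequence


def temperature_at(step: int, total_steps: int, t_start: float = 2.0, t_end: float = 1.0) -> float:
    """Hypothetical descending schedule: linear decay from t_start to t_end, then held."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return t_start + (t_end - t_start) * frac


def tempered_softmax(logits: Sequence[float], temperature: float) -> List[float]:
    """Auxiliary-model distribution smoothed by temperature (T > 1 flattens it)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]


def entropy(p: Sequence[float]) -> float:
    return -sum(x * math.log(x) for x in p if x > 0)


aux_logits = [2.0, 1.0, 0.0]  # made-up logits from a frozen, pre-existing language model
for step in (0, 500, 1000):
    t = temperature_at(step, total_steps=1000)
    print(step, t, round(entropy(tempered_softmax(aux_logits, t)), 3))  # entropy falls as T drops
```

Because the auxiliary model is frozen, only a scalar temperature changes over training, which is far cheaper than jointly training a generator.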

artificial intelligence, efficient pre-training, fast-electra

arXiv.org Artificial Intelligence

2310.07347

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.80)

Sparse Backpropagation for MoE Training

Liu, Liyuan, Gao, Jianfeng, Chen, Weizhu

arXiv.org Artificial Intelligence, Oct-1-2023

One defining characteristic of Mixture-of-Experts (MoE) models is their capacity for conducting sparse computation via expert routing, leading to remarkable scalability. However, backpropagation, the cornerstone of deep learning, requires dense computation, thereby posing challenges in MoE gradient computations. Here, we introduce SparseMixer, a scalable gradient estimator that bridges the gap between backpropagation and sparse expert routing. Unlike typical MoE training, which strategically neglects certain gradient terms for the sake of sparse computation and scalability, SparseMixer provides scalable gradient approximations for these terms, enabling reliable gradient estimation in MoE training. Grounded in a numerical ODE framework, SparseMixer harnesses the mid-point method, a second-order ODE solver, to deliver precise gradient approximations with negligible computational overhead. Applying SparseMixer to Switch Transformer on both pre-training and machine translation tasks yields considerable performance gains, accelerating training convergence by up to 2 times.
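SparseMixer's gradient estimator is not reproduced here, but the numerical tool it builds on can be illustrated: the explicit midpoint method evaluates the slope halfway through each step, which makes it second-order, while Euler's method is first-order. On dy/dt = y, halving the step size should roughly halve the Euler error and roughly quarter the midpoint error. The function names are hypothetical.

```python
import math
from typing import Callable

ODE = Callable[[float, float], float]  # f(t, y) = dy/dt


def euler(f: ODE, y0: float, t0: float, t1: float, n: int) -> float:
    """First-order: each step uses only the slope at the start of the step."""
    h = (t1 - t0) / n
    t, y = t0, y0
    for _ in range(n):
        y += h * f(t, y)
        t += h
    return y


def midpoint(f: ODE, y0: float, t0: float, t1: float, n: int) -> float:
    """Second-order explicit midpoint: each step uses the slope estimated halfway through."""
    h = (t1 - t0) / n
    t, y = t0, y0
    for _ in range(n):
        y_half = y + 0.5 * h * f(t, y)
        y += h * f(t + 0.5 * h, y_half)
        t += h
    return y


def growth(t: float, y: float) -> float:
    return y  # dy/dt = y with y(0) = 1, so the exact y(1) is e


for n in (10, 20):
    err_euler = abs(euler(growth, 1.0, 0.0, 1.0, n) - math.e)
    err_mid = abs(midpoint(growth, 1.0, 0.0, 1.0, n) - math.e)
    print(n, err_euler, err_mid)
```

The midpoint method costs one extra slope evaluation per step, which is the kind of modest overhead that buys second-order accuracy.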

artificial intelligence, machine learning, natural language

arXiv.org Artificial Intelligence

2310.00811

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Backpropagation (0.82)

P4E: Few-Shot Event Detection as Prompt-Guided Identification and Localization

Li, Sha, Liu, Liyuan, Xie, Yiqing, Ji, Heng, Han, Jiawei

arXiv.org Artificial Intelligence, Dec-19-2022

We propose P4E, an identify-and-localize event detection framework that integrates the best of few-shot prompting and structured prediction. Our framework decomposes event detection into an identification task and a localization task. For the identification task, which we formulate as multi-label classification, we leverage cloze-based prompting to align our objective with the pre-training task of language models, allowing our model to quickly adapt to new event types. We then employ an event type-agnostic sequence labeling model to localize the event trigger conditioned on the identification output. This heterogeneous model design allows P4E to quickly learn new event types without sacrificing the ability to make structured predictions. Our experiments demonstrate the effectiveness of our proposed design: P4E shows superior performance for few-shot event detection on the benchmark datasets FewEvent and MAVEN, and performance comparable to the state of the art (SOTA) for fully-supervised event detection on ACE.
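The identify-then-localize decomposition can be sketched as a two-stage pipeline. In this minimal sketch, `type_scores` stands in for the cloze-prompt identification scores, `toy_tagger` stands in for the trained type-agnostic sequence labeler, and the 0.5 threshold and B/I/O tag scheme are assumptions rather than details confirmed by the abstract.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Span = Tuple[int, int]  # token indices [start, end)


def bio_to_spans(tags: Sequence[str]) -> List[Span]:
    """Decode B/I/O tags into trigger spans (a stray I starts a new span)."""
    spans: List[Span] = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "O" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def identify_then_localize(
    tokens: Sequence[str],
    type_scores: Dict[str, float],
    tagger: Callable[[Sequence[str], str], List[str]],
    threshold: float = 0.5,
) -> Dict[str, List[Span]]:
    """Stage 1 (identification): multi-label, keep every type scoring >= threshold.
    Stage 2 (localization): one type-agnostic tagger, conditioned on each kept type."""
    detected = [etype for etype, score in type_scores.items() if score >= threshold]
    return {etype: bio_to_spans(tagger(tokens, etype)) for etype in detected}


def toy_tagger(tokens: Sequence[str], event_type: str) -> List[str]:
    # Hypothetical stand-in for a trained sequence labeler; tags "attacked" only for Attack.
    return ["B" if (tok == "attacked" and event_type == "Attack") else "O" for tok in tokens]


sentence = "troops attacked the city".split()
print(identify_then_localize(sentence, {"Attack": 0.9, "Transport": 0.2}, toy_tagger))
# {'Attack': [(1, 2)]}
```

Keeping the tagger type-agnostic means a new event type only needs a new identification prompt, while the same localizer is reused.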

artificial intelligence, event type, machine learning

arXiv.org Artificial Intelligence

2202.07615

Country:

Asia (0.68)
North America > United States > Minnesota (0.28)

Genre: Research Report (0.82)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.55)
