AITopics | Li, Zhongyang

Li, Zhongyang

EventWeave: A Dynamic Framework for Capturing Core and Supporting Events in Dialogue Systems

Zhao, Zhengyi, Zhang, Shubo, Du, Yiming, Liang, Bin, Wang, Baojun, Li, Zhongyang, Li, Binyang, Wong, Kam-Fai

arXiv.org Artificial Intelligence, Mar-29-2025

Existing large language models (LLMs) have shown remarkable progress in dialogue systems. However, many approaches still overlook the fundamental role of events throughout multi-turn interactions, leading to incomplete context tracking. Without tracking these events, dialogue systems often lose coherence and miss subtle shifts in user intent, causing disjointed responses. To bridge this gap, we present EventWeave, an event-centric framework that identifies and updates both core and supporting events as the conversation unfolds. Specifically, we organize these events into a dynamic event graph, which represents the interplay between core events that shape the primary idea and supporting events that provide critical context during the whole dialogue. By leveraging this dynamic graph, EventWeave helps models focus on the most relevant events when generating responses, thus avoiding repeated visits of the entire dialogue history. Experimental results on two benchmark datasets show that EventWeave improves response quality and event relevance without fine-tuning.

artificial intelligence, large language model, natural language

2503.23078

Country:

Asia (0.46)
North America (0.46)

Genre:

Research Report (1.00)
Personal > Interview (0.67)

Industry:

Consumer Products & Services > Food, Beverage, Tobacco & Cannabis > Beverages (1.00)
Health & Medicine (0.67)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

A Survey on Transformer Context Extension: Approaches and Evaluation

Liu, Yijun, Yu, Jinzheng, Xu, Yang, Li, Zhongyang, Zhu, Qingfu

arXiv.org Artificial Intelligence, Mar-17-2025

Large language models (LLMs) based on the Transformer have been widely applied in the field of natural language processing (NLP), demonstrating strong performance, particularly in handling short text tasks. However, in long context scenarios, the performance of LLMs degrades due to several challenges. To alleviate this phenomenon, a number of approaches have been proposed recently. In this survey, we first list the challenges of applying pre-trained LLMs to process long contexts. We then systematically review the approaches related to long context and propose our taxonomy, categorizing them into four main types: positional encoding, context compression, retrieval augmented, and attention pattern. In addition to the approaches, we focus on the evaluation of long context, organizing relevant data, tasks, and metrics based on existing long context benchmarks. Finally, we summarize unresolved issues in the long context domain and put forward our views on future developments.

large language model, machine learning, natural language

2503.13299

Country:

Asia > Thailand (0.14)
Asia > China (0.14)

Genre: Overview (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts

Li, Zhongyang, Li, Ziyue, Zhou, Tianyi

arXiv.org Artificial Intelligence, Feb-28-2025

In large multimodal models (LMMs), the perception of non-language modalities (e.g., visual representations) is usually not on par with the large language models (LLMs)' powerful reasoning capabilities, deterring LMMs' performance on challenging downstream tasks. This weakness has been recently mitigated by replacing the vision encoder with a mixture-of-experts (MoE), which provides rich, multi-granularity, and diverse representations required by diverse downstream tasks. The performance of multimodal MoE largely depends on its router, which reweights and mixes the representations of different experts for each input. However, we find that the end-to-end trained router does not always produce the optimal routing weights for every test sample. To bridge the gap, we propose a novel and efficient method "Re-Routing in Test-Time (R2-T2)" that locally optimizes the vector of routing weights in test-time by moving it toward those vectors of the correctly predicted samples in a neighborhood of the test sample. We propose three R2-T2 strategies with different optimization objectives and neighbor-search spaces. R2-T2 consistently and greatly improves state-of-the-art LMMs' performance on challenging benchmarks of diverse tasks, without training any base-model parameters.
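As a rough illustration of the re-routing idea described above, the following sketch moves a test sample's routing weights toward the average weights of its nearest correctly predicted reference samples. It is a simplified toy under stated assumptions, not the paper's method: the function name `reroute`, the Euclidean neighbor search, the plain averaging, and the `k` and `step` parameters are all hypothetical choices, and the abstract does not spell out the three R2-T2 strategies.

```python
from math import dist
from typing import List, Sequence, Tuple

# (embedding, routing weights, was the sample predicted correctly?)
Reference = Tuple[Sequence[float], Sequence[float], bool]


def reroute(
    test_emb: Sequence[float],
    weights: Sequence[float],
    references: Sequence[Reference],
    k: int = 2,
    step: float = 0.5,
) -> List[float]:
    """Move `weights` toward the mean routing weights of the k nearest
    correctly predicted reference samples (hypothetical toy update)."""
    correct = [(emb, w) for emb, w, ok in references if ok]
    if not correct:
        return list(weights)  # nothing to move toward
    # Nearest neighbors by Euclidean distance in embedding space (assumption).
    neighbors = sorted(correct, key=lambda r: dist(test_emb, r[0]))[:k]
    target = [
        sum(w[i] for _, w in neighbors) / len(neighbors)
        for i in range(len(weights))
    ]
    # Interpolate between the original weights and the neighborhood average.
    moved = [(1 - step) * w + step * t for w, t in zip(weights, target)]
    total = sum(moved)
    return [m / total for m in moved]  # renormalize so the weights sum to 1


references = [
    ([0.0, 0.0], [0.8, 0.2], True),
    ([0.1, 0.0], [0.6, 0.4], True),
    ([5.0, 5.0], [0.0, 1.0], True),   # correct but far away
    ([0.0, 0.1], [0.1, 0.9], False),  # nearby but mispredicted, ignored
]
new_weights = reroute([0.0, 0.05], [0.2, 0.8], references)
```

In this toy setup the base model's parameters are never touched; only the per-sample weight vector changes, which mirrors the abstract's claim that no base-model parameters are trained.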

large language model, machine learning

2502.20395

Country:

Europe (0.28)
North America > United States > Maryland (0.14)

Genre: Research Report > New Finding (0.93)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.96)

A Survey of Personalized Large Language Models: Progress and Future Directions

Liu, Jiahong, Qiu, Zexuan, Li, Zhongyang, Dai, Quanyu, Zhu, Jieming, Hu, Minda, Yang, Menglin, King, Irwin

arXiv.org Artificial Intelligence, Feb-17-2025

Large Language Models (LLMs) excel in handling general knowledge tasks, yet they struggle with user-specific personalization, such as understanding individual emotions, writing styles, and preferences. Personalized Large Language Models (PLLMs) tackle these challenges by leveraging individual user data, such as user profiles, historical dialogues, content, and interactions, to deliver responses that are contextually relevant and tailored to each user's specific needs. This is a highly valuable research topic, as PLLMs can significantly enhance user satisfaction and have broad applications in conversational agents, recommendation systems, emotion recognition, medical assistants, and more. This survey reviews recent advancements in PLLMs from three technical perspectives: prompting for personalized context (input level), finetuning for personalized adapters (model level), and alignment for personalized preferences (objective level). To provide deeper insights, we also discuss current limitations and outline several promising directions for future research. Updated information about this survey can be found at https://github.com/JiahongLiu21/Awesome-Personalized-Large-Language-Models.

large language model, machine learning, natural language

2502.11528

Country: Asia (0.28)

Genre: Overview (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Crafting Personalized Agents through Retrieval-Augmented Generation on Editable Memory Graphs

Wang, Zheng, Li, Zhongyang, Jiang, Zeren, Tu, Dandan, Shi, Wei

arXiv.org Artificial Intelligence, Sep-28-2024

In the age of mobile internet, user data, often referred to as memories, is continuously generated on personal devices. Effectively managing and utilizing this data to deliver services to users is a compelling research topic. In this paper, we introduce a novel task of crafting personalized agents powered by large language models (LLMs), which utilize a user's smartphone memories to enhance downstream applications with advanced LLM capabilities. To achieve this goal, we introduce EMG-RAG, a solution that combines Retrieval-Augmented Generation (RAG) techniques with an Editable Memory Graph (EMG). This approach is further optimized using Reinforcement Learning to address three distinct challenges: data collection, editability, and selectability. Extensive experiments on a real-world dataset validate the effectiveness of EMG-RAG, achieving an improvement of approximately 10% over the best existing approach. Additionally, the personalized agents have been transferred into a real smartphone AI assistant, which leads to enhanced usability.

large language model, machine learning, natural language

2409.19401

Country: Asia > China > Guangdong Province (0.29)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Industry:

Leisure & Entertainment (1.00)
Education (0.93)
Transportation (0.68)
Information Technology > Security & Privacy (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Concise and Precise Context Compression for Tool-Using Language Models

Xu, Yang, Feng, Yunlong, Mu, Honglin, Hou, Yutai, Li, Yitong, Wang, Xinghao, Zhong, Wanjun, Li, Zhongyang, Tu, Dandan, Zhu, Qingfu, Zhang, Min, Che, Wanxiang

arXiv.org Artificial Intelligence, Jul-2-2024

Through reading the documentation in the context, tool-using language models can dynamically extend their capability using external tools. The cost is that we have to input lengthy documentation every time the model needs to use the tool, occupying the input window as well as slowing down the decoding process. Given the progress in general-purpose compression, soft context compression is a suitable approach to alleviate the problem. However, when compressing tool documentation, existing methods suffer from the weaknesses of key information loss (specifically, tool/parameter name errors) and difficulty in adjusting the length of compressed sequences based on documentation lengths. To address these problems, we propose two strategies for compressing tool documentation into concise and precise summary sequences for tool-using language models. 1) Selective compression strategy mitigates key information loss by deliberately retaining key information as raw text tokens. 2) Block compression strategy involves dividing tool documentation into short chunks and then employing a fixed-length compression model to achieve variable-length compression. This strategy facilitates the flexible adjustment of the compression ratio. Results on API-Bank and APIBench show that our approach reaches a performance comparable to the upper-bound baseline under up to 16x compression ratio.
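The two strategies named in this abstract can be sketched in a few lines. This is an illustrative toy only: the learned soft compressor is replaced by a hypothetical `placeholder_compressor` that emits fixed-length dummy slots, and the names `compress_doc`, `key_tokens`, `block_size`, and `summary_len`, as well as the ordering of raw tokens before summary slots, are assumptions rather than details from the paper.

```python
from typing import Callable, List, Set


def split_into_blocks(tokens: List[str], block_size: int) -> List[List[str]]:
    """Divide the documentation tokens into consecutive short chunks."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [tokens[i:i + block_size] for i in range(0, len(tokens), block_size)]


def placeholder_compressor(block: List[str], summary_len: int) -> List[str]:
    """Stand-in for a learned fixed-length compressor: always emits
    exactly `summary_len` slots, whatever the block contains."""
    return [f"<sum{j}>" for j in range(summary_len)]


def compress_doc(
    tokens: List[str],
    key_tokens: Set[str],
    block_size: int,
    summary_len: int,
    compressor: Callable[[List[str], int], List[str]] = placeholder_compressor,
) -> List[str]:
    """Combine both ideas: key tokens stay as raw text (selective),
    and each block contributes a fixed-length summary (block)."""
    out: List[str] = []
    for block in split_into_blocks(tokens, block_size):
        out.extend(t for t in block if t in key_tokens)  # retained verbatim
        out.extend(compressor(block, summary_len))       # fixed-length summary
    return out


doc = "get_weather returns the forecast for a city ; parameter city is a string".split()
compressed = compress_doc(doc, {"get_weather", "city"}, block_size=4, summary_len=1)
```

Because every block yields the same number of summary slots, the total compressed length grows with the number of blocks, which is how a fixed-length compressor can produce variable-length output; changing `block_size` or `summary_len` adjusts the compression ratio.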

large language model, machine learning, natural language

2407.02043

Country: Asia > China (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.71)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Improving Cross-Task Generalization with Step-by-Step Instructions

Wu, Yang, Zhao, Yanyan, Li, Zhongyang, Qin, Bing, Xiong, Kai

arXiv.org Artificial Intelligence, May-7-2023

Instruction tuning has been shown to improve the cross-task generalization of language models. However, it is still challenging for language models to complete the target tasks by following the instructions, as the instructions are general and lack intermediate steps. To address this problem, we propose incorporating step-by-step instructions to help language models decompose the tasks, providing detailed and specific procedures for completing the target tasks. The step-by-step instructions are obtained automatically by prompting ChatGPT and are then combined with the original instructions to tune language models. Extensive experiments on SUP-NATINST show that high-quality step-by-step instructions can improve cross-task generalization across different model sizes. Moreover, further analysis indicates that the order of the steps in the step-by-step instructions is important for the improvement. To facilitate future research, we release the step-by-step instructions and their human quality evaluation results.

artificial intelligence, machine learning, natural language

2305.04429

Country: Europe (0.28)

Genre:

Workflow (1.00)
Instructional Material > Training Manual (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)

ReCo: Reliable Causal Chain Reasoning via Structural Causal Recurrent Neural Networks

Xiong, Kai, Ding, Xiao, Li, Zhongyang, Du, Li, Qin, Bing, Zheng, Yi, Huai, Baoxing

arXiv.org Artificial Intelligence, Dec-16-2022

Causal chain reasoning (CCR) is an essential ability for many decision-making AI systems, which requires the model to build reliable causal chains by connecting causal pairs. However, CCR suffers from two main transitive problems: threshold effect and scene drift. In other words, the causal pairs to be spliced may have a conflicting threshold boundary or scenario. To address these issues, we propose a novel Reliable Causal chain reasoning framework (ReCo), which introduces exogenous variables to represent the threshold and scene factors of each causal pair within the causal chain, and estimates the threshold and scene contradictions across exogenous variables via structural causal recurrent neural networks (SRNN). Experiments show that ReCo outperforms a series of strong baselines on both Chinese and English CCR datasets. Moreover, by injecting reliable causal chain knowledge distilled by ReCo, BERT can achieve better performances on four downstream causal-related tasks than BERT models enhanced by other kinds of knowledge.

causal chain, machine learning, natural language

2212.08322

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)

Exact Recovery of Community Detection in dependent Gaussian Mixture Models

Li, Zhongyang, Yang, Sichen

arXiv.org Machine Learning, Sep-23-2022

We study the community detection problem on a Gaussian mixture model, in which (1) vertices are divided into $k\geq 2$ distinct communities that are not necessarily equally-sized; (2) the Gaussian perturbations for different entries in the observation matrix are not necessarily independent or identically distributed. We prove necessary and sufficient conditions for the exact recovery of the maximum likelihood estimation (MLE), and discuss the cases when these necessary and sufficient conditions give a sharp threshold. Applications include community detection on a graph where the Gaussian perturbation of the observation on each edge is the sum of i.i.d. Gaussian random variables on its end vertices, in which case we explicitly obtain the threshold for the exact recovery of the MLE.

artificial intelligence, machine learning, vertex

2209.14859

Country: North America > United States > Connecticut (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

On the coercivity condition in the learning of interacting particle systems

Li, Zhongyang, Lu, Fei

arXiv.org Machine Learning, Nov-20-2020

In the learning of systems of interacting particles or agents, the coercivity condition ensures identifiability of the interaction functions, providing the foundation for learning by nonparametric regression. The coercivity condition is equivalent to the strict positive definiteness of an integral kernel arising in the learning. We show that for a class of interaction functions such that the system is ergodic, the integral kernel is strictly positive definite, and hence the coercivity condition holds.
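For readers unfamiliar with the term, one common way to state strict positive definiteness of an integral kernel is given below. The symbols $K$, $\mu$, and $f$ are generic placeholders chosen for illustration; the abstract does not specify the kernel, the measure, or the function space used in the paper.

```latex
\[
  \iint K(r, s)\, f(r)\, f(s)\, d\mu(r)\, d\mu(s) > 0
  \quad \text{for all } f \in L^2(\mu),\ f \neq 0 .
\]
```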

artificial intelligence, kernel, stationary density

2011.1048

Country: North America > United States > Connecticut > Tolland County > Storrs (0.14)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence (0.46)
