AITopics | He, Shizhu

Collaborating Authors

He, Shizhu

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

A Comprehensive Survey on Long Context Language Modeling

Liu, Jiaheng, Zhu, Dawei, Bai, Zhiqi, He, Yancheng, Liao, Huanxuan, Que, Haoran, Wang, Zekun, Zhang, Chenchen, Zhang, Ge, Zhang, Jiebin, Zhang, Yuanxing, Chen, Zhuo, Guo, Hangyu, Li, Shilong, Liu, Ziqiang, Shan, Yong, Song, Yifan, Tian, Jiayi, Wu, Wenhao, Zhou, Zhejian, Zhu, Ruijie, Feng, Junlan, Gao, Yang, He, Shizhu, Li, Zhoujun, Liu, Tianyu, Meng, Fanyu, Su, Wenbo, Tan, Yingshui, Wang, Zili, Yang, Jian, Ye, Wei, Zheng, Bo, Zhou, Wangchunshu, Huang, Wenhao, Li, Sujian, Zhang, Zhaoxiang

arXiv.org Artificial IntelligenceMar-20-2025

Efficient processing of long contexts has been a persistent pursuit in Natural Language Processing. With the growing number of long documents, dialogues, and other textual data, it is important to develop Long Context Language Models (LCLMs) that can process and analyze extensive inputs in an effective and efficient way. In this paper, we present a comprehensive survey on recent advances in long-context modeling for large language models. Our survey is structured around three key aspects: how to obtain effective and efficient LCLMs, how to train and deploy LCLMs efficiently, and how to evaluate and analyze LCLMs comprehensively. For the first aspect, we discuss data strategies, architectural designs, and workflow approaches oriented with long context processing. For the second aspect, we provide a detailed examination of the infrastructure required for LCLM training and inference. For the third aspect, we present evaluation paradigms for long-context comprehension and long-form generation, as well as behavioral analysis and mechanism interpretability of LCLMs. Beyond these three key aspects, we thoroughly explore the diverse application scenarios where existing LCLMs have been deployed and outline promising future development directions. This survey provides an up-to-date review of the literature on long-context LLMs, which we wish to serve as a valuable resource for both researchers and engineers. An associated GitHub repository collecting the latest papers and repos is available at: \href{https://github.com/LCLM-Horizon/A-Comprehensive-Survey-For-Long-Context-Language-Modeling}{\color[RGB]{175,36,67}{LCLM-Horizon}}.

information retrieval, large language model, machine learning, (26 more...)

arXiv.org Artificial Intelligence

2503.17407

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Florida > Miami-Dade County > Miami (0.14)
(2 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.45)
Research Report > Experimental Study (0.45)

Industry:

Health & Medicine (1.00)
Education (1.00)
Information Technology (0.92)
Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(4 more...)

Add feedback

GATE: Graph-based Adaptive Tool Evolution Across Diverse Tasks

Luo, Jianwen, Huang, Yiming, Meng, Jinxiang, Lei, Fangyu, He, Shizhu, Liu, Xiao, Jiang, Shanshan, Dong, Bin, Zhao, Jun, Liu, Kang

arXiv.org Artificial IntelligenceFeb-20-2025

Large Language Models (LLMs) have shown great promise in tool-making, yet existing frameworks often struggle to efficiently construct reliable toolsets and are limited to single-task settings. To address these challenges, we propose GATE (Graph-based Adaptive Tool Evolution), an adaptive framework that dynamically constructs and evolves a hierarchical graph of reusable tools across multiple scenarios. We evaluate GATE on open-ended tasks (Minecraft), agent-based tasks (TextCraft, DABench), and code generation tasks (MATH, Date, TabMWP). Our results show that GATE achieves up to 4.3x faster milestone completion in Minecraft compared to the previous SOTA, and provides an average improvement of 9.23% over existing tool-making methods in code generation tasks and 10.03% in agent tasks. GATE demonstrates the power of adaptive evolution, balancing tool quantity, complexity, and functionality while maintaining high efficiency. Code and data are available at \url{https://github.com/ayanami2003/GATE}.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2502.14848

Country:

Asia (0.28)
Europe > Sweden (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Materials > Metals & Mining (1.00)
Education (0.92)
Leisure & Entertainment > Games > Computer Games (0.69)
Leisure & Entertainment > Sports (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

DATA: Decomposed Attention-based Task Adaptation for Rehearsal-Free Continual Learning

Liao, Huanxuan, He, Shizhu, Hao, Yupu, Zhao, Jun, Liu, Kang

arXiv.org Artificial IntelligenceFeb-17-2025

Continual learning (CL) is essential for Large Language Models (LLMs) to adapt to evolving real-world demands, yet they are susceptible to catastrophic forgetting (CF). While traditional CF solutions rely on expensive data rehearsal, recent rehearsal-free methods employ model-based and regularization-based strategies to address this issue. However, these approaches often neglect the model's plasticity, which is crucial to achieving optimal performance on newly learned tasks. Consequently, a key challenge in CL is striking a balance between preserving plasticity and mitigating CF. To tackle this challenge, we propose the $\textbf{D}$ecomposed $\textbf{A}$ttention-based $\textbf{T}$ask $\textbf{A}$daptation (DATA), which explicitly decouples and learns both task-specific and task-shared knowledge using high-rank and low-rank task adapters (e.g., LoRAs). For new tasks, DATA dynamically adjusts the weights of adapters of different ranks based on their relevance and distinction from previous tasks, allowing the model to acquire new task-specific skills while effectively retaining previously learned knowledge. Specifically, we implement a decomposed component weighting strategy comprising learnable components that collectively generate attention-based weights, allowing the model to integrate and utilize diverse knowledge from each DATA. Extensive experiments on three widely used benchmarks demonstrate that our proposed method achieves state-of-the-art performance. Notably, our approach significantly enhances model plasticity and mitigates CF by extending learnable components and employing stochastic restoration during training iterations.

decomposed attention-based task adaptation, large language model, natural language, (2 more...)

arXiv.org Artificial Intelligence

2502.11482

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.53)

Add feedback

VaiBot: Shuttle Between the Instructions and Parameters of Large Language Models

Sun, Wangtao, Xu, Haotian, Liao, Huanxuan, Yu, Xuanqing, Jiang, Zhongtao, He, Shizhu, Zhao, Jun, Liu, Kang

arXiv.org Artificial IntelligenceFeb-12-2025

How to interact with LLMs through \emph{instructions} has been widely studied by researchers. However, previous studies have treated the emergence of instructions and the training of LLMs on task data as separate processes, overlooking the inherent unity between the two. This paper proposes a neural network framework, VaiBot, that integrates VAE and VIB, designed to uniformly model, learn, and infer both deduction and induction tasks under LLMs. Through experiments, we demonstrate that VaiBot performs on par with existing baseline methods in terms of deductive capabilities while significantly surpassing them in inductive capabilities. We also find that VaiBot can scale up using general instruction-following data and exhibits excellent one-shot induction abilities. We finally synergistically integrate the deductive and inductive processes of VaiBot. Through T-SNE dimensionality reduction, we observe that its inductive-deductive process significantly improves the distribution of training parameters, enabling it to outperform baseline methods in inductive reasoning tasks. The code and data for this paper can be found at https://anonymous.4open.science/r/VaiBot-021F.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2502.02315

Country:

North America (0.28)
Asia > China (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

$\textit{SKIntern}$: Internalizing Symbolic Knowledge for Distilling Better CoT Capabilities into Small Language Models

Liao, Huanxuan, He, Shizhu, Hao, Yupu, Li, Xiang, Zhang, Yuanzhe, Zhao, Jun, Liu, Kang

arXiv.org Artificial IntelligenceDec-14-2024

Small Language Models (SLMs) are attracting attention due to the high computational demands and privacy concerns of Large Language Models (LLMs). Some studies fine-tune SLMs using Chains of Thought (CoT) data distilled from LLMs, aiming to enhance their reasoning ability. Furthermore, Some CoT distillation methods introduce external symbolic knowledge into the generation process to improve the limited knowledge memory, reasoning ability and out-of-domain (OOD) generalization of SLMs. However, the introduction of symbolic knowledge increases computational overhead and introduces potential noise. In this paper, we introduce $\textit{SKIntern}$, an innovative approach that empowers SLMs to internalize symbolic knowledge and few-shot examples gradually through a progressive fine-tuning process, guided by a predefined linear decay schedule under curriculum learning. By efficiently internalizing knowledge, $\textit{SKIntern}$ reduces computational overhead and speeds up the reasoning process by focusing solely on the question during inference. It outperforms state-of-the-art baselines by over 5\%, while reducing inference costs (measured in FLOPs) by up to $4\times$ across a wide range of SLMs in both in-domain (ID) and out-of-domain (OOD) tasks. Our code will be available at \url{https://github.com/Xnhyacinth/SKIntern}.

knowledge management, large language model, machine learning, (24 more...)

arXiv.org Artificial Intelligence

2409.13183

Country:

Asia > China (0.28)
North America > United States (0.28)

Genre:

Research Report > Promising Solution (0.34)
Overview > Innovation (0.34)

Industry:

Education (0.68)
Information Technology > Security & Privacy (0.66)

Technology:

Information Technology > Knowledge Management > Knowledge Engineering (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Neural-Symbolic Collaborative Distillation: Advancing Small Language Models for Complex Reasoning Tasks

Liao, Huanxuan, He, Shizhu, Xu, Yao, Zhang, Yuanzhe, Liu, Kang, Zhao, Jun

arXiv.org Artificial IntelligenceDec-14-2024

In this paper, we propose $\textbf{Ne}$ural-$\textbf{Sy}$mbolic $\textbf{C}$ollaborative $\textbf{D}$istillation ($\textbf{NesyCD}$), a novel knowledge distillation method for learning the complex reasoning abilities of Large Language Models (LLMs, e.g., \textgreater 13B). We argue that complex reasoning tasks are difficult for Small Language Models (SLMs, e.g., $\leq$ 7B), as these tasks demand not only general cognitive abilities but also specialized knowledge, which is often sparse and difficult for these neural-based SLMs to effectively capture. Therefore, NesyCD distills the general capabilities and specialized knowledge in LLMs using different manners. On the one hand, we distill only general abilities from teacher LLMs into the student SLMs of parameterized neural networks. On the other hand, for the specialized abilities and uncommon knowledge of a complex reasoning task, we employ a symbolic knowledge distillation approach to obtain and store the specialized knowledge within a symbolic knowledge base (KB). By decoupling general and specialized capabilities, the proposed NesyCD can achieve superior performance cost-effectively, utilizing smaller models and blending parameterized neural networks with symbolic KB. Moreover, the specialized KB generalizes well and is comprehended and manipulated by humans. Our experiments show that NesyCD significantly boosts SLMs' complex reasoning performance on in-domain (BBH, GSM8K) and out-of-domain (AGIEval, ARC) datasets. Notably, our approach enabled the LLaMA3-8B and Qwen2-7B to surpass GPT-3.5-turbo in performance and come close to matching LLaMA3-70B, despite the latter having nine times more parameters. Our code will be available at https://github.com/Xnhyacinth/NesyCD.

knowledge management, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2409.13203

Country:

North America > United States (0.46)
Asia > China (0.28)

Genre: Research Report > New Finding (0.93)

Industry:

Education (1.00)
Health & Medicine > Therapeutic Area (0.54)

Technology:

Information Technology > Knowledge Management > Knowledge Engineering (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

LLaSA: Large Language and Structured Data Assistant

Xu, Yao, He, Shizhu, Xiangrong, Zeng, Chen, Jiabei, Liu, Guang, Wang, Bingning, Zhao, Jun, Liu, Kang

arXiv.org Artificial IntelligenceNov-16-2024

Structured data, such as tables, graphs, and databases, play a critical role in plentiful NLP tasks such as question answering and dialogue system. Recently, inspired by Vision-Language Models, Graph Neutral Networks (GNNs) have been introduced as an additional modality into the input of Large Language Models (LLMs) to improve their performance on Structured Knowledge Grounding (SKG) tasks. However, those GNN-enhanced LLMs have the following limitations: (1) They employ diverse GNNs to model varying types of structured data, rendering them unable to uniformly process various forms of structured data. (2) The pretraining of GNNs is coupled with specific LLMs, which prevents GNNs from fully aligning with the textual space and limits their adaptability to other LLMs. To address these issues, we propose \textbf{L}arge \textbf{L}anguage and \textbf{S}tructured Data \textbf{A}ssistant (LLaSA), a general framework for enhancing LLMs' ability to handle structured data. Specifically, we represent various types of structured data in a unified hypergraph format, and use self-supervised learning to pretrain a hypergraph encoder, and a G-Former compressing encoded hypergraph representations with cross-attention. The compressed hypergraph representations are appended to the serialized inputs during training and inference stages of LLMs. Experimental results on multiple SKG tasks show that our pretrained hypergraph encoder can adapt to various LLMs and enhance their ability to process different types of structured data. Besides, LLaSA, with LoRA fine-tuning, outperforms previous SOTA method using full parameters tuning.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2411.1446

Country:

North America > United States (0.46)
Asia (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

DA-Code: Agent Data Science Code Generation Benchmark for Large Language Models

Huang, Yiming, Luo, Jianwen, Yu, Yan, Zhang, Yitong, Lei, Fangyu, Wei, Yifan, He, Shizhu, Huang, Lifu, Liu, Xiao, Zhao, Jun, Liu, Kang

arXiv.org Artificial IntelligenceOct-10-2024

We introduce DA-Code, a code generation benchmark specifically designed to assess LLMs on agent-based data science tasks. This benchmark features three core elements: First, the tasks within DA-Code are inherently challenging, setting them apart from traditional code generation tasks and demanding advanced coding skills in grounding and planning. Second, examples in DA-Code are all based on real and diverse data, covering a wide range of complex data wrangling and analytics tasks. Third, to solve the tasks, the models must utilize complex data science programming languages, to perform intricate data processing and derive the answers. We set up the benchmark in a controllable and executable environment that aligns with real-world data analysis scenarios and is scalable. The annotators meticulously design the evaluation suite to ensure the accuracy and robustness of the evaluation. We develop the DA-Agent baseline. Experiments show that although the baseline performs better than other existing frameworks, using the current best LLMs achieves only 30.5% accuracy, leaving ample room for improvement. We release our benchmark at https://da-code-bench.github.io.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2410.07331

Country:

Asia (0.28)
North America > United States (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology (1.00)
Transportation > Ground > Road (0.67)
Transportation > Electric Vehicle (0.67)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

Beyond Instruction Following: Evaluating Rule Following of Large Language Models

Sun, Wangtao, Zhang, Chenxiang, Zhang, Xueyou, Huang, Ziyang, Xu, Haotian, Chen, Pei, He, Shizhu, Zhao, Jun, Liu, Kang

arXiv.org Artificial IntelligenceJul-11-2024

Although Large Language Models (LLMs) have demonstrated strong instruction-following ability to be helpful, they are further supposed to be controlled and guided by rules in real-world scenarios to be safe, and accurate in responses. This demands the possession of rule-following capability of LLMs. However, few works have made a clear evaluation of the rule-following capability of LLMs. Previous studies that try to evaluate the rule-following capability of LLMs fail to distinguish the rule-following scenarios from the instruction-following scenarios. Therefore, this paper first makes a clarification of the concept of rule-following, and curates a comprehensive benchmark, RuleBench, to evaluate a diversified range of rule-following abilities. Our experimental results on a variety of LLMs show that they are still limited in following rules. Our further analysis provides insights into the improvements for LLMs toward a better rule-following intelligent agent. The data and code can be found at: https://anonymous.4open.science/r/llm-rule-following-B3E3/

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2407.0844

Country: Asia > China (0.28)

Genre: Research Report (1.00)

Industry: Law (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Multilingual Knowledge Graph Completion from Pretrained Language Models with Knowledge Constraints

Song, Ran, He, Shizhu, Gao, Shengxiang, Cai, Li, Liu, Kang, Yu, Zhengtao, Zhao, Jun

arXiv.org Artificial IntelligenceJun-26-2024

Multilingual Knowledge Graph Completion (mKGC) aim at solving queries like (h, r, ?) in different languages by reasoning a tail entity t thus improving multilingual knowledge graphs. Previous studies leverage multilingual pretrained language models (PLMs) and the generative paradigm to achieve mKGC. Although multilingual pretrained language models contain extensive knowledge of different languages, its pretraining tasks cannot be directly aligned with the mKGC tasks. Moreover, the majority of KGs and PLMs currently available exhibit a pronounced English-centric bias. This makes it difficult for mKGC to achieve good results, particularly in the context of low-resource languages. To overcome previous problems, this paper introduces global and local knowledge constraints for mKGC. The former is used to constrain the reasoning of answer entities, while the latter is used to enhance the representation of query contexts. The proposed method makes the pretrained model better adapt to the mKGC task. Experimental results on public datasets demonstrate that our method outperforms the previous SOTA on Hits@1 and Hits@10 by an average of 12.32% and 16.03%, which indicates that our proposed method has significant enhancement on mKGC.

artificial intelligence, knowledge, natural language, (15 more...)

arXiv.org Artificial Intelligence

2406.18085

Country:

Europe (1.00)
South America > Brazil (0.68)
Asia > China (0.47)

Genre: Research Report > New Finding (0.93)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.95)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.34)

Add feedback