AITopics | Liang, Tao

Collaborating Authors

Liang, Tao

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Route Sparse Autoencoder to Interpret Large Language Models

Shi, Wei, Li, Sihang, Liang, Tao, Wan, Mingyang, Ma, Gojun, Wang, Xiang, He, Xiangnan

arXiv.org Artificial IntelligenceMar-11-2025

Mechanistic interpretability of large language models (LLMs) aims to uncover the internal processes of information propagation and reasoning. Sparse autoencoders (SAEs) have demonstrated promise in this domain by extracting interpretable and monosemantic features. However, prior works primarily focus on feature extraction from a single layer, failing to effectively capture activations that span multiple layers. In this paper, we introduce Route Sparse Autoencoder (RouteSAE), a new framework that integrates a routing mechanism with a shared SAE to efficiently extract features from multiple layers. It dynamically assigns weights to activations from different layers, incurring minimal parameter overhead while achieving high interpretability and flexibility for targeted feature manipulation. We evaluate RouteSAE through extensive experiments on Llama-3.2-1B-Instruct. Specifically, under the same sparsity constraint of 64, RouteSAE extracts 22.5% more features than baseline SAEs while achieving a 22.3% higher interpretability score. These results underscore the potential of RouteSAE as a scalable and effective method for LLM interpretability, with applications in feature discovery and model intervention. Our codes are available at https://github.com/swei2001/RouteSAEs.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2503.082

Country: North America > United States > California (0.14)

Genre: Research Report > New Finding (0.66)

Industry: Government > Foreign Policy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

PFDial: A Structured Dialogue Instruction Fine-tuning Method Based on UML Flowcharts

Zhang, Ming, Wang, Yuhui, Shen, Yujiong, Yang, Tingyi, Jiang, Changhao, Wu, Yilong, Dou, Shihan, Chen, Qinhao, Xi, Zhiheng, Zhang, Zhihao, Dong, Yi, Wang, Zhen, Fei, Zhihui, Wan, Mingyang, Liang, Tao, Ma, Guojun, Zhang, Qi, Gui, Tao, Huang, Xuanjing

arXiv.org Artificial IntelligenceMar-9-2025

Process-driven dialogue systems, which operate under strict predefined process constraints, are essential in customer service and equipment maintenance scenarios. Although Large Language Models (LLMs) have shown remarkable progress in dialogue and reasoning, they still struggle to solve these strictly constrained dialogue tasks. To address this challenge, we construct Process Flow Dialogue (PFDial) dataset, which contains 12,705 high-quality Chinese dialogue instructions derived from 440 flowcharts containing 5,055 process nodes. Based on PlantUML specification, each UML flowchart is converted into atomic dialogue units i.e., structured five-tuples. Experimental results demonstrate that a 7B model trained with merely 800 samples, and a 0.5B model trained on total data both can surpass 90% accuracy. Additionally, the 8B model can surpass GPT-4o up to 43.88% with an average of 11.00%. We further evaluate models' performance on challenging backward transitions in process flows and conduct an in-depth analysis of various dataset formats to reveal their impact on model performance in handling decision and sequential branches. The data is released in https://github.com/KongLongGeFDU/PFDial.

accuracy, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2503.06706

Country:

North America > United States (0.28)
Europe > Austria > Vienna (0.14)

Genre:

Workflow (1.00)
Research Report > New Finding (0.48)

Industry:

Education (1.00)
Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Predicting Large Language Model Capabilities on Closed-Book QA Tasks Using Only Information Available Prior to Training

Jiang, Changhao, Zhang, Ming, Ye, Junjie, Fan, Xiaoran, Cao, Yifei, Sun, Jiajun, Xi, Zhiheng, Dou, Shihan, Dong, Yi, Shen, Yujiong, Tong, Jingqi, Wang, Zhen, Liang, Tao, Fei, Zhihui, Wan, Mingyang, Ma, Guojun, Zhang, Qi, Gui, Tao, Huang, Xuanjing

arXiv.org Artificial IntelligenceFeb-6-2025

The GPT-4 technical report from OpenAI suggests that model performance on specific tasks can be predicted prior to training, though methodologies remain unspecified. This approach is crucial for optimizing resource allocation and ensuring data alignment with target tasks. To achieve this vision, we focus on predicting performance on Closed-book Question Answering (CBQA) tasks, which are closely tied to pre-training data and knowledge retention. We address three major challenges: 1) mastering the entire pre-training process, especially data construction; 2) evaluating a model's knowledge retention; and 3) predicting task-specific knowledge retention using only information available prior to training. To tackle these challenges, we pre-train three large language models (i.e., 1.6B, 7B, and 13B) using 560k dollars and 520k GPU hours. We analyze the pre-training data with knowledge triples and assess knowledge retention using established methods. Additionally, we introduce the SMI metric, an information-theoretic measure that quantifies the relationship between pre-training data, model size, and task-specific knowledge retention. Our experiments reveal a strong linear correlation ($\text{R}^2 > 0.84$) between the SMI metric and the model's accuracy on CBQA tasks across models of varying sizes (i.e., 1.1B, 1.6B, 7B, and 13B). The dataset, model, and code are available at https://github.com/yuhui1038/SMI.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2502.04066

Country:

Asia > Middle East > UAE (0.14)
North America > United States > California (0.14)
Europe > Austria > Vienna (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Neuron-Level Sequential Editing for Large Language Models

Jiang, Houcheng, Fang, Junfeng, Zhang, Tianyu, Zhang, An, Wang, Ruipeng, Liang, Tao, Wang, Xiang

arXiv.org Artificial IntelligenceOct-5-2024

This work explores sequential model editing in large language models (LLMs), a critical task that involves modifying internal knowledge within LLMs continuously through multi-round editing, each incorporating updates or corrections to adjust the model outputs without the need for costly retraining. Existing model editing methods, especially those that alter model parameters, typically focus on single-round editing and often face significant challenges in sequential model editing-most notably issues of model forgetting and failure. To address these challenges, we introduce a new model editing method, namely \textbf{N}euron-level \textbf{S}equential \textbf{E}diting (NSE), tailored for supporting sequential model editing. Specifically, we optimize the target layer's hidden states using the model's original weights to prevent model failure. Furthermore, we iteratively select neurons in multiple layers for editing based on their activation values to mitigate model forgetting. Our empirical experiments demonstrate that NSE significantly outperforms current modifying parameters model editing methods, marking a substantial advancement in the field of sequential model editing. Our code is released on \url{https://github.com/jianghoucheng/NSE}.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2410.04045

Country:

Europe > Romania (0.17)
Europe > Germany (0.16)
Europe > United Kingdom > England (0.14)

Genre: Research Report > New Finding (0.67)

Industry:

Leisure & Entertainment (0.68)
Health & Medicine (0.48)
Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

LI-Net: Large-Pose Identity-Preserving Face Reenactment Network

Liu, Jin, Chen, Peng, Liang, Tao, Li, Zhaoxing, Yu, Cai, Zou, Shuqiao, Dai, Jiao, Han, Jizhong

arXiv.org Artificial IntelligenceApr-6-2021

Face reenactment is a challenging task, as it is difficult to maintain accurate expression, pose and identity simultaneously. Most existing methods directly apply driving facial landmarks to reenact source faces and ignore the intrinsic gap between two identities, resulting in the identity mismatch issue. Besides, they neglect the entanglement of expression and pose features when encoding driving faces, leading to inaccurate expressions and visual artifacts on large-pose reenacted faces. To address these problems, we propose a Large-pose Identity-preserving face reenactment network, LI-Net. Specifically, the Landmark Transformer is adopted to adjust driving landmark images, which aims to narrow the identity gap between driving and source landmark images. Then the Face Rotation Module and the Expression Enhancing Generator decouple the transformed landmark image into pose and expression features, and reenact those attributes separately to generate identity-preserving faces with accurate expressions and poses. Both qualitative and quantitative experimental results demonstrate the superiority of our method.

artificial intelligence, expression, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2104.0285

Country: Asia > China (0.15)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Face Recognition (0.69)

Add feedback

Weakly-supervised Domain Adaption for Aspect Extraction via Multi-level Interaction Transfer

Liang, Tao, Wang, Wenya, Lv, Fengmao

arXiv.org Artificial IntelligenceJun-16-2020

Fine-grained aspect extraction is an essential sub-task in aspect based opinion analysis. It aims to identify the aspect terms (a.k.a. opinion targets) of a product or service in each sentence. However, expensive annotation process is usually involved to acquire sufficient token-level labels for each domain. To address this limitation, some previous works propose domain adaptation strategies to transfer knowledge from a sufficiently labeled source domain to unlabeled target domains. But due to both the difficulty of fine-grained prediction problems and the large domain gap between domains, the performance remains unsatisfactory. This work conducts a pioneer study on leveraging sentence-level aspect category labels that can be usually available in commercial services like review sites to promote token-level transfer for the extraction purpose. Specifically, the aspect category information is used to construct pivot knowledge for transfer with assumption that the interactions between sentence-level aspect category and token-level aspect terms are invariant across domains. To this end, we propose a novel multi-level reconstruction mechanism that aligns both the fine-grained and coarse-grained information in multiple levels of abstractions. Comprehensive experiments demonstrate that our approach can fully utilize sentence-level aspect category labels to improve cross-domain aspect extraction with a large performance gain.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2006.09235

Country: Asia (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.69)

Add feedback