AITopics | Li, Zhixu

Collaborating Authors

Li, Zhixu

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Lean and Mean: Decoupled Value Policy Optimization with Global Value Guidance

Huang, Chenghua, Wang, Lu, Yang, Fangkai, Zhao, Pu, Li, Zhixu, Lin, Qingwei, Zhang, Dongmei, Rajmohan, Saravan, Zhang, Qi

arXiv.org Artificial IntelligenceFeb-24-2025

Proximal Policy Optimization (PPO)-based Reinforcement Learning from Human Feedback (RLHF) is essential for aligning large language models (LLMs) with human preferences. It requires joint training of an actor and critic with a pretrained, fixed reward model for guidance. This approach increases computational complexity and instability due to actor-critic interdependence. Additionally, PPO lacks access to true environment rewards in LLM tasks, limiting its adaptability. Under such conditions, pretraining a value model or a reward model becomes equivalent, as both provide fixed supervisory signals without new ground-truth feedback. To address these issues, we propose \textbf{Decoupled Value Policy Optimization (DVPO)}, a lean framework that replaces traditional reward modeling with a pretrained \emph{global value model (GVM)}. The GVM is conditioned on policy trajectories and predicts token-level return-to-go estimates. By decoupling value model from policy training (via frozen GVM-driven RL objectives), DVPO eliminates actor-critic interdependence, reducing GPU memory usage by 40\% and training time by 35\% compared to conventional RLHF. Experiments across benchmarks show DVPO outperforms efficient RLHF methods (e.g., DPO) while matching state-of-the-art PPO in performance.

arxiv preprint arxiv, large language model, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2502.16944

Country: Asia (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective

Zhu, Xiangru, Sun, Penglei, Song, Yaoxian, Xiao, Yanghua, Li, Zhixu, Wang, Chengyu, Huang, Jun, Yang, Bei, Xu, Xiaoxiao

arXiv.org Artificial IntelligenceOct-18-2024

Accurate interpretation and visualization of human instructions are crucial for text-to-image (T2I) synthesis. However, current models struggle to capture semantic variations from word order changes, and existing evaluations, relying on indirect metrics like text-image similarity, fail to reliably assess these challenges. This often obscures poor performance on complex or uncommon linguistic patterns by the focus on frequent word combinations. To address these deficiencies, we propose a novel metric called SemVarEffect and a benchmark named SemVarBench, designed to evaluate the causality between semantic variations in inputs and outputs in T2I synthesis. Semantic variations are achieved through two types of linguistic permutations, while avoiding easily predictable literal variations. Experiments reveal that the CogView-3-Plus and Ideogram 2 performed the best, achieving a score of 0.2/1. Semantic variations in object relations are less understood than attributes, scoring 0.07/1 compared to 0.17-0.19/1. We found that cross-modal alignment in UNet or Transformers plays a crucial role in handling semantic variations, a factor previously overlooked by a focus on textual encoders. Our work establishes an effective evaluation framework that advances the T2I synthesis community's exploration of human instruction understanding. Our benchmark and code are available at https://github.com/zhuxiangru/SemVarBench .

large language model, machine learning, semantic variation, (20 more...)

arXiv.org Artificial Intelligence

2410.10291

Country: Asia > China (0.27)

Genre: Research Report > New Finding (0.67)

Industry: Consumer Products & Services (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

HOTVCOM: Generating Buzzworthy Comments for Videos

Chen, Yuyan, Qian, Yiwen, Yan, Songzhou, Jia, Jiyuan, Li, Zhixu, Xiao, Yanghua, Li, Xiaobo, Yang, Ming, Guo, Qingpei

arXiv.org Artificial IntelligenceSep-23-2024

In the era of social media video platforms, popular ``hot-comments'' play a crucial role in attracting user impressions of short-form videos, making them vital for marketing and branding purpose. However, existing research predominantly focuses on generating descriptive comments or ``danmaku'' in English, offering immediate reactions to specific video moments. Addressing this gap, our study introduces \textsc{HotVCom}, the largest Chinese video hot-comment dataset, comprising 94k diverse videos and 137 million comments. We also present the \texttt{ComHeat} framework, which synergistically integrates visual, auditory, and textual data to generate influential hot-comments on the Chinese video dataset. Empirical evaluations highlight the effectiveness of our framework, demonstrating its excellence on both the newly constructed and existing datasets.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2409.15196

Country:

Asia > China (0.46)
Europe (0.46)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Sports (0.67)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(4 more...)

Add feedback

Can Pre-trained Language Models Understand Chinese Humor?

Chen, Yuyan, Li, Zhixu, Liang, Jiaqing, Xiao, Yanghua, Liu, Bang, Chen, Yunwen

arXiv.org Artificial IntelligenceJul-4-2024

Humor understanding is an important and challenging research in natural language processing. As the popularity of pre-trained language models (PLMs), some recent work makes preliminary attempts to adopt PLMs for humor recognition and generation. However, these simple attempts do not substantially answer the question: {\em whether PLMs are capable of humor understanding?} This paper is the first work that systematically investigates the humor understanding ability of PLMs. For this purpose, a comprehensive framework with three evaluation steps and four evaluation tasks is designed. We also construct a comprehensive Chinese humor dataset, which can fully meet all the data requirements of the proposed evaluation framework. Our empirical study on the Chinese humor dataset yields some valuable observations, which are of great guiding value for future optimization of PLMs in humor understanding and generation.

machine learning, natural language, plm, (16 more...)

arXiv.org Artificial Intelligence

2407.04105

Country:

Europe (0.93)
Asia (0.73)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Hallucination Detection: Robustly Discerning Reliable Answers in Large Language Models

Chen, Yuyan, Fu, Qiang, Yuan, Yichen, Wen, Zhihao, Fan, Ge, Liu, Dayiheng, Zhang, Dongmei, Li, Zhixu, Xiao, Yanghua

arXiv.org Artificial IntelligenceJul-4-2024

Large Language Models (LLMs) have gained widespread adoption in various natural language processing tasks, including question answering and dialogue systems. However, a major drawback of LLMs is the issue of hallucination, where they generate unfaithful or inconsistent content that deviates from the input source, leading to severe consequences. In this paper, we propose a robust discriminator named RelD to effectively detect hallucination in LLMs' generated answers. RelD is trained on the constructed RelQA, a bilingual question-answering dialogue dataset along with answers generated by LLMs and a comprehensive set of metrics. Our experimental results demonstrate that the proposed RelD successfully detects hallucination in the answers generated by diverse LLMs. Moreover, it performs well in distinguishing hallucination in LLMs' generated answers from both in-distribution and out-of-distribution datasets. Additionally, we also conduct a thorough analysis of the types of hallucinations that occur and present valuable insights. This research significantly contributes to the detection of reliable answers generated by LLMs and holds noteworthy implications for mitigating hallucination in the future work.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2407.04121

Country:

Europe (0.69)
Asia > China (0.69)
North America > United States (0.46)

Genre: Research Report > New Finding (0.66)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)

Add feedback

MAPO: Boosting Large Language Model Performance with Model-Adaptive Prompt Optimization

Chen, Yuyan, Wen, Zhihao, Fan, Ge, Chen, Zhengyu, Wu, Wei, Liu, Dayiheng, Li, Zhixu, Liu, Bang, Xiao, Yanghua

arXiv.org Artificial IntelligenceJul-4-2024

Prompt engineering, as an efficient and effective way to leverage Large Language Models (LLM), has drawn a lot of attention from the research community. The existing research primarily emphasizes the importance of adapting prompts to specific tasks, rather than specific LLMs. However, a good prompt is not solely defined by its wording, but also binds to the nature of the LLM in question. In this work, we first quantitatively demonstrate that different prompts should be adapted to different LLMs to enhance their capabilities across various downstream tasks in NLP. Then we novelly propose a model-adaptive prompt optimizer (MAPO) method that optimizes the original prompts for each specific LLM in downstream tasks. Extensive experiments indicate that the proposed method can effectively refine prompts for an LLM, leading to significant improvements over various downstream tasks.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2407.04118

Country:

Asia (0.67)
Europe > Poland (0.28)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.67)

Industry:

Leisure & Entertainment (1.00)
Media > Film (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Add feedback

ESC-Eval: Evaluating Emotion Support Conversations in Large Language Models

Zhao, Haiquan, Li, Lingyu, Chen, Shisong, Kong, Shuqi, Wang, Jiaan, Huang, Kexin, Gu, Tianle, Wang, Yixu, Liang, Dandan, Li, Zhixu, Teng, Yan, Xiao, Yanghua, Wang, Yingchun

arXiv.org Artificial IntelligenceJun-24-2024

Emotion Support Conversation (ESC) is a crucial application, which aims to reduce human stress, offer emotional guidance, and ultimately enhance human mental and physical well-being. With the advancement of Large Language Models (LLMs), many researchers have employed LLMs as the ESC models. However, the evaluation of these LLM-based ESCs remains uncertain. Inspired by the awesome development of role-playing agents, we propose an ESC Evaluation framework (ESC-Eval), which uses a role-playing agent to interact with ESC models, followed by a manual evaluation of the interactive dialogues. In detail, we first re-organize 2,801 role-playing cards from seven existing datasets to define the roles of the role-playing agent. Second, we train a specific role-playing model called ESC-Role which behaves more like a confused person than GPT-4. Third, through ESC-Role and organized role cards, we systematically conduct experiments using 14 LLMs as the ESC models, including general AI-assistant LLMs (ChatGPT) and ESC-oriented LLMs (ExTES-Llama). We conduct comprehensive human annotations on interactive multi-turn dialogues of different ESC models. The results show that ESC-oriented LLMs exhibit superior ESC abilities compared to general AI-assistant LLMs, but there is still a gap behind human performance. Moreover, to automate the scoring process for future ESC models, we developed ESC-RANK, which trained on the annotated data, achieving a scoring performance surpassing 35 points of GPT-4. Our data and code are available at https://github.com/haidequanbu/ESC-Eval.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2406.14952

Country:

Asia (0.14)
Europe > Spain (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Education (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.67)
Health & Medicine > Consumer Health (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.78)

Add feedback

SQLFixAgent: Towards Semantic-Accurate SQL Generation via Multi-Agent Collaboration

Cen, Jipeng, Liu, Jiaxin, Li, Zhixu, Wang, Jingjing

arXiv.org Artificial IntelligenceJun-19-2024

While fine-tuned large language models (LLMs) excel in generating grammatically valid SQL in Text-to-SQL parsing, they often struggle to ensure semantic accuracy in queries, leading to user confusion and diminished system usability. To tackle this challenge, we introduce SQLFixAgent, an innovative multi-agent collaborative framework designed for detecting and repairing erroneous SQL. Our framework comprises a core agent, SQLRefiner, alongside two auxiliary agents: SQLReviewer and QueryCrafter. The SQLReviewer agent employs the rubber duck debugging method to identify potential semantic mismatches between SQL statement and user query. If the error is detected, the QueryCrafter agent generates multiple SQL statements as candidate repairs using a fine-tuned SQLTool. Subsequently, leveraging similar repair retrieval and failure memory reflexion, the SQLRefiner agent selects the most fitting SQL statement from the candidates as the final repair. We evaluated our proposed framework on five Text-to-SQL benchmarks. The experimental results show that our method consistently enhances the performance of the baseline model, specifically achieving an execution accuracy improvement of over 3\% on the Bird benchmark. Our framework also has a higher token efficiency compared to other advanced methods, making it more competitive.

large language model, machine learning, sqlfixagent, (18 more...)

arXiv.org Artificial Intelligence

2406.13408

Country:

Europe > Italy (0.14)
Asia > China (0.14)

Genre: Research Report > New Finding (0.66)

Industry:

Education (0.68)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.32)

Add feedback

AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation

Huang, Wenhao, Peng, Chenghao, Li, Zhixu, Liang, Jiaqing, Xiao, Yanghua, Wen, Liqian, Chen, Zulong

arXiv.org Artificial IntelligenceApr-19-2024

Web automation is a significant technique that accomplishes complicated web tasks by automating common web actions, enhancing operational efficiency, and reducing the need for manual intervention. Traditional methods, such as wrappers, suffer from limited adaptability and scalability when faced with a new website. On the other hand, generative agents empowered by large language models (LLMs) exhibit poor performance and reusability in open-world scenarios. In this work, we introduce a crawler generation task for vertical information web pages and the paradigm of combining LLMs with crawlers, which helps crawlers handle diverse and changing web environments more efficiently. We propose AutoCrawler, a two-stage framework that leverages the hierarchical structure of HTML for progressive understanding. Through top-down and step-back operations, AutoCrawler can learn from erroneous actions and continuously prune HTML for better action generation. We conduct comprehensive experiments with multiple LLMs and demonstrate the effectiveness of our framework. Resources of this paper can be found at \url{https://github.com/EZ-hwh/AutoCrawler}

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2404.12753

Country: North America > United States (0.14)

Genre: Research Report (1.00)

Industry:

Media (0.68)
Leisure & Entertainment > Sports > Basketball (0.46)
Information Technology > Services (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Communications > Web (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Is There a One-Model-Fits-All Approach to Information Extraction? Revisiting Task Definition Biases

Huang, Wenhao, He, Qianyu, Li, Zhixu, Liang, Jiaqing, Xiao, Yanghua

arXiv.org Artificial IntelligenceMar-24-2024

Definition bias is a negative phenomenon that can mislead models. Definition bias in information extraction appears not only across datasets from different domains but also within datasets sharing the same domain. We identify two types of definition bias in IE: bias among information extraction datasets and bias between information extraction datasets and instruction tuning datasets. To systematically investigate definition bias, we conduct three probing experiments to quantitatively analyze it and discover the limitations of unified information extraction and large language models in solving definition bias. To mitigate definition bias in information extraction, we propose a multi-stage framework consisting of definition bias measurement, bias-aware fine-tuning, and task-specific bias mitigation. Experimental results demonstrate the effectiveness of our framework in addressing definition bias. Resources of this paper can be found at https://github.com/EZ-hwh/definition-bias

definition bias, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2403.16396

Country:

Asia > China (0.28)
North America > United States > Alaska (0.14)

Genre: Research Report > New Finding (0.48)

Industry:

Materials > Chemicals (0.46)
Energy > Oil & Gas (0.46)

Technology:

Information Technology > Data Science > Data Mining > Text Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback