AITopics | Liu, Weiwen

Collaborating Authors

Liu, Weiwen

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Adaptive Tool Use in Large Language Models with Meta-Cognition Trigger

Li, Wenjun, Li, Dexun, Dong, Kuicai, Zhang, Cong, Zhang, Hao, Liu, Weiwen, Wang, Yasheng, Tang, Ruiming, Liu, Yong

arXiv.org Artificial IntelligenceFeb-18-2025

Large language models (LLMs) have shown remarkable emergent capabilities, transforming the execution of functional tasks by leveraging external tools for complex problems that require specialized processing or real-time data. While existing research expands LLMs access to diverse tools (e.g., program interpreters, search engines, weather/map apps), the necessity of using these tools is often overlooked, leading to indiscriminate tool invocation. This naive approach raises two key issues:(1) increased delays due to unnecessary tool calls, and (2) potential errors resulting from faulty interactions with external tools. In this paper, we introduce meta-cognition as a proxy for LLMs self-assessment of their capabilities, representing the model's awareness of its own limitations. Based on this, we propose MeCo, an adaptive decision-making strategy for external tool use. MeCo quantifies metacognitive scores by capturing high-level cognitive signals in the representation space, guiding when to invoke tools. Notably, MeCo is fine-tuning-free and incurs minimal cost. Our experiments show that MeCo accurately detects LLMs' internal cognitive signals and significantly improves tool-use decision-making across multiple base models and benchmarks.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2502.12961

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

iTool: Boosting Tool Use of Large Language Models via Iterative Reinforced Fine-Tuning

Zeng, Yirong, Ding, Xiao, Wang, Yuxian, Liu, Weiwen, Ning, Wu, Hou, Yutai, Huang, Xu, Qin, Bing, Liu, Ting

arXiv.org Artificial IntelligenceFeb-16-2025

Augmenting large language models (LLMs) with external tools is known as a promising approach to enhancing their capabilities, especially for complex tasks. Synthesizing tool-use data through real-world simulations is an effective way to achieve it. Nevertheless, our investigation reveals that (1) training gains significantly decay as synthetic data increases. The model struggles to benefit from more synthetic data due to potential data diversity issues, resulting in poor performance in complex scenarios. Moreover, we find that (2) this challenge primarily manifests as minor discrepancies between the model's output and the ground truth response (termed as deficiency), such as errors in parameter values that require complex reasoning from the context to resolve. To this end, we propose an iterative reinforced fine-tuning strategy designed to alleviate these challenges. This strategy involves: (1) enhancing the diversity of synthetic data through path exploration of Monte Carlo Tree Search. (2) iteratively identifying deficiency-related data, constructing fine-grained preference pairs to pinpoint deficiencies, and then applying preference optimization to optimize these deficiencies. Our experiments show that models trained using our method achieve about 3\% better performance than same-size models, outperforming larger open-source and closed-source models.

large language model, machine learning, preprint arxiv, (19 more...)

arXiv.org Artificial Intelligence

2501.09766

Country:

Asia (0.68)
North America > United States (0.46)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

ACEBench: Who Wins the Match Point in Tool Usage?

Chen, Chen, Hao, Xinlong, Liu, Weiwen, Huang, Xu, Zeng, Xingshan, Yu, Shuai, Li, Dexun, Wang, Shuai, Gan, Weinan, Huang, Yuefeng, Liu, Wulong, Wang, Xinzhi, Lian, Defu, Yin, Baoqun, Wang, Yasheng, Liu, Wu

arXiv.org Artificial IntelligenceFeb-13-2025

Large Language Models (LLMs) have demonstrated significant potential in decision-making and reasoning, particularly when integrated with various tools to effectively solve complex problems. However, existing benchmarks for evaluating LLMs' tool usage face several limitations: (1) limited evaluation scenarios, often lacking assessments in real multi-turn dialogue contexts; (2) narrow evaluation dimensions, with insufficient detailed assessments of how LLMs use tools; and (3) reliance on LLMs or real API executions for evaluation, which introduces significant overhead. To address these challenges, we introduce ACEBench, a comprehensive benchmark for assessing tool usage in LLMs. ACEBench categorizes data into three primary types based on evaluation methodology: Normal, Special, and Agent. "Normal" evaluates tool usage in basic scenarios; "Special" evaluates tool usage in situations with ambiguous or incomplete instructions; "Agent" evaluates tool usage through multi-agent interactions to simulate real-world, multi-turn dialogues. We conducted extensive experiments using ACEBench, analyzing various LLMs in-depth and providing a more granular examination of error causes across different data types.

large language model, match point, natural language, (3 more...)

arXiv.org Artificial Intelligence

2501.12851

Genre: Research Report (0.69)

Industry: Leisure & Entertainment > Sports > Tennis (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance

Lu, Yaxi, Yang, Shenzhi, Qian, Cheng, Chen, Guirong, Luo, Qinyu, Wu, Yesai, Wang, Huadong, Cong, Xin, Zhang, Zhong, Lin, Yankai, Liu, Weiwen, Wang, Yasheng, Liu, Zhiyuan, Liu, Fangming, Sun, Maosong

arXiv.org Artificial IntelligenceDec-2-2024

Agents powered by large language models have shown remarkable abilities in solving complex tasks. However, most agent systems remain reactive, limiting their effectiveness in scenarios requiring foresight and autonomous decision-making. In this paper, we tackle the challenge of developing proactive agents capable of anticipating and initiating tasks without explicit human instructions. We propose a novel data-driven approach for this problem. Firstly, we collect real-world human activities to generate proactive task predictions. These predictions are then labeled by human annotators as either accepted or rejected. The labeled data is used to train a reward model that simulates human judgment and serves as an automatic evaluator of the proactiveness of LLM agents. Building on this, we develop a comprehensive data generation pipeline to create a diverse dataset, ProactiveBench, containing 6,790 events. Finally, we demonstrate that fine-tuning models with the proposed ProactiveBench can significantly elicit the proactiveness of LLM agents. Experimental results show that our fine-tuned model achieves an F1-Score of 66.47% in proactively offering assistance, outperforming all open-source and close-source models. These results highlight the potential of our method in creating more proactive and effective agent systems, paving the way for future advancements in human-agent collaboration.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2410.12361

Country:

North America > United States > Hawaii (0.14)
North America > Mexico > Mexico City (0.14)

Genre: Research Report > New Finding (0.87)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)

Add feedback

GUI Agents with Foundation Models: A Comprehensive Survey

Wang, Shuai, Liu, Weiwen, Chen, Jingxuan, Gan, Weinan, Zeng, Xingshan, Yu, Shuai, Hao, Xinlong, Shao, Kun, Wang, Yasheng, Tang, Ruiming

arXiv.org Artificial IntelligenceNov-7-2024

Recent advances in foundation models, particularly Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs), facilitate intelligent agents being capable of performing complex tasks. By leveraging the ability of (M)LLMs to process and interpret Graphical User Interfaces (GUIs), these agents can autonomously execute user instructions by simulating human-like interactions such as clicking and typing. This survey consolidates recent research on (M)LLM-based GUI agents, highlighting key innovations in data, frameworks, and applications. We begin by discussing representative datasets and benchmarks. Next, we summarize a unified framework that captures the essential components used in prior research, accompanied by a taxonomy. Additionally, we explore commercial applications of (M)LLM-based GUI agents. Drawing from existing work, we identify several key challenges and propose future research directions. We hope this paper will inspire further developments in the field of (M)LLM-based GUI agents.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2411.0489

Country: Asia (0.14)

Genre:

Research Report (1.00)
Overview (1.00)

Industry: Information Technology > Software (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

ToolFlow: Boosting LLM Tool-Calling Through Natural and Coherent Dialogue Synthesis

Wang, Zezhong, Zeng, Xingshan, Liu, Weiwen, Li, Liangyou, Wang, Yasheng, Shang, Lifeng, Jiang, Xin, Liu, Qun, Wong, Kam-Fai

arXiv.org Artificial IntelligenceOct-24-2024

Supervised fine-tuning (SFT) is a common method to enhance the tool calling capabilities of Large Language Models (LLMs), with the training data often being synthesized. The current data synthesis process generally involves sampling a set of tools, formulating a requirement based on these tools, and generating the call statements. However, tools sampled randomly lack relevance, making them difficult to combine and thus reducing the diversity of the data. Additionally, current work overlooks the coherence between turns of dialogues, leading to a gap between the synthesized data and real-world scenarios. To address these issues, we propose a Graph-based Sampling strategy to sample more relevant tool combinations, and a Planned-generation strategy to create plans that guide the synthesis of coherent dialogues. We integrate these two strategies and enable multiple agents to synthesize the dialogue data interactively, resulting in our tool-calling data synthesis pipeline ToolFlow. Data quality assessments demonstrate improvements in the naturalness and coherence of our synthesized dialogues. Finally, we apply SFT on LLaMA-3.1-8B using 8,000 synthetic dialogues generated with ToolFlow. Results show that the model achieves tool-calling performance comparable to or even surpassing GPT-4, while maintaining strong general capabilities.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2410.18447

Country:

Asia (0.94)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Add feedback

SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation

Chen, Jingxuan, Yuen, Derek, Xie, Bin, Yang, Yuhao, Chen, Gongwei, Wu, Zhihao, Yixing, Li, Zhou, Xurui, Liu, Weiwen, Wang, Shuai, Zhou, Kaiwen, Shao, Rui, Nie, Liqiang, Wang, Yasheng, Hao, Jianye, Wang, Jun, Shao, Kun

arXiv.org Artificial IntelligenceOct-19-2024

Smartphone agents are increasingly important for helping users control devices efficiently, with (Multimodal) Large Language Model (MLLM)-based approaches emerging as key contenders. Fairly comparing these agents is essential but challenging, requiring a varied task scope, the integration of agents with different implementations, and a generalisable evaluation pipeline to assess their strengths and weaknesses. In this paper, we present SPA-Bench, a comprehensive SmartPhone Agent Benchmark designed to evaluate (M)LLM-based agents in an interactive environment that simulates real-world conditions. SPA-Bench offers three key contributions: (1) A diverse set of tasks covering system and third-party apps in both English and Chinese, focusing on features commonly used in daily routines; (2) A plug-and-play framework enabling real-time agent interaction with Android devices, integrating over ten agents with the flexibility to add more; (3) A novel evaluation pipeline that automatically assesses agent performance across multiple dimensions, encompassing seven metrics related to task completion and resource consumption. Our extensive experiments across tasks and agents reveal challenges like interpreting mobile user interfaces, action grounding, memory retention, and execution costs. We propose future research directions to ease these difficulties, moving closer to real-world smartphone agent applications.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2410.15164

Country: Asia > China (0.67)

Genre: Research Report > New Finding (0.92)

Industry:

Media (1.00)
Information Technology > Services (1.00)
Leisure & Entertainment > Sports (0.93)

Technology:

Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

ToolACE: Winning the Points of LLM Function Calling

Liu, Weiwen, Huang, Xu, Zeng, Xingshan, Hao, Xinlong, Yu, Shuai, Li, Dexun, Wang, Shuai, Gan, Weinan, Liu, Zhengying, Yu, Yuanqing, Wang, Zezhong, Wang, Yuxian, Ning, Wu, Hou, Yutai, Wang, Bin, Wu, Chuhan, Wang, Xinzhi, Liu, Yong, Wang, Yasheng, Tang, Duyu, Tu, Dandan, Shang, Lifeng, Jiang, Xin, Tang, Ruiming, Lian, Defu, Liu, Qun, Chen, Enhong

arXiv.org Artificial IntelligenceSep-1-2024

Function calling significantly extends the application boundary of large language models, where high-quality and diverse training data is critical for unlocking this capability. However, real function-calling data is quite challenging to collect and annotate, while synthetic data generated by existing pipelines tends to lack coverage and accuracy. In this paper, we present ToolACE, an automatic agentic pipeline designed to generate accurate, complex, and diverse tool-learning data. ToolACE leverages a novel self-evolution synthesis process to curate a comprehensive API pool of 26,507 diverse APIs. Dialogs are further generated through the interplay among multiple agents, guided by a formalized thinking process. To ensure data accuracy, we implement a dual-layer verification system combining rule-based and model-based checks. We demonstrate that models trained on our synthesized data, even with only 8B parameters, achieve state-of-the-art performance on the Berkeley Function-Calling Leaderboard, rivaling the latest GPT-4 models. Our model and a subset of the data are publicly available at https://huggingface.co/Team-ACE.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2409.0092

Country:

North America > United States > Minnesota (0.28)
Asia (0.28)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment > Sports (0.46)
Media > Music (0.46)
Information Technology > Security & Privacy (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Add feedback

WESE: Weak Exploration to Strong Exploitation for LLM Agents

Huang, Xu, Liu, Weiwen, Chen, Xiaolong, Wang, Xingmei, Lian, Defu, Wang, Yasheng, Tang, Ruiming, Chen, Enhong

arXiv.org Artificial IntelligenceApr-10-2024

Recently, large language models (LLMs) have demonstrated remarkable potential as an intelligent agent. However, existing researches mainly focus on enhancing the agent's reasoning or decision-making abilities through well-designed prompt engineering or task-specific fine-tuning, ignoring the procedure of exploration and exploitation. When addressing complex tasks within open-world interactive environments, these methods exhibit limitations. Firstly, the lack of global information of environments leads to greedy decisions, resulting in sub-optimal solutions. On the other hand, irrelevant information acquired from the environment not only adversely introduces noise, but also incurs additional cost. This paper proposes a novel approach, Weak Exploration to Strong Exploitation (WESE), to enhance LLM agents in solving open-world interactive tasks. Concretely, WESE involves decoupling the exploration and exploitation process, employing a cost-effective weak agent to perform exploration tasks for global knowledge. A knowledge graph-based strategy is then introduced to store the acquired knowledge and extract task-relevant knowledge, enhancing the stronger agent in success rate and efficiency for the exploitation task. Our approach is flexible enough to incorporate diverse tasks, and obtains significant improvements in both success rates and efficiency across four interactive benchmarks.

artificial intelligence, large language model, natural language, (13 more...)

arXiv.org Artificial Intelligence

2404.07456

Country:

North America > United States > Illinois (0.14)
North America > United States > California (0.14)

Genre: Research Report (1.00)

Industry: Energy > Oil & Gas > Upstream (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Understanding the planning of LLM agents: A survey

Huang, Xu, Liu, Weiwen, Chen, Xiaolong, Wang, Xingmei, Wang, Hao, Lian, Defu, Wang, Yasheng, Tang, Ruiming, Chen, Enhong

arXiv.org Artificial IntelligenceFeb-4-2024

As Large Language Models (LLMs) have shown significant intelligence, the progress to leverage LLMs as planning modules of autonomous agents has attracted more attention. This survey provides the first systematic view of LLM-based agents planning, covering recent works aiming to improve planning ability. We provide a taxonomy of existing works on LLM-Agent planning, which can be categorized into Task Decomposition, Plan Selection, External Module, Reflection and Memory. Comprehensive analyses are conducted for each direction, and further challenges for the field of research are discussed.

large language model, machine learning, preprint arxiv, (20 more...)

arXiv.org Artificial Intelligence

2402.02716

Country: Asia > China (0.28)

Genre:

Research Report (0.50)
Overview (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback