AITopics | Shen, Weizhou

Collaborating Authors

Shen, Weizhou

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration

Wang, Junyang, Xu, Haiyang, Jia, Haitao, Zhang, Xi, Yan, Ming, Shen, Weizhou, Zhang, Ji, Huang, Fei, Sang, Jitao

arXiv.org Artificial IntelligenceJun-3-2024

Mobile device operation tasks are increasingly becoming a popular multi-modal AI application scenario. Current Multi-modal Large Language Models (MLLMs), constrained by their training data, lack the capability to function effectively as operation assistants. Instead, MLLM-based agents, which enhance capabilities through tool invocation, are gradually being applied to this scenario. However, the two major navigation challenges in mobile device operation tasks, task progress navigation and focus content navigation, are significantly complicated under the single-agent architecture of existing work. This is due to the overly long token sequences and the interleaved text-image data format, which limit performance. To address these navigation challenges effectively, we propose Mobile-Agent-v2, a multi-agent architecture for mobile device operation assistance. The architecture comprises three agents: planning agent, decision agent, and reflection agent. The planning agent generates task progress, making the navigation of history operations more efficient. To retain focus content, we design a memory unit that updates with task progress. Additionally, to correct erroneous operations, the reflection agent observes the outcomes of each operation and handles any mistakes accordingly. Experimental results indicate that Mobile-Agent-v2 achieves over a 30% improvement in task completion compared to the single-agent architecture of Mobile-Agent. The code is open-sourced at https://github.com/X-PLUG/MobileAgent.

artificial intelligence, machine learning, opération, (18 more...)

arXiv.org Artificial Intelligence

2406.01014

Genre: Research Report (0.64)

Industry:

Information Technology (0.46)
Media (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

RoleInteract: Evaluating the Social Interaction of Role-Playing Agents

Chen, Hongzhan, Chen, Hehong, Yan, Ming, Xu, Wenshen, Gao, Xing, Shen, Weizhou, Quan, Xiaojun, Li, Chenliang, Zhang, Ji, Huang, Fei, Zhou, Jingren

arXiv.org Artificial IntelligenceMar-21-2024

Large language models (LLMs) have advanced the development of various AI conversational agents, including role-playing conversational agents that mimic diverse characters and human behaviors. While prior research has predominantly focused on enhancing the conversational capability, role-specific knowledge, and stylistic attributes of these agents, there has been a noticeable gap in assessing their social intelligence. In this paper, we introduce RoleInteract, the first benchmark designed to systematically evaluate the sociality of role-playing conversational agents at both individual and group levels of social interactions. The benchmark is constructed from a variety of sources and covers a wide range of 500 characters and over 6,000 question prompts and 30,800 multi-turn role-playing utterances. We conduct comprehensive evaluations on this benchmark using mainstream open-source and closed-source LLMs. We find that agents excelling in individual level does not imply their proficiency in group level. Moreover, the behavior of individuals may drift as a result of the influence exerted by other agents within the group. Experimental results on RoleInteract confirm its significance as a testbed for assessing the social interaction of role-playing conversational agents. The benchmark is publicly accessible at https://github.com/X-PLUG/RoleInteract.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2403.13679

Country:

Europe > France (0.14)
Asia > Japan (0.14)

Genre: Research Report (0.40)

Industry:

Health & Medicine (0.67)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

Small LLMs Are Weak Tool Learners: A Multi-LLM Agent

Shen, Weizhou, Li, Chenliang, Chen, Hongzhan, Yan, Ming, Quan, Xiaojun, Chen, Hehong, Zhang, Ji, Huang, Fei

arXiv.org Artificial IntelligenceJan-31-2024

Large Language Model (LLM) agents significantly extend the capabilities of standalone LLMs, empowering them to interact with external tools (e.g., APIs, functions) and complete complex tasks in a self-directed fashion. The challenge of tool use demands that LLMs not only understand user queries and generate answers but also excel in task planning, memory management, tool invocation, and result summarization. While traditional approaches focus on training a single LLM with all these capabilities, performance limitations become apparent, particularly with smaller models. Moreover, the entire LLM may require retraining when tools are updated. To overcome these challenges, we propose a novel strategy that decomposes the aforementioned capabilities into a planner, caller, and summarizer. Each component is implemented by a single LLM that focuses on a specific capability and collaborates with other components to accomplish the task. This modular framework facilitates individual updates and the potential use of smaller LLMs for building each capability. To effectively train this framework, we introduce a two-stage training paradigm. First, we fine-tune a backbone LLM on the entire dataset without discriminating sub-tasks, providing the model with a comprehensive understanding of the task. Second, the fine-tuned LLM is used to instantiate the planner, caller, and summarizer respectively, which are continually fine-tuned on respective sub-tasks. Evaluation across various tool-use benchmarks illustrates that our proposed multi-LLM framework surpasses the traditional single-LLM approach, highlighting its efficacy and advantages in tool learning.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2401.07324

Country: North America > United States > New York (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.73)

Add feedback

Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception

Wang, Junyang, Xu, Haiyang, Ye, Jiabo, Yan, Ming, Shen, Weizhou, Zhang, Ji, Huang, Fei, Sang, Jitao

arXiv.org Artificial IntelligenceJan-29-2024

Mobile device agent based on Multimodal Large Language Models (MLLM) is becoming a popular application. In this paper, we introduce Mobile-Agent, an autonomous multi-modal mobile device agent. Mobile-Agent first leverages visual perception tools to accurately identify and locate both the visual and textual elements within the app's front-end interface. Based on the perceived vision context, it then autonomously plans and decomposes the complex operation task, and navigates the mobile Apps through operations step by step. Different from previous solutions that rely on XML files of Apps or mobile system metadata, Mobile-Agent allows for greater adaptability across diverse mobile operating environments in a vision-centric way, thereby eliminating the necessity for system-specific customizations. To assess the performance of Mobile-Agent, we introduced Mobile-Eval, a benchmark for evaluating mobile device operations. Based on Mobile-Eval, we conducted a comprehensive evaluation of Mobile-Agent. The experimental results indicate that Mobile-Agent achieved remarkable accuracy and completion rates. Even with challenging instructions, such as multi-app operations, Mobile-Agent can still complete the requirements. Code and model will be open-sourced at https://github.com/X-PLUG/MobileAgent.

artificial intelligence, mobile-agent, opération, (11 more...)

arXiv.org Artificial Intelligence

2401.16158

Country: Asia > China (0.14)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment (0.47)

Technology:

Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Add feedback

Retrieval-Generation Alignment for End-to-End Task-Oriented Dialogue System

Shen, Weizhou, Gao, Yingqi, Huang, Canbin, Wan, Fanqi, Quan, Xiaojun, Bi, Wei

arXiv.org Artificial IntelligenceOct-20-2023

Developing an efficient retriever to retrieve knowledge from a large-scale knowledge base (KB) is critical for task-oriented dialogue systems to effectively handle localized and specialized tasks. However, widely used generative models such as T5 and ChatGPT often struggle to differentiate subtle differences among the retrieved KB records when generating responses, resulting in suboptimal quality of generated responses. In this paper, we propose the application of maximal marginal likelihood to train a perceptive retriever by utilizing signals from response generation for supervision. In addition, our approach goes beyond considering solely retrieved entities and incorporates various meta knowledge to guide the generator, thus improving the utilization of knowledge. We evaluate our approach on three task-oriented dialogue datasets using T5 and ChatGPT as the backbone models. The results demonstrate that when combined with meta knowledge, the response generator can effectively leverage high-quality knowledge records from the retriever and enhance the quality of generated responses. The codes and models of this paper are available at https://github.com/shenwzh3/MK-TOD.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2310.08877

Country:

Europe (0.93)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models

Li, Chenliang, Chen, Hehong, Yan, Ming, Shen, Weizhou, Xu, Haiyang, Wu, Zhikai, Zhang, Zhicheng, Zhou, Wenmeng, Chen, Yingda, Cheng, Chen, Shi, Hongzhu, Zhang, Ji, Huang, Fei, Zhou, Jingren

arXiv.org Artificial IntelligenceSep-2-2023

Large language models (LLMs) have recently demonstrated remarkable capabilities to comprehend human intentions, engage in reasoning, and design planning-like behavior. To further unleash the power of LLMs to accomplish complex tasks, there is a growing trend to build agent framework that equips LLMs, such as ChatGPT, with tool-use abilities to connect with massive external APIs. In this work, we introduce ModelScope-Agent, a general and customizable agent framework for real-world applications, based on open-source LLMs as controllers. It provides a user-friendly system library, with customizable engine design to support model training on multiple open-source LLMs, while also enabling seamless integration with both model APIs and common APIs in a unified way. To equip the LLMs with tool-use abilities, a comprehensive framework has been proposed spanning over tool-use data collection, tool retrieval, tool registration, memory control, customized model training, and evaluation for practical real-world applications. Finally, we showcase ModelScopeGPT, a real-world intelligent assistant of ModelScope Community based on the ModelScope-Agent framework, which is able to connect open-source LLMs with more than 1000 public AI models and localized community knowledge in ModelScope. The ModelScope-Agent library\footnote{https://github.com/modelscope/modelscope-agent} and online demo\footnote{https://modelscope.cn/studios/damo/ModelScopeGPT/summary} are now publicly available.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2309.00986

Country: Asia (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.53)

Add feedback

Joint Generator-Ranker Learning for Natural Language Generation

Shen, Weizhou, Gong, Yeyun, Shen, Yelong, Wang, Song, Quan, Xiaojun, Duan, Nan, Chen, Weizhu

arXiv.org Artificial IntelligenceMay-28-2023

Generate-then-rank is a widely used mechanism for text generation, where a generator produces multiple text candidates and a ranker chooses the best one among the text candidates. However, existing methods usually train the generator and the ranker individually, neglecting the mutual feedback that could further enhance the generation quality. To tackle this limitation, we propose JGR, a novel joint training algorithm that integrates the generator and the ranker in a single framework. JGR optimizes the generator with a hybrid objective that combines data likelihood and ranker reward, and trains the ranker with a contrastive loss that compares the generator outputs. By iteratively updating the generator and the ranker, JGR can effectively harmonize their learning and enhance their quality jointly. We evaluate JGR on various text generation tasks and demonstrate that it surpasses existing methods on four public datasets across three common generation scenarios. Our code and models are publicly available at https://github.com/microsoft/ProphetNet/tree/master/JGR.

artificial intelligence, generator, natural language, (17 more...)

arXiv.org Artificial Intelligence

2206.13974

Country: North America > Canada (0.46)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Generation (0.87)

Add feedback

Multi-Grained Knowledge Retrieval for End-to-End Task-Oriented Dialog

Wan, Fanqi, Shen, Weizhou, Yang, Ke, Quan, Xiaojun, Bi, Wei

arXiv.org Artificial IntelligenceMay-17-2023

Retrieving proper domain knowledge from an external database lies at the heart of end-to-end task-oriented dialog systems to generate informative responses. Most existing systems blend knowledge retrieval with response generation and optimize them with direct supervision from reference responses, leading to suboptimal retrieval performance when the knowledge base becomes large-scale. To address this, we propose to decouple knowledge retrieval from response generation and introduce a multi-grained knowledge retriever (MAKER) that includes an entity selector to search for relevant entities and an attribute selector to filter out irrelevant attributes. To train the retriever, we propose a novel distillation objective that derives supervision signals from the response generator. Experiments conducted on three standard benchmarks with both small and large-scale knowledge bases demonstrate that our retriever performs knowledge retrieval more effectively than existing methods. Our code has been made publicly available.\footnote{https://github.com/18907305772/MAKER}

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2305.10149

Country:

Europe (1.00)
North America > United States > Minnesota (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.61)

Add feedback

Generic Dependency Modeling for Multi-Party Conversation

Shen, Weizhou, Quan, Xiaojun, Yang, Ke

arXiv.org Artificial IntelligenceFeb-21-2023

To model the dependencies between utterances in multi-party conversations, we propose a simple and generic framework based on the dependency parsing results of utterances. Particularly, we present an approach to encoding the dependencies in the form of relative dependency encoding (ReDE) and illustrate how to implement it in Transformers by modifying the computation of self-attention. Experimental results on four multi-party conversation benchmarks show that this framework successfully boosts the general performance of two Transformer-based language models and leads to comparable or even superior performance compared to the state-of-the-art methods. The codes are available at https://github.com/shenwzh3/ReDE.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2302.1068

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.35)

Add feedback

DialogXL: All-in-One XLNet for Multi-Party Conversation Emotion Recognition

Shen, Weizhou, Chen, Junqing, Quan, Xiaojun, Xie, Zhixian

arXiv.org Artificial IntelligenceDec-15-2020

This paper presents our pioneering effort for emotion recognition in conversation (ERC) with pre-trained language models. Unlike regular documents, conversational utterances appear alternately from different parties and are usually organized as hierarchical structures in previous work. Such structures are not conducive to the application of pre-trained language models such as XLNet. To address this issue, we propose an all-in-one XLNet model, namely DialogXL, with enhanced memory to store longer historical context and dialog-aware self-attention to deal with the multi-party structures. Specifically, we first modify the recurrence mechanism of XLNet from segment-level to utterance-level in order to better model the conversational data. Second, we introduce dialog-aware self-attention in replacement of the vanilla self-attention in XLNet to capture useful intra- and inter-speaker dependencies. Extensive experiments are conducted on four ERC benchmarks with mainstream models presented for comparison. The experimental results show that the proposed model outperforms the baselines on all the datasets. Several other experiments such as ablation study and error analysis are also conducted and the results confirm the role of the critical modules of DialogXL.

artificial intelligence, neural network, utterance, (19 more...)

arXiv.org Artificial Intelligence

2012.08695

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Add feedback