Wen, Ying
ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning
Wan, Ziyu, Li, Yunxiang, Song, Yan, Wang, Hanjing, Yang, Linyi, Schmidt, Mark, Wang, Jun, Zhang, Weinan, Hu, Shuyue, Wen, Ying
Recent research on the reasoning of large language models (LLMs) has sought to further enhance performance by integrating meta-thinking, enabling models to monitor, evaluate, and control their reasoning processes for more adaptive and effective problem-solving. However, current single-agent work lacks a specialized design for acquiring meta-thinking, resulting in low efficacy. To address this challenge, we introduce Reinforced Meta-thinking Agents (ReMA), a novel framework that leverages Multi-Agent Reinforcement Learning (MARL) to elicit meta-thinking behaviors, encouraging LLMs to think about thinking. ReMA decouples the reasoning process into two hierarchical agents: a high-level meta-thinking agent responsible for generating strategic oversight and plans, and a low-level reasoning agent responsible for detailed execution. Through iterative reinforcement learning with aligned objectives, these agents explore and learn to collaborate, leading to improved generalization and robustness. Experimental results demonstrate that ReMA outperforms single-agent RL baselines on complex reasoning tasks, including competition-level mathematical benchmarks and LLM-as-a-Judge benchmarks. Comprehensive ablation studies further illustrate the evolving dynamics of each agent, providing valuable insights into how the meta-thinking process enhances the reasoning capabilities of LLMs.
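The following is a minimal sketch of the hierarchical rollout this abstract describes, assuming the two agents are exposed as plain text-in/text-out callables; the function names and prompt formats are illustrative, not ReMA's actual API.

```python
# A minimal sketch of a ReMA-style two-level rollout: the meta-thinking agent
# plans, the reasoning agent executes, and a shared outcome reward credits both.
# All names (meta_policy, reasoning_policy, reward_fn) are assumptions.
from typing import Callable

Policy = Callable[[str], str]

def rema_rollout(question: str,
                 meta_policy: Policy,
                 reasoning_policy: Policy,
                 reward_fn: Callable[[str, str], float]):
    """One hierarchical rollout with an aligned reward for both agents."""
    # High-level agent: strategic oversight ("thinking about thinking").
    plan = meta_policy(f"Question: {question}\nGive a high-level solution plan.")
    # Low-level agent: detailed execution conditioned on the plan.
    answer = reasoning_policy(f"Question: {question}\nPlan: {plan}\nSolve step by step.")
    # Shared outcome reward, e.g. answer correctness on a math benchmark.
    reward = reward_fn(question, answer)
    return plan, answer, reward

if __name__ == "__main__":
    # Toy stand-ins for the two LLM agents and the verifier.
    meta = lambda p: "First simplify, then compute."
    solver = lambda p: "2 + 2 = 4. Final answer: 4"
    verify = lambda q, a: 1.0 if "4" in a else 0.0
    print(rema_rollout("What is 2 + 2?", meta, solver, verify))
```

In an actual MARL setup, the returned reward would feed policy-gradient updates for both agents in alternating or joint training iterations.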
Retrieval Dexterity: Efficient Object Retrieval in Clutters with Dexterous Hand
Bai, Fengshuo, Li, Yu, Chu, Jie, Chou, Tawei, Zhu, Runchuan, Wen, Ying, Yang, Yaodong, Chen, Yuanpei
Retrieving objects buried beneath multiple objects is not only challenging but also time-consuming. Performing manipulation in such environments presents significant difficulty due to complex contact relationships. Existing methods typically address this task by sequentially grasping and removing each occluding object, resulting in lengthy execution times and requiring impractical grasping capabilities for every occluding object. In this paper, we present a dexterous arm-hand system for efficient object retrieval in multi-object stacked environments. Our approach leverages large-scale parallel reinforcement learning within diverse and carefully designed cluttered environments to train policies. These policies demonstrate emergent manipulation skills (e.g., pushing, stirring, and poking) that efficiently clear occluding objects to expose sufficient surface area of the target object. We conduct extensive evaluations across a set of over 10 household objects in diverse clutter configurations, demonstrating superior retrieval performance and efficiency for both trained and unseen objects. Furthermore, we successfully transfer the learned policies to a real-world dexterous multi-fingered robot system, validating their practical applicability in real-world scenarios. Videos can be found on our project website https://ChangWinde.github.io/RetrDex.
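As a rough illustration of the large-scale parallel training pattern the abstract mentions, the sketch below steps thousands of toy environments in lockstep; the dynamics, reward, and dimensions are invented stand-ins, not the authors' simulator or reward design.

```python
# A minimal sketch of parallel rollout collection, the pattern behind
# GPU-parallel RL training for clutter retrieval. Everything here is a toy
# stand-in: random policy, random dynamics, and a placeholder exposure reward.
import numpy as np

NUM_ENVS, OBS_DIM, ACT_DIM, HORIZON = 4096, 64, 22, 128  # e.g. a 22-DoF hand

def policy(obs: np.ndarray) -> np.ndarray:
    """Placeholder policy: random actions; a trained network would go here."""
    return np.random.uniform(-1.0, 1.0, size=(obs.shape[0], ACT_DIM))

def step(obs: np.ndarray, act: np.ndarray):
    """Toy batched dynamics plus a stand-in reward, e.g. exposed target surface."""
    next_obs = obs + 0.01 * np.random.randn(*obs.shape)
    reward = np.random.rand(obs.shape[0])
    return next_obs, reward

obs = np.zeros((NUM_ENVS, OBS_DIM))
returns = np.zeros(NUM_ENVS)
for _ in range(HORIZON):            # all environments advance in lockstep
    obs, r = step(obs, policy(obs))
    returns += r
print(returns.mean())               # batched returns drive the policy update
```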
PMAT: Optimizing Action Generation Order in Multi-Agent Reinforcement Learning
Hu, Kun, Wen, Muning, Wang, Xihuai, Zhang, Shao, Shi, Yiwei, Li, Minne, Li, Minglong, Wen, Ying
Multi-agent reinforcement learning (MARL) faces challenges in coordinating agents due to complex interdependencies within multi-agent systems. Most MARL algorithms use the simultaneous decision-making paradigm but ignore the action-level dependencies among agents, which reduces coordination efficiency. In contrast, the sequential decision-making paradigm provides finer-grained supervision for agent decision order, presenting the potential for handling dependencies via better decision order management. However, determining the optimal decision order remains a challenge. In this paper, we introduce Action Generation with Plackett-Luce Sampling (AGPS), a novel mechanism for agent decision order optimization. We model the order determination task as a Plackett-Luce sampling process to address issues such as ranking instability and vanishing gradient during the network training process. AGPS realizes credit-based decision order determination by establishing a bridge between the significance of agents' local observations and their decision credits, thus facilitating order optimization and dependency management. Integrating AGPS with the Multi-Agent Transformer, we propose the Prioritized Multi-Agent Transformer (PMAT), a sequential decision-making MARL algorithm with decision order optimization. Experiments on benchmarks including StarCraft II Multi-Agent Challenge, Google Research Football, and Multi-Agent MuJoCo show that PMAT outperforms state-of-the-art algorithms, greatly enhancing coordination efficiency.
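The Plackett-Luce sampling at the core of AGPS can be made concrete with the Gumbel trick: perturbing per-agent scores with Gumbel noise and sorting yields an exact Plackett-Luce sample, and the order's log-probability stays differentiable for policy-gradient training. In the sketch below, the scores standing in for agents' decision credits are randomly initialized; the credit network itself is not shown.

```python
# A minimal sketch of Plackett-Luce decision-order sampling via the Gumbel
# trick. The credit scores are a stand-in, not AGPS's actual network.
import torch

def plackett_luce_order(scores: torch.Tensor) -> torch.Tensor:
    """Sample a permutation: argsort of scores + Gumbel noise is distributed
    as Plackett-Luce with logits `scores` (one draw, no iterative sampling)."""
    gumbel = -torch.log(-torch.log(torch.rand_like(scores)))
    return torch.argsort(scores + gumbel, descending=True)

def order_log_prob(scores: torch.Tensor, order: torch.Tensor) -> torch.Tensor:
    """Differentiable log-probability of the sampled order, usable as a
    policy-gradient term when optimizing the order against team return."""
    s = scores[order]
    # log prod_i exp(s_i) / sum_{j>=i} exp(s_j), computed stably right-to-left.
    tail = torch.logcumsumexp(s.flip(0), dim=0).flip(0)
    return (s - tail).sum()

scores = torch.randn(5, requires_grad=True)    # credits from local observations
order = plackett_luce_order(scores.detach())   # e.g. tensor([3, 0, 4, 1, 2])
logp = order_log_prob(scores, order)
logp.backward()                                # gradients flow to the credits
```

Sampling with Gumbel noise rather than iteratively renormalizing a softmax avoids the ranking instability and vanishing-gradient issues the abstract mentions.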
ThinkBench: Dynamic Out-of-Distribution Evaluation for Robust LLM Reasoning
Huang, Shulin, Yang, Linyi, Song, Yan, Chen, Shuang, Cui, Leyang, Wan, Ziyu, Zeng, Qingcheng, Wen, Ying, Shao, Kun, Zhang, Weinan, Wang, Jun, Zhang, Yue
Evaluating large language models (LLMs) poses significant challenges, particularly due to data contamination and the leakage of correct answers. To address these challenges, we introduce ThinkBench, a novel evaluation framework designed to robustly assess LLMs' reasoning capabilities. ThinkBench proposes a dynamic data generation method for constructing out-of-distribution (OOD) datasets and offers an OOD dataset containing 2,912 samples drawn from reasoning tasks. ThinkBench unifies the evaluation of reasoning and non-reasoning models. We evaluate 16 LLMs and 4 PRMs under identical experimental conditions and show that the performance of most LLMs is far from robust and that they exhibit a certain degree of data leakage. By dynamically generating OOD datasets, ThinkBench provides a reliable evaluation of LLMs and reduces the impact of data contamination.
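One plausible reading of dynamic OOD generation is re-instantiating seed questions from templates with fresh values, so a memorized answer no longer helps; the sketch below illustrates that idea with an invented template and checker, not ThinkBench's actual pipeline.

```python
# A minimal sketch of template-based dynamic question generation, assuming a
# parametric seed problem whose ground truth can be recomputed per variant.
import random

TEMPLATE = "A train travels {v} km/h for {t} hours. How far does it go?"

def generate_variant(seed: int):
    rng = random.Random(seed)              # seeded -> reproducible eval sets
    v, t = rng.randint(40, 120), rng.randint(2, 9)
    question = TEMPLATE.format(v=v, t=t)
    answer = v * t                         # ground truth recomputed per variant
    return question, answer

def is_correct(model_output: str, answer: int) -> bool:
    return str(answer) in model_output     # crude string match for the sketch

q, a = generate_variant(seed=7)
print(q, "->", a)
```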
Retrieval-Augmented Process Reward Model for Generalizable Mathematical Reasoning
Zhu, Jiachen, Zheng, Congmin, Lin, Jianghao, Du, Kounianhua, Wen, Ying, Yu, Yong, Wang, Jun, Zhang, Weinan
While large language models (LLMs) have significantly advanced mathematical reasoning, Process Reward Models (PRMs) developed to evaluate the logical validity of their reasoning steps still struggle with out-of-distribution (OOD) challenges. This paper identifies two key OOD issues: step OOD, caused by differences in reasoning patterns across model types and sizes, and question OOD, which arises from dataset shifts between training data and real-world problems. To address these issues, we introduce the Retrieval-Augmented Process Reward Model (RetrievalPRM). Through a two-stage retrieval-enhanced mechanism, RetrievalPRM retrieves semantically similar questions and steps as a warmup, enhancing the PRM's ability to evaluate target steps and improving generalization and reasoning consistency across different models and problem types. Our extensive experiments demonstrate that RetrievalPRM outperforms existing baselines across multiple real-world datasets. Our open-source contributions include a retrieval-enhanced dataset, a tuning framework for PRM training, and the RetrievalPRM model, establishing a new standard for PRM performance.
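The two-stage retrieval warmup might look like the following sketch, where similar questions (stage 1) and similar steps (stage 2) are retrieved and prepended to the PRM input; bag-of-words cosine stands in for a real embedding model, and the corpus and prompt layout are assumptions.

```python
# A minimal sketch of a two-stage retrieval warmup for a PRM. The retriever,
# corpora, and prompt format are illustrative, not RetrievalPRM's actual setup.
import numpy as np
from collections import Counter

def embed(text: str, vocab: dict) -> np.ndarray:
    """Unit-norm bag-of-words vector; a stand-in for a sentence encoder."""
    v = np.zeros(len(vocab))
    for w, c in Counter(text.lower().split()).items():
        if w in vocab:
            v[vocab[w]] = c
    n = np.linalg.norm(v)
    return v / n if n else v

def top_k(query: str, corpus: list, vocab: dict, k: int = 1) -> list:
    q = embed(query, vocab)
    sims = [q @ embed(doc, vocab) for doc in corpus]
    return [corpus[i] for i in np.argsort(sims)[::-1][:k]]

questions = ["Solve x + 3 = 7 for x.", "What is the area of a 3x4 rectangle?"]
steps = ["Subtract 3 from both sides: x = 4.", "Multiply width by height: 12."]
vocab = {w: i for i, w in enumerate({w for t in questions + steps
                                     for w in t.lower().split()})}

target_q, target_step = "Solve x + 5 = 9 for x.", "Subtract 5 from both sides."
warmup_q = top_k(target_q, questions, vocab)   # stage 1: similar questions
warmup_s = top_k(target_step, steps, vocab)    # stage 2: similar steps
prm_input = (f"Reference question: {warmup_q[0]}\nReference step: {warmup_s[0]}\n"
             f"Question: {target_q}\nStep to score: {target_step}")
print(prm_input)                               # fed to the PRM scorer
```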
RHINO: Learning Real-Time Humanoid-Human-Object Interaction from Human Demonstrations
Chen, Jingxiao, Li, Xinyao, Cao, Jiahang, Zhu, Zhengbang, Dong, Wentao, Liu, Minghuan, Wen, Ying, Yu, Yong, Zhang, Liqing, Zhang, Weinan
[Figure 1: RHINO has the capability of real-time interaction on diverse tasks.]
Humanoid robots have shown success in locomotion and manipulation, yet they remain far from responsive assistants. Most works on human intention recognition treat the interaction as a two-stage process, where the robot first predicts the human intention and then executes the task: the robot cannot be interrupted once a task is in progress, and further human commands can only be handled after it completes. Others focus on recognizing human signals amid the complexity of human interactions [33, 37], but they often suffer from high latency and are not suitable for real-time use. These limitations hinder robots from rapid interventions and robust, multi-step interactions in human-centered tasks, so a framework for human-robot interaction with real-time intention recognition and various skills is urgently needed. We present RHINO, a learning framework for Reactive Humanoid-human INteraction and Object manipulation. RHINO reacts to human signals in real time, enabling downstream tasks to be interrupted at any time, and decouples intention recognition from the interaction and object manipulation skills it selects based on predicted intentions. Humanoid robots need to estimate human physical and mental states to provide appropriate assistance [35]; object information in the environment also plays an important role in predicting human intention when combined with human motion, since interaction cues such as pointing gestures [14] and grabbed objects [24] provide a broader semantic space for inferring human intent. To ensure the scalability of RHINO across a wide range of skills, we design a pipeline for learning interactions from human-object-human demonstrations, exploiting the unique opportunity to learn natural motion from retargeted human motion data [15], which can be collected from motion capture systems or network videos. Experiments validate its effectiveness, flexibility, and safety in various scenarios.
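The decoupling described above, an intention recognizer that can preempt the active skill at every control tick, can be sketched as follows; the skills, tick rate, and recognizer are toy stand-ins rather than RHINO's learned modules.

```python
# A minimal sketch of interruptible skill execution: intention recognition runs
# every tick and can preempt the running skill. Skills are generators so they
# can be dropped mid-execution; all behaviors here are invented placeholders.
def handover_skill():
    for phase in ["reach", "grasp", "extend", "release"]:
        yield f"handover:{phase}"        # one low-level control step

def wave_skill():
    for phase in ["raise", "wave", "lower"]:
        yield f"wave:{phase}"

SKILLS = {"handover": handover_skill, "wave": wave_skill}

def recognize_intention(tick: int) -> str:
    # Stand-in for real-time intention recognition from human motion/objects;
    # here the human switches intention mid-task at tick 2.
    return "handover" if tick < 2 else "wave"

active_name, active = None, None
for tick in range(6):                    # the real-time control loop
    intent = recognize_intention(tick)
    if intent != active_name:            # new intention -> interrupt the skill
        active_name, active = intent, SKILLS[intent]()
    step = next(active, None)            # advance or finish the current skill
    print(tick, step if step else f"{active_name}: done, idling")
```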
Leveraging Dual Process Theory in Language Agent Framework for Real-time Simultaneous Human-AI Collaboration
Zhang, Shao, Wang, Xihuai, Zhang, Wenhao, Li, Chaoran, Song, Junru, Li, Tingyu, Qiu, Lin, Cao, Xuezhi, Cai, Xunliang, Yao, Wen, Zhang, Weinan, Wang, Xinbing, Wen, Ying
Agents built on large language models (LLMs) have excelled in turn-by-turn human-AI collaboration but struggle with simultaneous tasks requiring real-time interaction. Latency issues and the challenge of inferring variable human strategies hinder their ability to make autonomous decisions without explicit instructions. Through experiments with current independent System 1 and System 2 methods, we validate the necessity of applying Dual Process Theory (DPT) to real-time tasks. We propose DPT-Agent, a novel language agent framework that integrates System 1 and System 2 for efficient real-time simultaneous human-AI collaboration. DPT-Agent's System 1 uses a Finite-state Machine (FSM) and code-as-policy for fast, intuitive, and controllable decision-making. DPT-Agent's System 2 integrates Theory of Mind (ToM) and asynchronous reflection to infer human intentions and make reasoning-based autonomous decisions. We demonstrate the effectiveness of DPT-Agent through further experiments with rule-based agents and human collaborators, showing significant improvements over mainstream LLM-based frameworks. To the best of our knowledge, DPT-Agent is the first language agent framework to achieve successful real-time simultaneous human-AI collaboration autonomously. Code for DPT-Agent can be found at https://github.com/sjtu-marl/DPT-Agent.
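A minimal sketch of the System 1 / System 2 split follows, assuming System 1 is an FSM lookup that acts every tick while System 2 is a slow background coroutine (standing in for LLM-based reflection) that rewrites the policy asynchronously; states, actions, and timings are invented.

```python
# A minimal sketch of a DPT-style agent: a fast System 1 loop acts from a
# finite-state machine every tick, while a slow System 2 coroutine revises the
# FSM policy in the background without ever blocking the loop.
import asyncio

policy = {"idle": "wait", "order_up": "deliver"}   # FSM: state -> action

def system1_act(state: str) -> str:
    return policy.get(state, "wait")               # fast, controllable lookup

async def system2_reflect():
    while True:
        await asyncio.sleep(0.5)                   # slow "LLM" reasoning latency
        # Stand-in for ToM-based reflection rewriting the code-as-policy:
        policy["order_up"] = "deliver_fast"

async def main():
    asyncio.create_task(system2_reflect())         # System 2 runs in background
    states = ["idle", "order_up", "idle", "order_up", "order_up"]
    for state in states:                           # real-time simultaneous task
        print(state, "->", system1_act(state))     # System 1 never blocks
        await asyncio.sleep(0.2)

asyncio.run(main())
```

The printed actions change mid-run once the reflection coroutine updates the policy, illustrating how slow reasoning can steer fast control without stalling it.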
Adaptive Teaming in Multi-Drone Pursuit: Simulation, Training, and Deployment
Li, Yang, Chen, Junfan, Xue, Feng, Qiu, Jiabin, Li, Wenbin, Zhang, Qingrui, Wen, Ying, Pan, Wei
Adaptive teaming, the ability to collaborate with unseen teammates without prior coordination, remains an underexplored challenge in multi-robot collaboration. This paper focuses on adaptive teaming in multi-drone cooperative pursuit, a critical task with real-world applications such as border surveillance, search-and-rescue, and counter-terrorism. We first define and formalize the \textbf{A}daptive Teaming in \textbf{M}ulti-\textbf{D}rone \textbf{P}ursuit (AT-MDP) problem and introduce the AT-MDP framework, which integrates simulation, algorithm training, and real-world deployment. The framework provides a flexible experiment configurator and interface for simulation, a distributed training pipeline with an extensive algorithm zoo (including two newly proposed baseline methods) and an unseen drone zoo for evaluating adaptive teaming, as well as a real-world deployment system that utilizes edge computing and Crazyflie drones. To the best of our knowledge, the AT-MDP framework is the first adaptive framework for continuous-action decision-making in complex real-world drone tasks, enabling multiple drones to coordinate effectively with unseen teammates. Extensive experiments in four multi-drone pursuit environments of increasing difficulty confirm the effectiveness of the AT-MDP framework, while real-world deployments further validate its feasibility on physical systems. Videos and code are available at https://sites.google.com/view/at-mdp.
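A hypothetical configuration illustrating the kind of settings an AT-MDP-style experiment configurator would expose; every field name here is invented for illustration, not the framework's actual schema.

```python
# A hypothetical experiment config sketching the framework's four concerns:
# environment, training (algorithm zoo), unseen-teammate evaluation (drone
# zoo), and real-world deployment. All names are invented placeholders.
config = {
    "env": {"name": "multi_drone_pursuit", "difficulty": 3,
            "num_pursuers": 4, "num_evaders": 1, "action_space": "continuous"},
    "training": {"algorithm": "mappo",          # drawn from an algorithm zoo
                 "distributed_workers": 32, "total_steps": 5_000_000},
    "evaluation": {"teammate_zoo": ["rule_based", "selfplay_ckpt_a"],  # unseen partners
                   "episodes": 100},
    "deployment": {"platform": "crazyflie", "compute": "edge"},
}
print(config["training"]["algorithm"])
```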
Language Games as the Pathway to Artificial Superhuman Intelligence
Wen, Ying, Wan, Ziyu, Zhang, Shao
The evolution of large language models (LLMs) toward artificial superhuman intelligence (ASI) hinges on data reproduction, a cyclical process in which models generate, curate, and retrain on novel data to refine capabilities. Current methods, however, risk getting stuck in a data reproduction trap: optimizing outputs within fixed human-generated distributions in a closed loop leads to stagnation, as models merely recombine existing knowledge rather than explore new frontiers. In this paper, we propose language games as a pathway to expanded data reproduction, breaking this cycle through three mechanisms: (1) \textit{role fluidity}, which enhances data diversity and coverage by enabling multi-agent systems to dynamically shift roles across tasks; (2) \textit{reward variety}, embedding multiple feedback criteria that can drive complex intelligent behaviors; and (3) \textit{rule plasticity}, iteratively evolving interaction constraints to foster learnability, thereby injecting continual novelty. By scaling language games into global sociotechnical ecosystems, human-AI co-evolution generates unbounded data streams that drive open-ended exploration. This framework redefines data reproduction not as a closed loop but as an engine for superhuman intelligence.
LLM-based Multi-Agent Systems: Techniques and Business Perspectives
Yang, Yingxuan, Peng, Qiuying, Wang, Jun, Wen, Ying, Zhang, Weinan
In the era of (multi-modal) large language models, most operational processes can be reformulated and reproduced using LLM agents. LLM agents can perceive, control, and receive feedback from the environment, allowing them to accomplish given tasks autonomously. Beyond interacting with the environment, LLM agents can call various external tools to ease task completion. Each tool can be regarded as a predefined operational process embedding private or real-time knowledge that does not exist in the LLM's parameters. As a natural trend of development, the tools being called are themselves becoming autonomous agents, so the full intelligent system turns into an LLM-based Multi-Agent System (LaMAS). Compared to a single-LLM-agent system, LaMAS has the advantages of i) dynamic task decomposition and organic specialization, ii) higher flexibility for system changes, iii) preservation of proprietary data for each participating entity, and iv) feasibility of monetization for each entity. This paper discusses the technical and business landscapes of LaMAS. To support the LaMAS ecosystem, we provide a preliminary version of a LaMAS protocol that addresses technical requirements, data privacy, and business incentives. As such, LaMAS would be a practical solution for achieving artificial collective intelligence in the near future.
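A hypothetical message schema sketching how a LaMAS protocol might encode the three concerns named above (technical requirements, data privacy, and business incentives); all field and class names are invented for illustration.

```python
# A hypothetical inter-agent message for a LaMAS-style protocol, covering
# routing of decomposed subtasks, payload privacy, and per-call settlement.
from dataclasses import dataclass, field

@dataclass
class LamasMessage:
    sender: str                               # calling agent's identity
    recipient: str                            # callee agent (a tool-turned-agent)
    task: str                                 # decomposed subtask description
    payload_visibility: str = "private"       # proprietary data stays with its owner
    fee_microcredits: int = 0                 # monetization hook for the callee
    trace: list = field(default_factory=list) # audit trail across agents

msg = LamasMessage(sender="planner", recipient="flight_search",
                   task="find SFO->NRT fares next week", fee_microcredits=5)
msg.trace.append(msg.sender)
print(msg)
```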