AITopics | Xu, Yinghui

Collaborating Authors

Xu, Yinghui

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Guess What I am Thinking: A Benchmark for Inner Thought Reasoning of Role-Playing Language Agents

Xu, Rui, Wang, MingYu, Wang, XinTao, Lu, Dakuan, Tan, Xiaoyu, Chu, Wei, Xu, Yinghui

arXiv.org Artificial IntelligenceMar-11-2025

Recent advances in LLM-based role-playing language agents (RPLAs) have attracted broad attention in various applications. While chain-of-thought reasoning has shown importance in many tasks for LLMs, the internal thinking processes of RPLAs remain unexplored. Understanding characters' inner thoughts is crucial for developing advanced RPLAs. In this paper, we introduce ROLETHINK, a novel benchmark constructed from literature for evaluating character thought generation. We propose the task of inner thought reasoning, which includes two sets: the gold set that compares generated thoughts with original character monologues, and the silver set that uses expert synthesized character analyses as references. To address this challenge, we propose MIRROR, a chain-of-thought approach that generates character thoughts by retrieving memories, predicting character reactions, and synthesizing motivations. Through extensive experiments, we demonstrate the importance of inner thought reasoning for RPLAs, and MIRROR consistently outperforms existing methods. Resources are available at https://github.com/airaer1998/RPA_Thought.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2503.08193

Country: Asia (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Add feedback

AURORA:Automated Training Framework of Universal Process Reward Models via Ensemble Prompting and Reverse Verification

Tan, Xiaoyu, Yao, Tianchu, Qu, Chao, Li, Bin, Yang, Minghao, Lu, Dakuan, Wang, Haozhe, Qiu, Xihe, Chu, Wei, Xu, Yinghui, Qi, Yuan

arXiv.org Artificial IntelligenceFeb-17-2025

The reasoning capabilities of advanced large language models (LLMs) like o1 have revolutionized artificial intelligence applications. Nevertheless, evaluating and optimizing complex reasoning processes remain significant challenges due to diverse policy distributions and the inherent limitations of human effort and accuracy. In this paper, we present AURORA, a novel automated framework for training universal process reward models (PRMs) using ensemble prompting and reverse verification. The framework employs a two-phase approach: First, it uses diverse prompting strategies and ensemble methods to perform automated annotation and evaluation of processes, ensuring robust assessments for reward learning. Second, it leverages practical reference answers for reverse verification, enhancing the model's ability to validate outputs and improving training accuracy. To assess the framework's performance, we extend beyond the existing ProcessBench benchmark by introducing UniversalBench, which evaluates reward predictions across full trajectories under diverse policy distribtion with long Chain-of-Thought (CoT) outputs. Experimental results demonstrate that AURORA enhances process evaluation accuracy, improves PRMs' accuracy for diverse policy distributions and long-CoT responses. The project will be open-sourced at https://auroraprm.github.io/. The Universal-PRM-7B is available at https://huggingface.co/infly/Universal-PRM-7B.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2502.1152

Genre: Research Report > New Finding (0.87)

Industry: Education (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
(2 more...)

Add feedback

SCP-116K: A High-Quality Problem-Solution Dataset and a Generalized Pipeline for Automated Extraction in the Higher Education Science Domain

Lu, Dakuan, Tan, Xiaoyu, Xu, Rui, Yao, Tianchu, Qu, Chao, Chu, Wei, Xu, Yinghui, Qi, Yuan

arXiv.org Artificial IntelligenceJan-26-2025

Recent breakthroughs in large language models (LLMs) exemplified by the impressive mathematical and scientific reasoning capabilities of the o1 model have spotlighted the critical importance of high-quality training data in advancing LLM performance across STEM disciplines. While the mathematics community has benefited from a growing body of curated datasets, the scientific domain at the higher education level has long suffered from a scarcity of comparable resources. To address this gap, we present SCP-116K, a new large-scale dataset of 116,756 high-quality problem-solution pairs, automatically extracted from heterogeneous sources using a streamlined and highly generalizable pipeline. Our approach involves stringent filtering to ensure the scientific rigor and educational level of the extracted materials, while maintaining adaptability for future expansions or domain transfers. By openly releasing both the dataset and the extraction pipeline, we seek to foster research on scientific reasoning, enable comprehensive performance evaluations of new LLMs, and lower the barrier to replicating the successes of advanced models like o1 in the broader science community. We believe SCP-116K will serve as a critical resource, catalyzing progress in high-level scientific reasoning tasks and promoting further innovations in LLM development. The dataset and code are publicly available at https://github.com/AQA6666/SCP-116K-open.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2501.15587

Country: Asia > Thailand (0.14)

Genre: Research Report (0.82)

Industry: Education > Educational Setting > Higher Education (0.71)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

Add feedback

An Attentive Dual-Encoder Framework Leveraging Multimodal Visual and Semantic Information for Automatic OSAHS Diagnosis

Wei, Yingchen, Qiu, Xihe, Tan, Xiaoyu, Huang, Jingjing, Chu, Wei, Xu, Yinghui, Qi, Yuan

arXiv.org Artificial IntelligenceDec-25-2024

Obstructive sleep apnea-hypopnea syndrome (OSAHS) [1] Our key contributions are as follows: (1) Introducing VTA-affects about 27% of adults [2], causing poor sleep, daytime OSAHS, a multimodal framework for diagnosing OSAHS dysfunction, and higher risks of cardiovascular diseases and diabetes severity by combining visual and language data, and using [3]. The standard diagnostic method, polysomnography a pre-trained language model to extract key information from (PSG) [4], is complex, costly, and uncomfortable, requiring basic physiological data for improved classification accuracy; multi-channel monitoring (EEG, ECG, heart rate [5]) and (2) Developing a visual encoder that focuses on specific facial trained technicians (Figure 1). Data-driven methods for automated features associated with OSAHS, employing attention mesh OSAHS diagnosis can improve efficiency and reduce and stochastic gates for better clinical decision alignment; (3) costs. Facial features like a flat nasal bridge, wide jawbone, Implementing a data pre-processing strategy to handle imbalanced thick neck, and mandibular retrognathia correlate with OSAHS samples and ordinal classification, using randomOver-severity [6], providing visual indicators of airway obstruction Sampler (ROS) [17] and an ordinal regression loss function and sleep disturbances. Deep learning can analyze these features [18] to enhance accuracy and robustness; (4) Demonstrating for early diagnosis and personalized treatment.

accuracy, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2412.18919

Country: Asia > China (0.17)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback

OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models

Huang, Siming, Cheng, Tianhao, Liu, J. K., Hao, Jiaran, Song, Liuyihan, Xu, Yang, Yang, J., Liu, J. H., Zhang, Chenchen, Chai, Linzheng, Yuan, Ruifeng, Zhang, Zhaoxiang, Fu, Jie, Liu, Qian, Zhang, Ge, Wang, Zili, Qi, Yuan, Xu, Yinghui, Chu, Wei

arXiv.org Artificial IntelligenceNov-9-2024

Large language models (LLMs) for code have become indispensable in various domains, including code generation, reasoning tasks and agent systems. While open-access code LLMs are increasingly approaching the performance levels of proprietary models, high-quality code LLMs suitable for rigorous scientific investigation, particularly those with reproducible data processing pipelines and transparent training protocols, remain limited. The scarcity is due to various challenges, including resource constraints, ethical considerations, and the competitive advantages of keeping models advanced. To address the gap, we introduce OpenCoder, a top-tier code LLM that not only achieves performance comparable to leading models but also serves as an "open cookbook" for the research community. Unlike most prior efforts, we release not only model weights and inference code, but also the reproducible training data, complete data processing pipeline, rigorous experimental ablation results, and detailed training protocols for open scientific research. Through this comprehensive release, we identify the key ingredients for building a top-tier code LLM: (1) code optimized heuristic rules for data cleaning and methods for data deduplication, (2) recall of text corpus related to code and (3) high-quality synthetic data in both annealing and supervised fine-tuning stages. By offering this level of openness, we aim to broaden access to all aspects of a top-tier code LLM, with OpenCoder serving as both a powerful model and an open foundation to accelerate research, and enable reproducible advancements in code AI.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2411.04905

Country:

Europe > Austria (0.28)
North America > United States (0.28)

Genre: Research Report (0.82)

Industry:

Information Technology > Software (0.86)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Struct-X: Enhancing Large Language Models Reasoning with Structured Data

Tan, Xiaoyu, Wang, Haoyu, Qiu, Xihe, Cheng, Yuan, Xu, Yinghui, Chu, Wei, Qi, Yuan

arXiv.org Artificial IntelligenceJul-17-2024

Structured data, rich in logical and relational information, has the potential to enhance the reasoning abilities of large language models (LLMs). Still, its integration poses a challenge due to the risk of overwhelming LLMs with excessive tokens and irrelevant context information. To address this, we propose Struct-X, a novel framework that operates through five key phases: ``read-model-fill-reflect-reason'' efficiently enabling LLMs to utilize structured data. It begins by encoding structured data into a topological space using graph embeddings, followed by filling in missing entity information with knowledge retrieval modules, and filtering out irrelevant tokens via a self-supervised module. The final phase involves constructing a topological network with selected tokens to further reduce the total token length for more effective LLM inference. Additionally, Struct-X includes an Auxiliary Module trained to generate prompts, aiding LLMs in analyzing structured data. Extensive experiments on benchmarks, including the knowledge graph question-answer task and the long document reading comprehension task, show that Struct-X notably improves LLM reasoning, demonstrating the effectiveness of structured data augmentation in improving LLM inference with complex input context.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2407.12522

Country: Asia > China (0.14)

Genre: Research Report (0.50)

Industry:

Energy (0.46)
Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Towards Collaborative Intelligence: Propagating Intentions and Reasoning for Multi-Agent Coordination with Large Language Models

Qiu, Xihe, Wang, Haoyu, Tan, Xiaoyu, Qu, Chao, Xiong, Yujie, Cheng, Yuan, Xu, Yinghui, Chu, Wei, Qi, Yuan

arXiv.org Artificial IntelligenceJul-17-2024

Effective collaboration in multi-agent systems requires communicating goals and intentions between agents. Current agent frameworks often suffer from dependencies on single-agent execution and lack robust inter-module communication, frequently leading to suboptimal multi-agent reinforcement learning (MARL) policies and inadequate task coordination. To address these challenges, we present a framework for training large language models (LLMs) as collaborative agents to enable coordinated behaviors in cooperative MARL. Each agent maintains a private intention consisting of its current goal and associated sub-tasks. Agents broadcast their intentions periodically, allowing other agents to infer coordination tasks. A propagation network transforms broadcast intentions into teammate-specific communication messages, sharing relevant goals with designated teammates. The architecture of our framework is structured into planning, grounding, and execution modules. During execution, multiple agents interact in a downstream environment and communicate intentions to enable coordinated behaviors. The grounding module dynamically adapts comprehension strategies based on emerging coordination patterns, while feedback from execution agents influnces the planning module, enabling the dynamic re-planning of sub-tasks. Results in collaborative environment simulation demonstrate intention propagation reduces miscoordination errors by aligning sub-task dependencies between agents. Agents learn when to communicate intentions and which teammates require task details, resulting in emergent coordinated behaviors. This demonstrates the efficacy of intention sharing for cooperative multi-agent RL based on LLMs.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2407.12532

Country: Asia > China (0.14)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.86)

Add feedback

AI2Apps: A Visual IDE for Building LLM-based AI Agent Applications

Pang, Xin, Li, Zhucong, Chen, Jiaxiang, Cheng, Yuan, Xu, Yinghui, Qi, Yuan

arXiv.org Artificial IntelligenceApr-7-2024

We introduce AI2Apps, a Visual Integrated Development Environment (Visual IDE) with full-cycle capabilities that accelerates developers to build deployable LLM-based AI agent Applications. This Visual IDE prioritizes both the Integrity of its development tools and the Visuality of its components, ensuring a smooth and efficient building experience.On one hand, AI2Apps integrates a comprehensive development toolkit ranging from a prototyping canvas and AI-assisted code editor to agent debugger, management system, and deployment tools all within a web-based graphical user interface. On the other hand, AI2Apps visualizes reusable front-end and back-end code as intuitive drag-and-drop components. Furthermore, a plugin system named AI2Apps Extension (AAE) is designed for Extensibility, showcasing how a new plugin with 20 components enables web agent to mimic human-like browsing behavior. Our case study demonstrates substantial efficiency improvements, with AI2Apps reducing token consumption and API calls when debugging a specific sophisticated multimodal agent by approximately 90% and 80%, respectively. The AI2Apps, including an online demo, open-source code, and a screencast video, is now publicly accessible.

artificial intelligence, large language model, natural language, (15 more...)

arXiv.org Artificial Intelligence

2404.04902

Country: Asia > China (0.15)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

PILLOW: Enhancing Efficient Instruction Fine-tuning via Prompt Matching

Qi, Zhenting, Tan, Xiaoyu, Shi, Shaojie, Qu, Chao, Xu, Yinghui, Qi, Yuan

arXiv.org Artificial IntelligenceDec-9-2023

Instruction fine-tuning has conventionally been employed to adapt Large Language Models (LLMs) to a variety of tasks. Nonetheless, this technique often necessitates substantial computational resources, making it impractical for deployment by individuals or small-scale entities. Recently, Low-Rank Adaptation (LoRA) has become a promising alternative, offering high capabilities on par with full tuning with reduced resource overhead. However, attaining satisfactory performance through the fine-tuning of LoRA is a non-trivial challenge. In this paper, we propose PILLOW, which aims to improve LoRA's performance by a discrimination-based prompting method, leveraging LLMs' In-Context Learning ability. PILLOW incorporates a matching network that selects prompts from a user-defined prompt pool, concatenates the selected prompts with the user instruction as input, and performs inference using the LoRA-fine-tuned LLMs. Trained with Reinforcement Learning, PILLOW exhibits commensurate performance on various evaluation metrics compared with typical instruction fine-tuning methods, utilizing only consumer-grade GPU resources and exhibiting a large reduction in computational costs.

large language model, machine learning, preprint arxiv, (16 more...)

arXiv.org Artificial Intelligence

2312.05621

Country: Asia (0.28)

Genre: Research Report (1.00)

Industry:

Health & Medicine (0.68)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

FuXi: A cascade machine learning forecasting system for 15-day global weather forecast

Chen, Lei, Zhong, Xiaohui, Zhang, Feng, Cheng, Yuan, Xu, Yinghui, Qi, Yuan, Li, Hao

arXiv.org Artificial IntelligenceOct-20-2023

Over the past few years, due to the rapid development of machine learning (ML) models for weather forecasting, state-of-the-art ML models have shown superior performance compared to the European Centre for Medium-Range Weather Forecasts (ECMWF)'s high-resolution forecast (HRES) in 10-day forecasts at a spatial resolution of 0.25 degree. However, the challenge remains to perform comparably to the ECMWF ensemble mean (EM) in 15-day forecasts. Previous studies have demonstrated the importance of mitigating the accumulation of forecast errors for effective long-term forecasts. Despite numerous efforts to reduce accumulation errors, including autoregressive multi-time step loss, using a single model is found to be insufficient to achieve optimal performance in both short and long lead times. Therefore, we present FuXi, a cascaded ML weather forecasting system that provides 15-day global forecasts with a temporal resolution of 6 hours and a spatial resolution of 0.25 degree. FuXi is developed using 39 years of the ECMWF ERA5 reanalysis dataset. The performance evaluation, based on latitude-weighted root mean square error (RMSE) and anomaly correlation coefficient (ACC), demonstrates that FuXi has comparable forecast performance to ECMWF EM in 15-day forecasts, making FuXi the first ML-based weather forecasting system to accomplish this achievement.

artificial intelligence, machine learning, modeling & simulation, (20 more...)

arXiv.org Artificial Intelligence

2306.12873

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.74)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)

Add feedback