AITopics | Yang, Yijun

Collaborating Authors

Yang, Yijun

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training

Wei, Tong, Yang, Yijun, Xing, Junliang, Shi, Yuanchun, Lu, Zongqing, Ye, Deheng

arXiv.org Artificial IntelligenceMar-11-2025

Reinforcement learning with verifiable outcome rewards (RLVR) has effectively scaled up chain-of-thought (CoT) reasoning in large language models (LLMs). Yet, its efficacy in training vision-language model (VLM) agents for goal-directed action reasoning in visual environments is less established. This work investigates this problem through extensive experiments on complex card games, such as 24 points, and embodied tasks from ALFWorld. We find that when rewards are based solely on action outcomes, RL fails to incentivize CoT reasoning in VLMs, instead leading to a phenomenon we termed thought collapse, characterized by a rapid loss of diversity in the agent's thoughts, state-irrelevant and incomplete reasoning, and subsequent invalid actions, resulting in negative rewards. To counteract thought collapse, we highlight the necessity of process guidance and propose an automated corrector that evaluates and refines the agent's reasoning at each RL step. This simple and scalable GTR (Guided Thought Reinforcement) framework trains reasoning and action simultaneously without the need for dense, per-step human labeling. Our experiments demonstrate that GTR significantly enhances the performance and generalization of the LLaVA-7b model across various visual environments, achieving 3-5 times higher task success rates compared to SoTA models with notably smaller model sizes.

formula, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2503.08525

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)

Add feedback

Multi-Agent Coordination across Diverse Applications: A Survey

Sun, Lijun, Yang, Yijun, Duan, Qiqi, Shi, Yuhui, Lyu, Chao, Chang, Yu-Cheng, Lin, Chin-Teng, Shen, Yang

arXiv.org Artificial IntelligenceFeb-20-2025

Multi-agent coordination studies the underlying mechanism enabling the trending spread of diverse multi-agent systems (MAS) and has received increasing attention, driven by the expansion of emerging applications and rapid AI advances. This survey outlines the current state of coordination research across applications through a unified understanding that answers four fundamental coordination questions: (1) what is coordination; (2) why coordination; (3) who to coordinate with; and (4) how to coordinate. Our purpose is to explore existing ideas and expertise in coordination and their connections across diverse applications, while identifying and highlighting emerging and promising research directions. First, general coordination problems that are essential to varied applications are identified and analyzed. Second, a number of MAS applications are surveyed, ranging from widely studied domains, e.g., search and rescue, warehouse automation and logistics, and transportation systems, to emerging fields including humanoid and anthropomorphic robots, satellite systems, and large language models (LLMs). Finally, open challenges about the scalability, heterogeneity, and learning mechanisms of MAS are analyzed and discussed. In particular, we identify the hybridization of hierarchical and decentralized coordination, human-MAS coordination, and LLM-based MAS as promising future directions.

artificial intelligence, natural language, survey article, (15 more...)

arXiv.org Artificial Intelligence

2502.14743

Country:

Asia (0.93)
North America > United States (0.93)
Europe > United Kingdom > England (0.14)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Leisure & Entertainment > Games (1.00)
Information Technology (1.00)
Transportation > Ground > Road (0.93)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.67)

Add feedback

Evaluating and Improving Graph to Text Generation with Large Language Models

He, Jie, Yang, Yijun, Long, Wanqiu, Xiong, Deyi, Gutierrez-Basulto, Victor, Pan, Jeff Z.

arXiv.org Artificial IntelligenceFeb-14-2025

Large language models (LLMs) have demonstrated immense potential across various tasks. However, research for exploring and improving the capabilities of LLMs in interpreting graph structures remains limited. To address this gap, we conduct a comprehensive evaluation of prompting current open-source LLMs on graph-to-text generation tasks. Although we explored the optimal prompting strategies and proposed a novel and effective diversity-difficulty-based few-shot sample selection method, we found that the improvements from tuning-free approaches were incremental, as LLMs struggle with planning on complex graphs, particularly those with a larger number of triplets. To further improve LLMs in planning with graph sequences and grounding in truth, we introduce a new graph-to-text dataset, PlanGTG, annotated with two sub-tasks: reordering and attribution. Through extensive automatic and human evaluations, we demonstrate significant improvements in the quality of generated text from both few-shot learning and fine-tuning perspectives using the PlanGTG dataset. Our study paves the way for new research directions in graph-to-text generation. PlanGTG datasets can be found in https://github.com/probe2/kg_text.

large language model, natural language, text generation, (2 more...)

arXiv.org Artificial Intelligence

2501.14497

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Effective Black-Box Multi-Faceted Attacks Breach Vision Large Language Model Guardrails

Yang, Yijun, Wang, Lichao, Yang, Xiao, Hong, Lanqing, Zhu, Jun

arXiv.org Artificial IntelligenceFeb-8-2025

Vision Large Language Models (VLLMs) integrate visual data processing, expanding their real-world applications, but also increasing the risk of generating unsafe responses. In response, leading companies have implemented Multi-Layered safety defenses, including alignment training, safety system prompts, and content moderation. However, their effectiveness against sophisticated adversarial attacks remains largely unexplored. In this paper, we propose MultiFaceted Attack, a novel attack framework designed to systematically bypass Multi-Layered Defenses in VLLMs. It comprises three complementary attack facets: Visual Attack that exploits the multimodal nature of VLLMs to inject toxic system prompts through images; Alignment Breaking Attack that manipulates the model's alignment mechanism to prioritize the generation of contrasting responses; and Adversarial Signature that deceives content moderators by strategically placing misleading information at the end of the response. Extensive evaluations on eight commercial VLLMs in a black-box setting demonstrate that MultiFaceted Attack achieves a 61.56% attack success rate, surpassing state-of-the-art methods by at least 42.18%.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2502.05772

Country:

North America > Mexico (0.28)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre:

Research Report > New Finding (0.67)
Research Report > Promising Solution (0.48)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Law (1.00)
Information Technology > Security & Privacy (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

MM-IQ: Benchmarking Human-Like Abstraction and Reasoning in Multimodal Models

Cai, Huanqia, Yang, Yijun, Hu, Winston

arXiv.org Artificial IntelligenceFeb-2-2025

IQ testing has served as a foundational methodology for evaluating human cognitive capabilities, deliberately decoupling assessment from linguistic background, language proficiency, or domain-specific knowledge to isolate core competencies in abstraction and reasoning. Yet, artificial intelligence research currently lacks systematic benchmarks to quantify these critical cognitive dimensions in multimodal systems. To address this critical gap, we propose MM-IQ, a comprehensive evaluation framework comprising 2,710 meticulously curated test items spanning 8 distinct reasoning paradigms. Through systematic evaluation of leading open-source and proprietary multimodal models, our benchmark reveals striking limitations: even state-of-the-art architectures achieve only marginally superior performance to random chance (27.49% vs. 25% baseline accuracy). This substantial performance chasm highlights the inadequacy of current multimodal systems in approximating fundamental human reasoning capacities, underscoring the need for paradigm-shifting advancements to bridge this cognitive divide.

large language model, machine learning, paradigm, (20 more...)

arXiv.org Artificial Intelligence

2502.00698

Genre: Research Report (1.00)

Industry: Education > Assessment & Standards > Measuring Intelligence (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Cognitive Science (0.94)

Add feedback

System-2 Mathematical Reasoning via Enriched Instruction Tuning

Cai, Huanqia, Yang, Yijun, Li, Zhifeng

arXiv.org Artificial IntelligenceDec-24-2024

Solving complex mathematical problems via system-2 reasoning is a natural human skill, yet it remains a significant challenge for current large language models (LLMs). We identify the scarcity of deliberate multi-step reasoning data as a primary limiting factor. To this end, we introduce Enriched Instruction Tuning (EIT), a method that enriches existing human-annotated mathematical datasets by synergizing human and AI feedback to create fine-grained reasoning trajectories. These datasets are then used to fine-tune open-source LLMs, enhancing their mathematical reasoning abilities without reliance on any symbolic verification program. Concretely, EIT is composed of two critical steps: Enriching with Reasoning Plan (ERP) and Enriching with Reasoning Step (ERS). The former generates a high-level plan that breaks down complex instructions into a sequence of simpler objectives, while ERS fills in reasoning contexts often overlooked by human annotators, creating a smoother reasoning trajectory for LLM fine-tuning. Unlike existing CoT prompting methods that generate reasoning chains only depending on LLM's internal knowledge, our method leverages human-annotated initial answers as ``meta-knowledge'' to help LLMs generate more detailed and precise reasoning processes, leading to a more trustworthy LLM expert for complex mathematical problems. In experiments, EIT achieves an accuracy of 84.1% on GSM8K and 32.5% on MATH, surpassing state-of-the-art fine-tuning and prompting methods, and even matching the performance of tool-augmented methods.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2412.16964

Genre: Research Report > New Finding (0.68)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

DRIVE: Dual-Robustness via Information Variability and Entropic Consistency in Source-Free Unsupervised Domain Adaptation

Xiao, Ruiqiang, Lai, Songning, Yang, Yijun, Wu, Jiemin, Yue, Yutao, Zhu, Lei

arXiv.org Artificial IntelligenceDec-23-2024

Adapting machine learning models to new domains without labeled data, especially when source data is inaccessible, is a critical challenge in applications like medical imaging, autonomous driving, and remote sensing. This task, known as Source-Free Unsupervised Domain Adaptation (SFUDA), involves adapting a pre-trained model to a target domain using only unlabeled target data, which can lead to issues such as overfitting, underfitting, and poor generalization due to domain discrepancies and noise. Existing SFUDA methods often rely on single-model architectures, struggling with uncertainty and variability in the target domain. To address these challenges, we propose DRIVE (Dual-Robustness through Information Variability and Entropy), a novel SFUDA framework leveraging a dual-model architecture. The two models, initialized with identical weights, work in parallel to capture diverse target domain characteristics. One model is exposed to perturbations via projection gradient descent (PGD) guided by mutual information, focusing on high-uncertainty regions. We also introduce an entropy-aware pseudo-labeling strategy that adjusts label weights based on prediction uncertainty, ensuring the model focuses on reliable data while avoiding noisy regions. The adaptation process has two stages: the first aligns the models on stable features using a mutual information consistency loss, and the second dynamically adjusts the perturbation level based on the loss from the first stage, encouraging the model to explore a broader range of the target domain while preserving existing performance. This enhances generalization capabilities and robustness against interference. Evaluations on standard SFUDA benchmarks show that DRIVE consistently outperforms previous methods, delivering improved adaptation accuracy and stability across complex target domains.

adaptation, artificial intelligence, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2411.15976

Genre: Research Report (0.40)

Industry:

Health & Medicine > Health Care Technology (0.34)
Health & Medicine > Diagnostic Medicine > Imaging (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

From An LLM Swarm To A PDDL-Empowered HIVE: Planning Self-Executed Instructions In A Multi-Modal Jungle

Vyas, Kaustubh, Graux, Damien, Yang, Yijun, Montella, Sébastien, Diao, Chenxin, Zhou, Wendi, Vougiouklis, Pavlos, Lai, Ruofei, Ren, Yang, Li, Keshuang, Pan, Jeff Z.

arXiv.org Artificial IntelligenceDec-17-2024

In response to the call for agent-based solutions that leverage the ever-increasing capabilities of the deep models' ecosystem, we introduce Hive -- a comprehensive solution for selecting appropriate models and subsequently planning a set of atomic actions to satisfy the end-users' instructions. Hive operates over sets of models and, upon receiving natural language instructions (i.e. user queries), schedules and executes explainable plans of atomic actions. These actions can involve one or more of the available models to achieve the overall task, while respecting end-users specific constraints. Notably, Hive handles tasks that involve multi-modal inputs and outputs, enabling it to handle complex, real-world queries. Our system is capable of planning complex chains of actions while guaranteeing explainability, using an LLM-based formal logic backbone empowered by PDDL operations. We introduce the MuSE benchmark in order to offer a comprehensive evaluation of the multi-modal capabilities of agent systems. Our findings show that our framework redefines the state-of-the-art for task selection, outperforming other competing systems that plan operations across multiple models while offering transparency guarantees while fully adhering to user constraints.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2412.12839

Country:

North America > United States (0.69)
Asia (0.67)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

WALL-E: World Alignment by Rule Learning Improves World Model-based LLM Agents

Zhou, Siyu, Zhou, Tianyi, Yang, Yijun, Long, Guodong, Ye, Deheng, Jiang, Jing, Zhang, Chengqi

arXiv.org Artificial IntelligenceOct-11-2024

Step 1-2: the agent makes a plan via MPC with the initial unaligned world model, resulting in a failed action for mining iron ore. Step 3: by comparing real trajectories with the world model predictions, WALL-E learns a critical rule that if the tool is not proper to the material being mined, the action will fail. Step 4-5: the learned rule helps the world model make accurate predictions for transitions that were predicted mistakenly in MPC. Step 6: the agent accordingly modifies its plan and replaces stone pickaxe with an iron pickaxe toward completing the task. Can large language models (LLMs) directly serve as powerful world models for modelbased agents? While the gaps between the prior knowledge of LLMs and the specified environment's dynamics do exist, our study reveals that the gaps can be bridged by aligning an LLM with its deployed environment and such "world alignment" can be efficiently achieved by rule learning on LLMs. Given the rich prior knowledge of LLMs, only a few additional rules suffice to align LLM predictions with the specified environment dynamics. To this end, we propose a neurosymbolic approach to learn these rules gradient-free through LLMs, by inducing, updating, and pruning rules based on comparisons of agent-explored trajectories and world model predictions. Our embodied LLM agent "WALL-E" is built upon model-predictive control (MPC). By optimizing look-ahead actions based on the precise world model, MPC significantly improves exploration and learning efficiency. Compared to existing LLM agents, WALL-E's reasoning only requires a few principal rules rather than verbose buffered trajectories being included in the LLM input. On open-world challenges in Minecraft and ALFWorld, WALL-E achieves higher success rates than existing methods, with lower costs on replanning time and the number of tokens used for reasoning. In Minecraft, WALL-E exceeds baselines by 15-30% in success rate while costing 8-20 fewer replanning rounds and only 60-80% of tokens. This leads to safety risks agent's action per step is controlled by and suboptimality of generated trajectories.

large language model, natural language, world model, (17 more...)

arXiv.org Artificial Intelligence

2410.07484

Genre:

Workflow (1.00)
Research Report (1.00)

Industry: Energy > Oil & Gas > Upstream (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

UniArk: Improving Generalisation and Consistency for Factual Knowledge Extraction through Debiasing

Yang, Yijun, He, Jie, Chen, Pinzhen, Gutiérrez-Basulto, Víctor, Pan, Jeff Z.

arXiv.org Artificial IntelligenceApr-1-2024

Several recent papers have investigated the potential of language models as knowledge bases as well as the existence of severe biases when extracting factual knowledge. In this work, we focus on the factual probing performance over unseen prompts from tuning, and using a probabilistic view we show the inherent misalignment between pre-training and downstream tuning objectives in language models for probing knowledge. We hypothesize that simultaneously debiasing these objectives can be the key to generalisation over unseen prompts. We propose an adapter-based framework, UniArk, for generalised and consistent factual knowledge extraction through simple methods without introducing extra parameters. Extensive experiments show that UniArk can significantly improve the model's out-of-domain generalisation as well as consistency under various prompts. Additionally, we construct ParaTrex, a large-scale and diverse dataset for measuring the inconsistency and out-of-domain generation of models. Further, ParaTrex offers a reference method for constructing paraphrased datasets using large language models.

machine learning, natural language, template, (22 more...)

arXiv.org Artificial Intelligence

2404.01253

Country:

Europe (0.93)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.66)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)

Add feedback