AITopics | Zhang, Yikai

Collaborating Authors

Zhang, Yikai

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

The Lighthouse of Language: Enhancing LLM Agents via Critique-Guided Improvement

Yang, Ruihan, Ye, Fanghua, Li, Jian, Yuan, Siyu, Zhang, Yikai, Tu, Zhaopeng, Li, Xiaolong, Yang, Deqing

arXiv.org Artificial IntelligenceMar-20-2025

Large language models (LLMs) have recently transformed from text-based assistants to autonomous agents capable of planning, reasoning, and iteratively improving their actions. While numerical reward signals and verifiers can effectively rank candidate actions, they often provide limited contextual guidance. In contrast, natural language feedback better aligns with the generative capabilities of LLMs, providing richer and more actionable suggestions. However, parsing and implementing this feedback effectively can be challenging for LLM-based agents. In this work, we introduce Critique-Guided Improvement (CGI), a novel two-player framework, comprising an actor model that explores an environment and a critic model that generates detailed nature language feedback. By training the critic to produce fine-grained assessments and actionable revisions, and the actor to utilize these critiques, our approach promotes more robust exploration of alternative strategies while avoiding local optima. Experiments in three interactive environments show that CGI outperforms existing baselines by a substantial margin. Notably, even a small critic model surpasses GPT-4 in feedback quality. The resulting actor achieves state-of-the-art performance, demonstrating the power of explicit iterative guidance to enhance decision-making in LLM-based agents.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2503.16024

Country:

Asia (0.68)
North America > Mexico > Mexico City (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

PowerAttention: Exponentially Scaling of Receptive Fields for Effective Sparse Attention

Chen, Lida, Xu, Dong, An, Chenxin, Wang, Xintao, Zhang, Yikai, Chen, Jiangjie, Liang, Zujie, Wei, Feng, Liang, Jiaqing, Xiao, Yanghua, Wang, Wei

arXiv.org Artificial IntelligenceMar-5-2025

Large Language Models (LLMs) face efficiency bottlenecks due to the quadratic complexity of the attention mechanism when processing long contexts. Sparse attention methods offer a promising solution, but existing approaches often suffer from incomplete effective context and/or require complex implementation of pipeline. We present a comprehensive analysis of sparse attention for autoregressive LLMs from the respective of receptive field, recognize the suboptimal nature of existing methods for expanding the receptive field, and introduce PowerAttention, a novel sparse attention design that facilitates effective and complete context extension through the theoretical analysis. PowerAttention achieves exponential receptive field growth in $d$-layer LLMs, allowing each output token to attend to $2^d$ tokens, ensuring completeness and continuity of the receptive field. Experiments demonstrate that PowerAttention outperforms existing static sparse attention methods by $5\sim 40\%$, especially on tasks demanding long-range dependencies like Passkey Retrieval and RULER, while maintaining a comparable time complexity to sliding window attention. Efficiency evaluations further highlight PowerAttention's superior speedup in both prefilling and decoding phases compared with dynamic sparse attentions and full attention ($3.0\times$ faster on 128K context), making it a highly effective and user-friendly solution for processing long sequences in LLMs.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2503.03588

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Hawaii (0.14)
North America > United States > Florida > Miami-Dade County > Miami (0.14)
North America > Mexico > Mexico City (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

DEEPER Insight into Your User: Directed Persona Refinement for Dynamic Persona Modeling

Chen, Aili, Du, Chengyu, Chen, Jiangjie, Xu, Jinghan, Zhang, Yikai, Yuan, Siyu, Chen, Zulong, Li, Liangyue, Xiao, Yanghua

arXiv.org Artificial IntelligenceFeb-16-2025

To advance personalized applications such as recommendation systems and user behavior prediction, recent research increasingly adopts large language models (LLMs) for human -readable persona modeling. In dynamic real -world scenarios, effective persona modeling necessitates leveraging streaming behavior data to continually optimize user personas. However, existing methods -whether regenerating personas or incrementally extending them with new behaviors -often fail to achieve sustained improvements in persona quality or future behavior prediction accuracy. To address this, we propose DEEPER, a novel approach for dynamic persona modeling that enables continual persona optimization. Specifically, we enhance the model's direction -search capability through an iterative reinforcement learning framework, allowing it to automatically identify effective update directions and optimize personas using discrepancies between user behaviors and model predictions. Extensive experiments on dynamic persona modeling involving 4800 users across 10 domains highlight the superior persona optimization capabilities of DEEPER, delivering an impressive 32.2% average reduction in user behavior prediction error over four update rounds -outperforming the best baseline by a remarkable 22.92%.

artificial intelligence, large language model, natural language, (3 more...)

arXiv.org Artificial Intelligence

2502.11078

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.53)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.53)

Add feedback

Multi-agent KTO: Reinforcing Strategic Interactions of Large Language Model in Language Game

Ye, Rong, Zhang, Yongxin, Zhang, Yikai, Kuang, Haoyu, Wei, Zhongyu, Sun, Peng

arXiv.org Artificial IntelligenceJan-23-2025

Achieving Artificial General Intelligence (AGI) requires AI agents that can not only make stratigic decisions but also engage in flexible and meaningful communication. Inspired by Wittgenstein's language game theory in Philosophical Investigations, we propose that language agents can learn through in-context interaction rather than traditional multi-stage frameworks that separate decision-making from language expression. Using Werewolf, a social deduction game that tests language understanding, strategic interaction, and adaptability, we develop the Multi-agent Kahneman & Tversky's Optimization (MaKTO). MaKTO engages diverse models in extensive gameplay to generate unpaired desirable and unacceptable responses, then employs KTO to refine the model's decision-making process. In 9-player Werewolf games, MaKTO achieves a 61% average win rate across various models, outperforming GPT-4o and two-stage RL agents by relative improvements of 23.0% and 10.9%, respectively. Notably, MaKTO also demonstrates human-like performance, winning 60% against expert players and showing only 49% detectability in Turing-style blind tests. These results showcase MaKTO's superior decision-making, strategic adaptation, and natural language generation in complex social deduction games.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2501.14225

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback

Reweighting Improves Conditional Risk Bounds

Zhang, Yikai, Lin, Jiahe, Li, Fengpei, Zheng, Songzhu, Raj, Anant, Schneider, Anderson, Nevmyvaka, Yuriy

arXiv.org Machine LearningJan-4-2025

In this work, we study the weighted empirical risk minimization (weighted ERM) schema, in which an additional data-dependent weight function is incorporated when the empirical risk function is being minimized. We show that under a general ``balanceable" Bernstein condition, one can design a weighted ERM estimator to achieve superior performance in certain sub-regions over the one obtained from standard ERM, and the superiority manifests itself through a data-dependent constant term in the error bound. These sub-regions correspond to large-margin ones in classification settings and low-variance ones in heteroscedastic regression settings, respectively. Our findings are supported by evidence from synthetic data experiments.

artificial intelligence, machine learning, theorem 4, (18 more...)

arXiv.org Machine Learning

2501.02353

Country:

North America > United States (0.14)
Europe > Hungary (0.14)
Europe > Italy (0.14)

Genre: Research Report > New Finding (0.87)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
(2 more...)

Add feedback

Revealing the Barriers of Language Agents in Planning

Xie, Jian, Zhang, Kexun, Chen, Jiangjie, Yuan, Siyu, Zhang, Kai, Zhang, Yikai, Li, Lei, Xiao, Yanghua

arXiv.org Artificial IntelligenceOct-16-2024

Autonomous planning has been an ongoing pursuit since the inception of artificial intelligence. Based on curated problem solvers, early planning agents could deliver precise solutions for specific tasks but lacked generalization. The emergence of large language models (LLMs) and their powerful reasoning capabilities has reignited interest in autonomous planning by automatically generating reasonable solutions for given tasks. However, prior research and our experiments show that current language agents still lack human-level planning abilities. Even the state-of-the-art reasoning model, OpenAI o1, achieves only 15.6% on one of the complex real-world planning benchmarks. This highlights a critical question: What hinders language agents from achieving human-level planning? Although existing studies have highlighted weak performance in agent planning, the deeper underlying issues and the mechanisms and limitations of the strategies proposed to address them remain insufficiently understood. In this work, we apply the feature attribution study and identify two key factors that hinder agent planning: the limited role of constraints and the diminishing influence of questions. We also find that although current strategies help mitigate these challenges, they do not fully resolve them, indicating that agents still have a long way to go before reaching human-level intelligence.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2410.12409

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.68)

Industry: Consumer Products & Services (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Add feedback

OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI

Huang, Zhen, Wang, Zengzhi, Xia, Shijie, Li, Xuefeng, Zou, Haoyang, Xu, Ruijie, Fan, Run-Ze, Ye, Lyumanshan, Chern, Ethan, Ye, Yixin, Zhang, Yikai, Yang, Yuqing, Wu, Ting, Wang, Binjie, Sun, Shichao, Xiao, Yang, Li, Yiyuan, Zhou, Fan, Chern, Steffi, Qin, Yiwei, Ma, Yan, Su, Jiadi, Liu, Yixiu, Zheng, Yuxiang, Zhang, Shaoting, Lin, Dahua, Qiao, Yu, Liu, Pengfei

arXiv.org Artificial IntelligenceJun-18-2024

The evolution of Artificial Intelligence (AI) has been significantly accelerated by advancements in Large Language Models (LLMs) and Large Multimodal Models (LMMs), gradually showcasing potential cognitive reasoning abilities in problemsolving and scientific discovery (i.e., AI4Science) once exclusive to human intellect. To comprehensively evaluate current models' performance in cognitive reasoning abilities, we introduce OlympicArena, which includes 11,163 bilingual problems across both text-only and interleaved text-image modalities. These challenges encompass a wide range of disciplines spanning seven fields and 62 international Olympic competitions, rigorously examined for data leakage. We argue that the challenges in Olympic competition problems are ideal for evaluating AI's cognitive reasoning due to their complexity and interdisciplinary nature, which are essential for tackling complex scientific challenges and facilitating discoveries. Beyond evaluating performance across various disciplines using answer-only criteria, we conduct detailed experiments and analyses from multiple perspectives. We delve into the models' cognitive reasoning abilities, their performance across different modalities, and their outcomes in process-level evaluations, which are vital for tasks requiring complex reasoning with lengthy solutions. Our extensive evaluations reveal that even advanced models like GPT-4o only achieve a 39.97% overall accuracy (28.67% for mathematics and 29.71% for physics), illustrating current AI limitations in complex reasoning and multimodal integration. Through the OlympicArena, we aim to advance AI towards superintelligence, equipping it to address more complex challenges in science and beyond. We also provide a comprehensive set of resources to support AI research, including a benchmark dataset, an open-source annotation platform, a detailed evaluation tool, and a leaderboard with automatic submission features.

arxiv preprint arxiv, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2406.12753

Country: Asia > China (0.46)

Genre: Research Report (0.81)

Industry:

Health & Medicine (0.68)
Education > Curriculum > Subject-Specific Education (0.67)
Leisure & Entertainment > Sports > Olympic Games (0.54)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Add feedback

DetectBench: Can Large Language Model Detect and Piece Together Implicit Evidence?

Gu, Zhouhong, Zhang, Lin, Zhu, Xiaoxuan, Chen, Jiangjie, Huang, Wenhao, Zhang, Yikai, Wang, Shusen, Ye, Zheyu, Gao, Yan, Feng, Hongwei, Xiao, Yanghua

arXiv.org Artificial IntelligenceJun-18-2024

Detecting evidence within the context is a key step in the process of reasoning task. Evaluating and enhancing the capabilities of LLMs in evidence detection will strengthen context-based reasoning performance. This paper proposes a benchmark called DetectBench for verifying the ability to detect and piece together implicit evidence within a long context. DetectBench contains 3,928 multiple-choice questions, with an average of 994 tokens per question. Each question contains an average of 4.55 pieces of implicit evidence, and solving the problem typically requires 7.62 logical jumps to find the correct answer. To enhance the performance of LLMs in evidence detection, this paper proposes Detective Reasoning Prompt and Finetune. Experiments demonstrate that the existing LLMs' abilities to detect evidence in long contexts are far inferior to humans. However, the Detective Reasoning Prompt effectively enhances the capability of powerful LLMs in evidence detection, while the Finetuning method shows significant effects in enhancing the performance of weaker LLMs. Moreover, when the abilities of LLMs in evidence detection are improved, their final reasoning performance is also enhanced accordingly.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2406.12641

Country:

North America > United States (0.28)
Asia (0.28)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Sports > Basketball (0.67)
Education (0.48)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

Light Up the Shadows: Enhance Long-Tailed Entity Grounding with Concept-Guided Vision-Language Models

Zhang, Yikai, He, Qianyu, Wang, Xintao, Yuan, Siyu, Liang, Jiaqing, Xiao, Yanghua

arXiv.org Artificial IntelligenceJun-16-2024

Multi-Modal Knowledge Graphs (MMKGs) have proven valuable for various downstream tasks. However, scaling them up is challenging because building large-scale MMKGs often introduces mismatched images (i.e., noise). Most entities in KGs belong to the long tail, meaning there are few images of them available online. This scarcity makes it difficult to determine whether a found image matches the entity. To address this, we draw on the Triangle of Reference Theory and suggest enhancing vision-language models with concept guidance. Specifically, we introduce COG, a two-stage framework with COncept-Guided vision-language models. The framework comprises a Concept Integration module, which effectively identifies image-text pairs of long-tailed entities, and an Evidence Fusion module, which offers explainability and enables human verification. To demonstrate the effectiveness of COG, we create a dataset of 25k image-text pairs of long-tailed entities. Our comprehensive experiments show that COG not only improves the accuracy of recognizing long-tailed image-text pairs compared to baselines but also offers flexibility and explainability.

artificial intelligence, long-tailed entity, natural language, (18 more...)

arXiv.org Artificial Intelligence

2406.10902

Country:

Europe > Austria > Vienna (0.14)
Asia > Middle East > UAE (0.14)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

Add feedback

SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals

Yang, Ruihan, Chen, Jiangjie, Zhang, Yikai, Yuan, Siyu, Chen, Aili, Richardson, Kyle, Xiao, Yanghua, Yang, Deqing

arXiv.org Artificial IntelligenceJun-7-2024

Language agents powered by large language models (LLMs) are increasingly valuable as decision-making tools in domains such as gaming and programming. However, these agents often face challenges in achieving high-level goals without detailed instructions and in adapting to environments where feedback is delayed. In this paper, we present SelfGoal, a novel automatic approach designed to enhance agents' capabilities to achieve high-level goals with limited human prior and environmental feedback. The core concept of SelfGoal involves adaptively breaking down a high-level goal into a tree structure of more practical subgoals during the interaction with environments while identifying the most useful subgoals and progressively updating this structure. Experimental results demonstrate that SelfGoal significantly enhances the performance of language agents across various tasks, including competitive, cooperative, and deferred feedback environments. Project page: https://selfgoal-agent.github.io.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2406.04784

Country:

North America > United States > Maryland (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
Asia > Middle East > UAE (0.14)

Genre: Research Report > New Finding (0.34)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback