Ye, Jiayi
Breaking Focus: Contextual Distraction Curse in Large Language Models
Huang, Yue, Wang, Yanbo, Xu, Zixiang, Gao, Chujie, Wu, Siyuan, Ye, Jiayi, Chen, Xiuying, Chen, Pin-Yu, Zhang, Xiangliang
Large Language Models (LLMs) (Zhou et al., 2023b) have demonstrated remarkable capabilities across various Natural Language Processing (NLP) tasks, revolutionizing a wide range of downstream applications such as medicine (Zhao et al., 2023), education (Kasneci et al., 2023), and science (Li et al., 2024b; Guo et al., 2023; Huang et al., 2024e). Despite their impressive performance, recent studies have exposed various vulnerabilities in LLMs, including susceptibility to jailbreaking attacks (Zou et al., 2023), hallucination issues (Xu et al., 2024b), and consistency problems (Liang et al., 2024; Huang et al., 2024a). These vulnerabilities highlight the limitations of LLMs in handling nuanced and adversarial scenarios, making it critical to uncover and analyze additional weaknesses to improve their reliability. In this work, we investigate a novel vulnerability termed Contextual Distraction Vulnerability (CDV), in which semantically coherent but non-essential contextual additions to a question degrade LLM performance. For instance, a customer service chatbot might miss a refund request hidden in a short story about discovering products through social media influencers. Similarly, a technical query about machine learning could be misunderstood if it is preceded by a student's emotional account of exam preparation anxiety. Unlike adversarial attacks that inject semantically meaningless noise into inputs (Zou et al., 2023; Shi et al., 2024) or distraction caused by long-context inputs (Bai et al., 2023), our study of CDV demonstrates that modifications which are semantically coherent and contextually distracting, yet require no long context, are sufficient to disrupt the decision-making process of even the most advanced LLMs. This vulnerability underscores a critical weakness in LLMs' ability to filter out irrelevant information and prioritize core knowledge, which is essential for robust reasoning. Recent studies have demonstrated the powerful generative capabilities of LLMs (Xu et al., 2024a; Wu et al., 2024). To systematically investigate this vulnerability, we propose a methodology for
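To make the notion of contextual distraction concrete, the following is a minimal, hypothetical sketch (not the paper's actual pipeline): it compares a model's answer on a clean question against the same question preceded by a semantically coherent but irrelevant preamble. The `distraction_gap` helper and the `ask` callable are illustrative placeholders for whatever LLM API is used.

```python
# Illustrative sketch only: probe contextual distraction by comparing answers
# on a clean question vs. the same question with an irrelevant preamble.
from typing import Callable

def distraction_gap(ask: Callable[[str], str],
                    question: str,
                    distractor: str,
                    gold_answer: str) -> dict:
    """Compare correctness on the clean question vs. the distracted variant."""
    clean = ask(question)
    distracted = ask(f"{distractor}\n\n{question}")  # coherent but irrelevant preamble
    return {
        "clean_correct": gold_answer.lower() in clean.lower(),
        "distracted_correct": gold_answer.lower() in distracted.lower(),
    }

if __name__ == "__main__":
    # Dummy model so the sketch runs standalone; it only answers correctly when
    # the refund request appears at the start of the prompt.
    dummy_model = lambda prompt: ("Issue a refund." if prompt.startswith("A customer")
                                  else "Thanks for sharing your story!")
    q = "A customer writes: 'I want a refund for my last order.' What should support do?"
    d = ("I discovered your shop through a travel vlogger whose videos about hidden "
         "cafes kept me browsing late into the night.")
    print(distraction_gap(dummy_model, q, d, "refund"))
```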
Large Action Models: From Inception to Implementation
Wang, Lu, Yang, Fangkai, Zhang, Chaoyun, Lu, Junting, Qian, Jiaxu, He, Shilin, Zhao, Pu, Qiao, Bo, Huang, Ray, Qin, Si, Su, Qisheng, Ye, Jiayi, Zhang, Yudi, Lou, Jian-Guang, Lin, Qingwei, Rajmohan, Saravan, Zhang, Dongmei, Zhang, Qi
As AI continues to advance, there is a growing demand for systems that go beyond language-based assistance and move toward intelligent agents capable of performing real-world actions. This evolution requires the transition from traditional Large Language Models (LLMs), which excel at generating textual responses, to Large Action Models (LAMs), designed for action generation and execution within dynamic environments. Enabled by agent systems, LAMs hold the potential to transform AI from passive language understanding to active task completion, marking a significant milestone in the progression toward artificial general intelligence. In this paper, we present a comprehensive framework for developing LAMs, offering a systematic approach to their creation, from inception to deployment. We begin with an overview of LAMs, highlighting their unique characteristics and delineating their differences from LLMs. Using a Windows OS-based agent as a case study, we provide a detailed, step-by-step guide on the key stages of LAM development, including data collection, model training, environment integration, grounding, and evaluation. This generalizable workflow can serve as a blueprint for creating functional LAMs in various application domains. We conclude by identifying the current limitations of LAMs and discussing directions for future research and industrial deployment, emphasizing the challenges and opportunities that lie ahead in realizing the full potential of LAMs in real-world applications. The code for the data collection process utilized in this paper is publicly available at: https://github.com/microsoft/UFO/tree/main/dataflow, and comprehensive documentation can be found at https://microsoft.github.io/UFO/dataflow/overview/.
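As an illustration of the action-generation loop such an agent runs, here is a minimal, hypothetical Python sketch; the paper's actual implementation is the Windows-OS agent built on microsoft/UFO, and the `Environment`, `Action`, and `propose_action` names below are placeholders rather than that codebase's API.

```python
# Hypothetical observe-plan-act loop for a LAM-driven agent (illustration only).
from dataclasses import dataclass

@dataclass
class Action:
    name: str        # e.g., "click", "type"
    target: str      # UI element or object the action applies to
    argument: str = ""

class Environment:
    """Stand-in for a dynamic environment (e.g., an OS desktop)."""
    def observe(self) -> str:
        return "window: Settings; controls: [Search box, Display, Sound]"
    def execute(self, action: Action) -> str:
        return f"executed {action.name} on {action.target}"
    def task_done(self) -> bool:
        return True  # trivial stop condition for the sketch

def propose_action(task: str, observation: str) -> Action:
    """Placeholder for the LAM: map (task, observation) to a grounded action."""
    return Action(name="click", target="Display")

def run_agent(task: str, env: Environment, max_steps: int = 5) -> None:
    for _ in range(max_steps):
        obs = env.observe()                 # perceive the environment state
        action = propose_action(task, obs)  # LAM generates the next action
        print(env.execute(action))          # grounding: act on the environment
        if env.task_done():
            break

run_agent("Open display settings", Environment())
```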
AutoBench-V: Can Large Vision-Language Models Benchmark Themselves?
Bao, Han, Huang, Yue, Wang, Yanbo, Ye, Jiayi, Wang, Xiangqi, Chen, Xiuying, Elhoseiny, Mohamed, Zhang, Xiangliang
Large Vision-Language Models (LVLMs) have become essential for advancing the integration of visual and linguistic information, facilitating a wide range of complex applications and tasks. However, evaluating LVLMs presents significant challenges: evaluation benchmarks demand substantial human effort to construct and remain static once built, lacking flexibility. Although automatic evaluation has been explored for the textual modality, the visual modality remains under-explored. In this work, we therefore address the question: "Can LVLMs serve as a path to automatic benchmarking?". We introduce AutoBench-V, an automated framework for on-demand evaluation, i.e., benchmarking LVLMs on specific aspects of model capability. Upon receiving a desired evaluation capability, AutoBench-V leverages text-to-image models to generate relevant image samples and then utilizes LVLMs to orchestrate visual question-answering (VQA) tasks, completing the evaluation process efficiently and flexibly. Through an extensive evaluation of seven popular LVLMs across five user-specified evaluation capabilities, the framework demonstrates effectiveness and reliability. We observe the following: (1) our constructed benchmark accurately reflects varying task difficulties; (2) as task difficulty rises, the performance gap between models widens; (3) while models exhibit strong performance in abstract-level understanding, they underperform on detail-reasoning tasks; and (4) constructing a dataset with varying levels of difficulty is critical for comprehensive evaluation. Overall, AutoBench-V not only successfully utilizes LVLMs for automated benchmarking but also reveals that LVLMs as judges have significant potential in various domains.
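A hedged sketch of the kind of pipeline described above: given a user-specified evaluation capability, generate images with a text-to-image model, derive VQA questions, query the candidate LVLM, and score its answers with a judge. Every helper below (`generate_image`, `make_vqa_question`, `answer_vqa`, `judge`) is a hypothetical placeholder, not AutoBench-V's actual API.

```python
# Illustrative outline of an on-demand, image-generation-based VQA evaluation loop.
def generate_image(description: str) -> str:
    """Placeholder text-to-image call; returns a path to the generated image."""
    return f"/tmp/{abs(hash(description))}.png"

def make_vqa_question(capability: str, image_path: str) -> tuple[str, str]:
    """Placeholder: derive a (question, reference_answer) pair for the image."""
    return (f"[{capability}] How many objects are in the image?", "three")

def answer_vqa(image_path: str, question: str) -> str:
    """Placeholder call to the candidate LVLM under evaluation."""
    return "three"

def judge(question: str, reference: str, candidate: str) -> bool:
    """Placeholder LVLM-as-judge check; here reduced to a string match."""
    return reference.strip().lower() == candidate.strip().lower()

def evaluate(capability: str, image_descriptions: list[str]) -> float:
    """Accuracy of the candidate LVLM on freshly generated VQA items."""
    scores = []
    for desc in image_descriptions:
        img = generate_image(desc)
        question, reference = make_vqa_question(capability, img)
        scores.append(judge(question, reference, answer_vqa(img, question)))
    return sum(scores) / len(scores)

print(evaluate("spatial understanding", ["three apples on a wooden table"]))
```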
Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge
Ye, Jiayi, Wang, Yanbo, Huang, Yue, Chen, Dongping, Zhang, Qihui, Moniz, Nuno, Gao, Tian, Geyer, Werner, Huang, Chao, Chen, Pin-Yu, Chawla, Nitesh V, Zhang, Xiangliang
LLM-as-a-Judge has been widely utilized as an evaluation method in various benchmarks and as a source of supervised rewards in model training. However, despite its excellence in many domains, potential issues remain under-explored, undermining its reliability and the scope of its utility. Therefore, we identify 12 key potential biases and propose a new automated bias quantification framework, CALM, which systematically quantifies and analyzes each type of bias in LLM-as-a-Judge through automated and principle-guided modification. Our experiments cover multiple popular language models, and the results indicate that while advanced models achieve commendable overall performance, significant biases persist in certain specific tasks. Empirical results suggest that there remains room for improvement in the reliability of LLM-as-a-Judge. Moreover, we discuss the explicit and implicit influence of these biases and offer suggestions for the reliable application of LLM-as-a-Judge. Our work highlights the need for stakeholders to address these issues and reminds users to exercise caution in LLM-as-a-Judge applications.
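To illustrate the principle-guided modification idea on one plausible bias (position bias), the sketch below presents the same answer pair to a judge in both orders and measures how often its verdict flips; `judge` is a hypothetical callable wrapping an LLM-as-a-Judge prompt, and this is not the CALM codebase.

```python
# Illustrative quantification of position bias via order-swapped comparisons.
from typing import Callable, List, Tuple

def flip_rate(judge: Callable[[str, str, str], str],
              cases: List[Tuple[str, str, str]]) -> float:
    """Fraction of cases where swapping answer order changes the verdict."""
    flips = 0
    for question, answer_a, answer_b in cases:
        original = judge(question, answer_a, answer_b)   # verdict: "first" or "second"
        swapped = judge(question, answer_b, answer_a)
        # A position-consistent judge should prefer the same underlying answer both times.
        consistent = (original == "first" and swapped == "second") or \
                     (original == "second" and swapped == "first")
        flips += 0 if consistent else 1
    return flips / len(cases)

# Dummy judge that always prefers whichever answer is shown first (maximal bias):
biased_judge = lambda q, a, b: "first"
print(flip_rate(biased_judge, [("Which answer is better?", "A", "B")]))  # -> 1.0
```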
Archiving Body Movements: Collective Generation of Chinese Calligraphy
Zhou, Aven Le, Ye, Jiayi, Liu, Tianchen, Zhang, Kang
As a communication channel, body movements have been widely explored in behavioral studies and kinesics. The performing and visual arts share this interest but focus on documenting and representing human body movements, for example in dance notation and visual artwork creation. This paper investigates body movements in oriental calligraphy and how calligraphy principles can be applied to stimulate and archive body movements. Through an artwork (Wushu), the authors experiment with an interactive and generative approach that engages the audience's bodily participation and archives the body movements as a compendium of generated calligraphy. The audience assumes the role of both writer and reader; creating ("writing") and appreciating ("reading") the generated calligraphy becomes a cyclical process within this infinite "Book," which can motivate further attention to and discussion of Chinese characters and calligraphy.