Bai, Yuzhuo
GLTW: Joint Improved Graph Transformer and LLM via Three-Word Language for Knowledge Graph Completion
Luo, Kangyang, Bai, Yuzhuo, Gao, Cheng, Si, Shuzheng, Shen, Yingli, Liu, Zhu, Wang, Zhitong, Kong, Cunliang, Li, Wenhao, Huang, Yufei, Tian, Ye, Xiong, Xuantang, Han, Lei, Sun, Maosong
Knowledge Graph Completion (KGC), which aims to infer missing or incomplete facts, is a crucial task for KGs. However, integrating the vital structural information of KGs into Large Language Models (LLMs) and outputting predictions deterministically remains challenging. To address this, we propose a new method called GLTW, which encodes the structural information of KGs and merges it with LLMs to enhance KGC performance. Specifically, we introduce an improved Graph Transformer (iGT) that effectively encodes subgraphs with both local and global structural information and inherits the characteristics of language models, bypassing training from scratch. Also, we develop a subgraph-based multi-classification training objective, using all entities within the KG as classification objects, to boost learning efficiency. Importantly, we combine iGT with an LLM that takes KG language prompts as input. Our extensive experiments on various KG datasets show that GLTW achieves significant performance gains compared to SOTA baselines.
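The subgraph-based multi-classification objective can be illustrated with a minimal PyTorch sketch: every entity in the KG is treated as a class, and the model predicts the missing entity for a (head, relation, ?) query via standard cross-entropy. The iGT/LLM fusion is abstracted into a placeholder query representation; class names, dimensions, and the `EntityClassifier` module are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumed names/dims): multi-class objective over all KG entities.
import torch
import torch.nn as nn

class EntityClassifier(nn.Module):
    def __init__(self, num_entities: int, hidden_dim: int = 768):
        super().__init__()
        # One logit per entity in the KG (the "classification objects").
        self.classifier = nn.Linear(hidden_dim, num_entities)

    def forward(self, query_repr: torch.Tensor) -> torch.Tensor:
        # query_repr: (batch, hidden_dim) representation of a (head, relation, ?) query
        return self.classifier(query_repr)  # (batch, num_entities)

def training_step(model: EntityClassifier,
                  query_repr: torch.Tensor,
                  gold_entity_ids: torch.Tensor) -> torch.Tensor:
    logits = model(query_repr)
    # Standard multi-class cross-entropy over all entities.
    return nn.functional.cross_entropy(logits, gold_entity_ids)

# Example usage with random stand-ins for the fused iGT+LLM representation.
num_entities, hidden_dim = 10_000, 768
model = EntityClassifier(num_entities, hidden_dim)
query_repr = torch.randn(4, hidden_dim)
gold = torch.randint(0, num_entities, (4,))
print(training_step(model, query_repr, gold).item())
```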
Aligning Large Language Models to Follow Instructions and Hallucinate Less via Effective Data Filtering
Si, Shuzheng, Zhao, Haozhe, Chen, Gang, Gao, Cheng, Bai, Yuzhuo, Wang, Zhitong, An, Kaikai, Luo, Kangyang, Qian, Chen, Qi, Fanchao, Chang, Baobao, Sun, Maosong
Training LLMs on data containing unfamiliar knowledge during the instruction tuning stage can encourage hallucinations. To address this challenge, we introduce NOVA, a novel framework designed to identify high-quality data that aligns well with the LLM's learned knowledge to reduce hallucinations. NOVA includes Internal Consistency Probing (ICP) and Semantic Equivalence Identification (SEI) to measure how familiar the LLM is with instruction data. Specifically, ICP evaluates the LLM's understanding of the given instruction by calculating the tailored consistency among multiple self-generated responses. SEI further assesses the familiarity of the LLM with the target response by comparing it to the generated responses, using the proposed semantic clustering and well-designed voting strategy. Finally, to ensure the quality of selected samples, we introduce an expert-aligned reward model, considering characteristics beyond just familiarity. By considering data quality and avoiding unfamiliar data, we can utilize the selected data to effectively align LLMs to follow instructions and hallucinate less.
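A simplified, hypothetical sketch of the familiarity signals described above: it approximates ICP as the mean pairwise similarity among self-generated responses, and SEI as a vote over how many generated responses are semantically close to the target response. The embedding model, threshold, and function names are assumptions for illustration, not NOVA's exact formulation.

```python
# Assumed proxy metrics for ICP/SEI-style familiarity scoring.
from itertools import combinations
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def icp_score(generated_responses: list[str]) -> float:
    """Mean pairwise similarity among the LLM's self-generated responses."""
    embs = embedder.encode(generated_responses)
    pairs = list(combinations(range(len(embs)), 2))
    return float(np.mean([cosine(embs[i], embs[j]) for i, j in pairs]))

def sei_score(target_response: str, generated_responses: list[str],
              threshold: float = 0.8) -> float:
    """Fraction of generated responses that 'vote' for the target response."""
    target_emb = embedder.encode([target_response])[0]
    gen_embs = embedder.encode(generated_responses)
    votes = sum(cosine(target_emb, e) >= threshold for e in gen_embs)
    return votes / len(generated_responses)
```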
Value Compass Leaderboard: A Platform for Fundamental and Validated Evaluation of LLMs Values
Yao, Jing, Yi, Xiaoyuan, Duan, Shitong, Wang, Jindong, Bai, Yuzhuo, Huang, Muhua, Zhang, Peng, Lu, Tun, Dou, Zhicheng, Sun, Maosong, Xie, Xing
As Large Language Models (LLMs) achieve remarkable breakthroughs, aligning their values with humans has become imperative for their responsible development and customized applications. However, existing evaluations of LLMs' values fall short of three desirable goals. (1) Value Clarification: We expect to clarify the underlying values of LLMs precisely and comprehensively, while current evaluations focus narrowly on safety risks such as bias and toxicity. (2) Evaluation Validity: Existing static, open-source benchmarks are prone to data contamination and quickly become obsolete as LLMs evolve. Additionally, these discriminative evaluations uncover LLMs' knowledge about values, rather than providing valid assessments of LLMs' behavioral conformity to values. (3) Value Pluralism: The pluralistic nature of human values across individuals and cultures is largely ignored in measuring LLMs' value alignment. To address these challenges, we present the Value Compass Leaderboard, with three correspondingly designed modules. It (i) grounds the evaluation on motivationally distinct basic values to clarify LLMs' underlying values from a holistic view; (ii) applies a generative evolving evaluation framework with adaptive test items for evolving LLMs and direct value recognition from behaviors in realistic scenarios; (iii) proposes a metric that quantifies LLMs' alignment with a specific value as a weighted sum over multiple dimensions, with weights determined by pluralistic values.
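The weighted-sum metric in (iii) can be sketched in a few lines: an LLM's alignment with a target value is scored as a weighted sum over per-dimension scores, with weights supplied by a pluralistic value profile. The dimension names, scores, and weights below are illustrative assumptions only.

```python
# Minimal sketch of a weighted-sum value-alignment metric (assumed names/values).

def value_alignment_score(dimension_scores: dict[str, float],
                          value_weights: dict[str, float]) -> float:
    """Weighted sum of per-dimension scores, normalized by total weight."""
    total_weight = sum(value_weights.values())
    weighted = sum(value_weights[d] * dimension_scores.get(d, 0.0)
                   for d in value_weights)
    return weighted / total_weight

# Example: one model's scores across three hypothetical basic values,
# weighted by a user- or culture-specific value profile.
scores = {"benevolence": 0.82, "security": 0.74, "self_direction": 0.61}
weights = {"benevolence": 0.5, "security": 0.3, "self_direction": 0.2}
print(round(value_alignment_score(scores, weights), 3))  # 0.754
```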
OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems
He, Chaoqun, Luo, Renjie, Bai, Yuzhuo, Hu, Shengding, Thai, Zhen Leng, Shen, Junhao, Hu, Jinyi, Han, Xu, Huang, Yujie, Zhang, Yuxiang, Liu, Jie, Qi, Lei, Liu, Zhiyuan, Sun, Maosong
Recent advancements have seen Large Language Models (LLMs) and Large Multimodal Models (LMMs) surpassing general human capabilities in various tasks, approaching the proficiency level of human experts across multiple domains. With traditional benchmarks becoming less challenging for these models, new rigorous challenges are essential to gauge their advanced abilities. In this work, we present OlympiadBench, an Olympiad-level bilingual multimodal scientific benchmark, featuring 8,476 problems from Olympiad-level mathematics and physics competitions, including the Chinese college entrance exam. Each problem is detailed with expert-level annotations for step-by-step reasoning. Evaluating top-tier models on OlympiadBench, we implement a comprehensive assessment methodology to accurately evaluate model responses. Notably, the best-performing model, GPT-4V, attains an average score of 17.97% on OlympiadBench, with a mere 10.74% in physics, highlighting the benchmark's rigor and the intricacy of physical reasoning. Our analysis of GPT-4V points out prevalent issues with hallucinations, knowledge omissions, and logical fallacies. We hope that our challenging benchmark can serve as a valuable resource in helping future AGI research endeavors. The data and evaluation code are available at \url{https://github.com/OpenBMB/OlympiadBench}.
C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models
Huang, Yuzhen, Bai, Yuzhuo, Zhu, Zhihao, Zhang, Junlei, Zhang, Jinghan, Su, Tangjun, Liu, Junteng, Lv, Chuancheng, Zhang, Yikai, Lei, Jiayi, Fu, Yao, Sun, Maosong, He, Junxian
New NLP benchmarks are urgently needed to align with the rapid development of large language models (LLMs). We present C-Eval, the first comprehensive Chinese evaluation suite designed to assess advanced knowledge and reasoning abilities of foundation models in a Chinese context. C-Eval comprises multiple-choice questions across four difficulty levels: middle school, high school, college, and professional. The questions span 52 diverse disciplines, ranging from humanities to science and engineering. C-Eval is accompanied by C-Eval Hard, a subset of very challenging subjects in C-Eval that requires advanced reasoning abilities to solve. We conduct a comprehensive evaluation of the most advanced LLMs on C-Eval, including both English- and Chinese-oriented models. Results indicate that only GPT-4 could achieve an average accuracy of over 60%, suggesting that there is still significant room for improvement for current LLMs. We anticipate C-Eval will help analyze important strengths and shortcomings of foundation models, and foster their development and growth for Chinese users.