Liu, Zhengying
FormalAlign: Automated Alignment Evaluation for Autoformalization
Lu, Jianqiao, Wan, Yingjia, Huang, Yinya, Xiong, Jing, Liu, Zhengying, Guo, Zhijiang
Autoformalization aims to convert informal mathematical proofs into machine-verifiable formats, bridging the gap between natural and formal languages. However, ensuring semantic alignment between the informal and formalized statements remains challenging. Existing approaches rely heavily on manual verification, hindering scalability. To address this, we introduce \textsc{FormalAlign}, the first automated framework designed for evaluating the alignment between natural and formal languages in autoformalization. \textsc{FormalAlign} jointly trains on the autoformalization sequence generation task and the representational alignment between input and output, employing a dual loss that combines these two mutually enhancing objectives. Evaluated across four benchmarks augmented by our proposed misalignment strategies, \textsc{FormalAlign} demonstrates superior performance. In our experiments, \textsc{FormalAlign} outperforms GPT-4, achieving an Alignment-Selection Score 11.58\% higher on FormL4-Basic (99.21\% vs. 88.91\%) and 3.19\% higher on MiniF2F-Valid (66.39\% vs. 64.34\%). This effective alignment evaluation significantly reduces the need for manual verification. Both the dataset and code can be accessed via~\url{https://github.com/rookie-joe/FormalAlign}.
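As a rough schematic of the dual objective described above (an assumption about its general form, not the paper's exact formulation), the training loss can be written as $\mathcal{L} = \mathcal{L}_{\text{autoform}} + \lambda\,\mathcal{L}_{\text{align}}$, where $\mathcal{L}_{\text{autoform}} = -\sum_{t}\log p_\theta(y_t \mid y_{<t}, x)$ is the standard sequence-generation loss on the formal output $y$ given the informal input $x$, $\mathcal{L}_{\text{align}}$ is a contrastive-style term that pulls paired informal/formal representations together, and the weight $\lambda$ is a hypothetical hyperparameter introduced here only for illustration.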
ToolACE: Winning the Points of LLM Function Calling
Liu, Weiwen, Huang, Xu, Zeng, Xingshan, Hao, Xinlong, Yu, Shuai, Li, Dexun, Wang, Shuai, Gan, Weinan, Liu, Zhengying, Yu, Yuanqing, Wang, Zezhong, Wang, Yuxian, Ning, Wu, Hou, Yutai, Wang, Bin, Wu, Chuhan, Wang, Xinzhi, Liu, Yong, Wang, Yasheng, Tang, Duyu, Tu, Dandan, Shang, Lifeng, Jiang, Xin, Tang, Ruiming, Lian, Defu, Liu, Qun, Chen, Enhong
Function calling significantly extends the application boundary of large language models, where high-quality and diverse training data is critical for unlocking this capability. However, real function-calling data is quite challenging to collect and annotate, while synthetic data generated by existing pipelines tends to lack coverage and accuracy. In this paper, we present ToolACE, an automatic agentic pipeline designed to generate accurate, complex, and diverse tool-learning data. ToolACE leverages a novel self-evolution synthesis process to curate a comprehensive API pool of 26,507 diverse APIs. Dialogs are further generated through the interplay among multiple agents, guided by a formalized thinking process. To ensure data accuracy, we implement a dual-layer verification system combining rule-based and model-based checks. We demonstrate that models trained on our synthesized data, even with only 8B parameters, achieve state-of-the-art performance on the Berkeley Function-Calling Leaderboard, rivaling the latest GPT-4 models. Our model and a subset of the data are publicly available at https://huggingface.co/Team-ACE.
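To illustrate what a dual-layer verification of the kind described above might look like, here is a minimal Python sketch; the function names (rule_check, model_check, verify_call), the schema fields, and the judge interface are assumptions made for exposition, not the ToolACE implementation.

# Illustrative sketch of a dual-layer check for a synthesized function call.
# All names and schema fields are hypothetical, not the ToolACE API.
import json

def rule_check(call: dict, api_schema: dict) -> bool:
    """Layer 1: rule-based checks -- the call must name a known API and
    supply every required parameter with a JSON-serializable value."""
    if call.get("name") != api_schema.get("name"):
        return False
    required = {p["name"] for p in api_schema.get("parameters", []) if p.get("required")}
    provided = set(call.get("arguments", {}))
    try:
        json.dumps(call["arguments"])
    except (TypeError, KeyError):
        return False
    return required.issubset(provided)

def model_check(dialog: str, call: dict, judge) -> bool:
    """Layer 2: model-based check -- ask a judge model (any callable that
    returns text) whether the call is consistent with the dialog context."""
    prompt = (f"Dialog:\n{dialog}\n\nProposed call:\n{json.dumps(call)}\n\n"
              "Is this call correct for the dialog? Answer yes or no.")
    return judge(prompt).strip().lower().startswith("yes")

def verify_call(dialog: str, call: dict, api_schema: dict, judge) -> bool:
    # A data point is kept only if it passes both layers.
    return rule_check(call, api_schema) and model_check(dialog, call, judge)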
FVEL: Interactive Formal Verification Environment with Large Language Models via Theorem Proving
Lin, Xiaohan, Cao, Qingxing, Huang, Yinya, Wang, Haiming, Lu, Jianqiao, Liu, Zhengying, Song, Linqi, Liang, Xiaodan
Formal verification (FV) has gained growing significance with the emergence of program synthesis driven by evolving large language models (LLMs). However, current formal verification mainly resorts to symbolic verifiers or hand-crafted rules, which limits extensive and flexible verification. On the other hand, formal languages for automated theorem proving, such as Isabelle, as another line of rigorous verification, are maintained with comprehensive rules and theorems. In this paper, we propose FVEL, an interactive Formal Verification Environment with LLMs. Specifically, FVEL transforms the code to be verified into Isabelle, and then conducts verification via neural automated theorem proving with an LLM. This joint paradigm leverages the rigorous yet abundant rules formulated and organized in Isabelle, and also makes it convenient to introduce and adjust cutting-edge LLMs. To achieve this goal, we extract a large-scale dataset, FVELER. The FVELER dataset includes code dependencies and verification processes formulated in Isabelle, containing 758 theories, 29,125 lemmas, and 200,646 proof steps in total with in-depth dependencies. We benchmark FVELER in the FVEL environment by first fine-tuning LLMs with FVELER and then evaluating them on Code2Inv and SV-COMP. The results show that FVEL with FVELER-fine-tuned Llama3-8B solves 17.39% (69 -> 81) more problems, and Mistral-7B 12% (75 -> 84) more problems, in SV-COMP. The proportion of proof errors is also reduced. Project page: https://fveler.github.io/.
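A minimal sketch of the loop described above, written in Python for exposition; translate_to_isabelle, propose_step, and the isabelle proof-state interface are placeholders assumed here, not the project's actual APIs.

# Hypothetical sketch of an FVEL-style loop: translate a program into an
# Isabelle theory, then discharge its goals via neural theorem proving.
def verify(program_source: str, llm, isabelle) -> bool:
    theory = llm.translate_to_isabelle(program_source)    # code -> Isabelle lemmas
    proof_state = isabelle.load(theory)
    while not proof_state.finished:
        step = llm.propose_step(proof_state)              # neural proof-step suggestion
        proof_state = isabelle.apply(proof_state, step)   # checked by the Isabelle kernel
        if proof_state.failed:
            return False
    return True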
Process-Driven Autoformalization in Lean 4
Lu, Jianqiao, Liu, Zhengying, Wan, Yingjia, Huang, Yinya, Wang, Haiming, Yang, Zhicheng, Tang, Jing, Guo, Zhijiang
Autoformalization, the conversion of natural language mathematics into formal languages, offers significant potential for advancing mathematical reasoning. However, existing efforts are limited to formal languages with substantial online corpora and struggle to keep pace with rapidly evolving languages like Lean 4. To bridge this gap, we propose a new benchmark \textbf{Form}alization for \textbf{L}ean~\textbf{4} (\textbf{FormL4}) designed to evaluate the autoformalization capabilities of large language models (LLMs). This benchmark encompasses a comprehensive assessment of questions, answers, formal statements, and proofs. Additionally, we introduce a \textbf{P}rocess-\textbf{S}upervised \textbf{V}erifier (\textbf{PSV}) model that leverages the precise feedback from Lean 4 compilers to enhance autoformalization. Our experiments demonstrate that the PSV method improves autoformalization, enabling higher accuracy using less filtered training data. Furthermore, when fine-tuned with data containing detailed process information, PSV can leverage the data more effectively, leading to more significant improvements in autoformalization for Lean 4. Our dataset and code are available at \url{https://github.com/rookie-joe/PDA}.
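A toy example of the kind of informal-to-formal pair such a benchmark evaluates (constructed here for illustration, not drawn from the dataset): the informal statement "addition of natural numbers is commutative" and a Lean 4 formalization that the compiler accepts.

-- Informal statement: "For all natural numbers a and b, a + b = b + a."
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b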
Proving Theorems Recursively
Wang, Haiming, Xin, Huajian, Liu, Zhengying, Li, Wenda, Huang, Yinya, Lu, Jianqiao, Yang, Zhicheng, Tang, Jing, Yin, Jian, Li, Zhenguo, Liang, Xiaodan
Recent advances in automated theorem proving leverage language models to explore expanded search spaces via step-by-step proof generation. However, such approaches are usually based on short-sighted heuristics (e.g., log probability or value function scores) that potentially lead to suboptimal or even distracting subgoals, preventing the discovery of longer proofs. To address this challenge, we propose POETRY (PrOvE Theorems RecursivelY), which proves theorems in a recursive, level-by-level manner in the Isabelle theorem prover. Unlike previous step-by-step methods, POETRY searches for a verifiable sketch of the proof at each level and focuses on solving the current level's theorem or conjecture. Detailed proofs of intermediate conjectures within the sketch are temporarily replaced by a placeholder tactic called sorry, deferring their proofs to subsequent levels. This approach allows the theorem to be tackled incrementally by outlining the overall theorem at the first level and then solving the intermediate conjectures at deeper levels. Experiments are conducted on the miniF2F and PISA datasets, and significant performance gains are observed with our POETRY approach over state-of-the-art methods. POETRY on miniF2F achieves an average proving success rate improvement of 5.1%. Moreover, we observe a substantial increase in the maximum proof length found by POETRY, from 10 to 26.
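POETRY works in Isabelle, but the idea of a first-level proof sketch whose intermediate conjectures are deferred with a sorry placeholder can be illustrated with an analogous, purely hypothetical Lean 4 outline (Lean also has a sorry placeholder):

-- Level 1: a verifiable sketch of the overall theorem; the intermediate
-- conjectures are deferred with `sorry` and proved at deeper recursion levels.
theorem outline_example (a b : Nat) : a + b = b + a ∧ a * b = b * a := by
  have h1 : a + b = b + a := sorry  -- conjecture deferred to level 2
  have h2 : a * b = b * a := sorry  -- conjecture deferred to level 2
  exact ⟨h1, h2⟩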
MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data
Huang, Yinya, Lin, Xiaohan, Liu, Zhengying, Cao, Qingxing, Xin, Huajian, Wang, Haiming, Li, Zhenguo, Song, Linqi, Liang, Xiaodan
Recent large language models (LLMs) have achieved significant advances in various tasks, including mathematical reasoning and theorem proving. As these two tasks require strict and formal multi-step inference, they are appealing domains for exploring the reasoning ability of LLMs but still face important challenges. Previous studies such as Chain-of-Thought (CoT) have revealed the effectiveness of intermediate-step guidance. However, such step-wise annotation requires heavy labor, leading to insufficient training steps for current benchmarks. To fill this gap, this work introduces MUSTARD, a data generation framework that masters uniform synthesis of theorem and proof data of high quality and diversity. MUSTARD synthesizes data in three stages: (1) It samples a few mathematical concept seeds as the problem category. (2) It then prompts a generative language model with the sampled concepts to obtain both the problems and their step-wise formal solutions. (3) Lastly, the framework uses a proof assistant (e.g., Lean Prover) to filter the valid proofs. With the proposed MUSTARD, we present a theorem-and-proof benchmark, MUSTARDSAUCE, with 5,866 valid data points. Each data point contains an informal statement, an informal proof, and a translated formal proof that passes the prover validation. We perform extensive analysis and demonstrate that MUSTARD generates validated high-quality step-by-step data. We further apply MUSTARDSAUCE to fine-tune smaller language models. The fine-tuned Llama 2-7B achieves a 15.41% average relative performance gain in automated theorem proving, and 8.18% in math word problems. Codes and data are available at https://github.com/Eleanor-H/MUSTARD.
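A minimal Python sketch of the three-stage loop described above; the helper names (generate_problem_and_proof, lean_check), the seed count, and the data fields are assumptions for illustration, not the MUSTARD implementation.

# Sketch: sample concept seeds, generate problem + proofs, keep only
# prover-validated items.
import random

def mustard_style_generation(concept_pool, llm, lean_check, n_samples=1000):
    dataset = []
    for _ in range(n_samples):
        seeds = random.sample(concept_pool, k=2)          # (1) sample concept seeds
        item = llm.generate_problem_and_proof(seeds)      # (2) problem + informal/formal proofs
        if lean_check(item["formal_proof"]):              # (3) filter with the proof assistant
            dataset.append(item)
    return dataset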
ATG: Benchmarking Automated Theorem Generation for Generative Language Models
Lin, Xiaohan, Cao, Qingxing, Huang, Yinya, Yang, Zhicheng, Liu, Zhengying, Li, Zhenguo, Liang, Xiaodan
Humans can develop new theorems to explore broader and more complex mathematical results. While current generative language models (LMs) have achieved significant improvement in automatically proving theorems, their ability to generate new or reusable theorems is still under-explored. Without new theorems, current LMs struggle to prove harder theorems that are distant from the given hypotheses within an exponentially growing search space. Therefore, this paper proposes an Automated Theorem Generation (ATG) benchmark that evaluates whether an agent can automatically generate valuable (and possibly brand new) theorems that are applicable for downstream theorem proving as reusable knowledge. Specifically, we construct the ATG benchmark by splitting the Metamath library into three sets: axioms, library, and problems, based on their proving depth. We conduct extensive experiments to investigate whether current LMs can generate theorems in the library and benefit proving of the problem theorems. The results demonstrate that high-quality ATG data facilitates models' performance on downstream theorem proving.
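A minimal sketch of a depth-based split of this kind, written in Python for exposition; the proving_depth function and the thresholds are assumptions for illustration, not the benchmark's actual construction.

# Hypothetical partition of a theorem library by proving depth:
# shallowest items serve as axioms, mid-depth items as the reusable library,
# and the deepest items as held-out problems.
def split_by_depth(theorems, proving_depth, d_axiom=0, d_library=5):
    axioms, library, problems = [], [], []
    for t in theorems:
        d = proving_depth(t)
        if d <= d_axiom:
            axioms.append(t)
        elif d <= d_library:
            library.append(t)
        else:
            problems.append(t)
    return axioms, library, problems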
Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis
Chen, Kai, Wang, Chunwei, Yang, Kuo, Han, Jianhua, Hong, Lanqing, Mi, Fei, Xu, Hang, Liu, Zhengying, Huang, Wenyong, Li, Zhenguo, Yeung, Dit-Yan, Shang, Lifeng, Jiang, Xin, Liu, Qun
The rapid development of large language models (LLMs) has not only provided numerous opportunities but also presented significant challenges. This becomes particularly evident when LLMs generate harmful or toxic content, whether unintentionally or through deliberate inducement. In contrast, this study proposes a novel alignment technique based on mistake analysis, which deliberately exposes LLMs to erroneous content so that they learn the reasons for mistakes and how to avoid them. In this way, mistakes are repurposed into valuable data for alignment, effectively helping to prevent the production of erroneous responses. Without external models or human annotations, our method leverages a model's intrinsic ability to discern undesirable mistakes and improves the safety of its generated responses. Experimental results reveal that our method outperforms existing alignment approaches in enhancing model safety while maintaining overall utility.
A Survey of Reasoning with Foundation Models
Sun, Jiankai, Zheng, Chuanyang, Xie, Enze, Liu, Zhengying, Chu, Ruihang, Qiu, Jianing, Xu, Jiaqi, Ding, Mingyu, Li, Hongyang, Geng, Mengzhe, Wu, Yue, Wang, Wenhai, Chen, Junsong, Yin, Zhangyue, Ren, Xiaozhe, Fu, Jie, He, Junxian, Yuan, Wu, Liu, Qi, Liu, Xihui, Li, Yu, Dong, Hao, Cheng, Yu, Zhang, Ming, Heng, Pheng Ann, Dai, Jifeng, Luo, Ping, Wang, Jingdong, Wen, Ji-Rong, Qiu, Xipeng, Guo, Yike, Xiong, Hui, Liu, Qun, Li, Zhenguo
Reasoning, a crucial ability for complex problem-solving, plays a pivotal role in various real-world settings such as negotiation, medical diagnosis, and criminal investigation. It serves as a fundamental methodology in the field of Artificial General Intelligence (AGI). With the ongoing development of foundation models, e.g., Large Language Models (LLMs), there is growing interest in exploring their abilities in reasoning tasks. In this paper, we introduce seminal foundation models proposed or adaptable for reasoning, highlighting the latest advancements in various reasoning tasks, methods, and benchmarks. We then discuss potential future directions concerning the emergence of reasoning abilities within foundation models. We also discuss the relevance of multimodal learning, autonomous agents, and superalignment in the context of reasoning. By discussing these future research directions, we hope to inspire researchers in their exploration of this field, stimulate further advancements in reasoning with foundation models, and contribute to the development of AGI.
FIMO: A Challenge Formal Dataset for Automated Theorem Proving
Liu, Chengwu, Shen, Jianhao, Xin, Huajian, Liu, Zhengying, Yuan, Ye, Wang, Haiming, Ju, Wei, Zheng, Chuanyang, Yin, Yichun, Li, Lin, Zhang, Ming, Liu, Qun
We present FIMO, an innovative dataset comprising formal mathematical problem statements sourced from the International Mathematical Olympiad (IMO) Shortlisted Problems. Designed to facilitate advanced automated theorem proving at the IMO level, FIMO is currently tailored for the Lean formal language. It comprises 149 formal problem statements, accompanied by both informal problem descriptions and their corresponding LaTeX-based informal proofs. Through initial experiments involving GPT-4, our findings underscore the existing limitations in current methodologies, indicating a substantial journey ahead before achieving satisfactory IMO-level automated theorem proving outcomes.