Liu, Yongfei
FullStack Bench: Evaluating LLMs as Full Stack Coders
Bytedance-Seed-Foundation-Code-Team: Cheng, Yao, Chen, Jianfeng, Chen, Jie, Chen, Li, Chen, Liyu, Chen, Wentao, Chen, Zhengyu, Geng, Shijie, Li, Aoyan, Li, Bo, Li, Bowen, Li, Linyi, Liu, Boyi, Liu, Jerry, Liu, Kaibo, Liu, Qi, Liu, Shukai, Liu, Siyao, Liu, Tianyi, Liu, Tingkai, Liu, Yongfei, Long, Rui, Mai, Jing, Ning, Guanghan, Peng, Z. Y., Shen, Kai, Su, Jiahao, Su, Jing, Sun, Tao, Sun, Yifan, Tao, Yunzhe, Wang, Guoyin, Wang, Siwei, Wang, Xuwu, Wang, Yite, Wang, Zihan, Xia, Jinxiang, Xiang, Liang, Xiao, Xia, Xiao, Yongsheng, Xi, Chenguang, Xin, Shulin, Xu, Jingjing, Xu, Shikun, Yang, Hongxia, Yang, Jack, Yang, Yingxiang, Yuan, Jianbo, Zhang, Jun, Zhang, Yufeng, Zhang, Yuyu, Zheng, Shen, Zhu, He, Zhu, Ming
As the capabilities of code large language models (LLMs) continue to expand, their applications across diverse code intelligence domains are growing rapidly. However, most existing datasets evaluate only a limited set of application domains. To address this gap, we develop FullStack Bench, a comprehensive code evaluation dataset focused on full-stack programming that spans a wide range of application domains (e.g., basic programming, data analysis, software engineering, mathematics, and machine learning). To further assess multilingual programming capabilities, FullStack Bench provides real-world instructions and corresponding unit test cases in 16 widely used programming languages, designed to reflect real-world usage scenarios rather than simple translations. We also release SandboxFusion, an efficient code sandbox execution tool that supports a variety of programming languages and packages and enables efficient evaluation on FullStack Bench. Comprehensive experimental results demonstrate the necessity and effectiveness of FullStack Bench and SandboxFusion.
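A minimal sketch of the evaluation loop such a benchmark implies: execute each generated solution against its unit tests in an isolated subprocess and aggregate pass@1. The `run_in_sandbox` helper and the single-sample pass@1 aggregation are illustrative assumptions, not the actual SandboxFusion API.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_sandbox(solution_code: str, test_code: str, timeout: float = 10.0) -> bool:
    """Hypothetical helper: run a candidate solution plus its unit tests in a
    separate Python subprocess and report whether all tests passed. A real
    sandbox (e.g., SandboxFusion) adds isolation, resource limits, and
    multi-language support; this sketch only captures the control flow."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(solution_code + "\n\n" + test_code)
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False
        return proc.returncode == 0


def pass_at_1(samples: list[tuple[str, str]]) -> float:
    """pass@1 with one sample per problem: the fraction of problems whose
    single generated solution passes all of its unit tests."""
    results = [run_in_sandbox(code, tests) for code, tests in samples]
    return sum(results) / max(len(results), 1)


if __name__ == "__main__":
    demo = [("def add(a, b):\n    return a + b", "assert add(2, 3) == 5")]
    print(f"pass@1 = {pass_at_1(demo):.2f}")
```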
DSTC: Direct Preference Learning with Only Self-Generated Tests and Code to Improve Code LMs
Liu, Zhihan, Zhang, Shenao, Liu, Yongfei, Liu, Boyi, Yang, Yingxiang, Wang, Zhaoran
Direct preference learning offers a promising and computation-efficient alternative to supervised fine-tuning (SFT) for improving code generation in coding large language models (LMs). However, the scarcity of reliable preference data remains a bottleneck that limits how much direct preference learning can improve the coding accuracy of code LMs. In this paper, we introduce \underline{\textbf{D}}irect Preference Learning with Only \underline{\textbf{S}}elf-Generated \underline{\textbf{T}}ests and \underline{\textbf{C}}ode (DSTC), a framework that leverages only self-generated code snippets and tests to construct reliable preference pairs, so that direct preference learning can improve LM coding accuracy without external annotations. DSTC combines a minimax selection process with test-code concatenation to improve preference pair quality, reducing the influence of incorrect self-generated tests and enhancing model performance without the need for costly reward models. When applied with direct preference learning methods such as Direct Preference Optimization (DPO) and Kahneman-Tversky Optimization (KTO), DSTC yields stable improvements in coding accuracy (pass@1 score) across diverse coding benchmarks, including HumanEval, MBPP, and BigCodeBench, demonstrating both its effectiveness and scalability for models of various sizes. This approach autonomously enhances code generation accuracy across LLMs of varying sizes, reducing reliance on expensive annotated coding datasets.
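A rough sketch of the kind of selection over self-generated artifacts the abstract describes: each candidate snippet is scored by how many self-generated tests it passes, the best-scoring snippet becomes the chosen sample and the worst the rejected one, and each is concatenated with the tests. The `passes` execution oracle and this simplified score-based selection are illustrative stand-ins; the actual minimax construction follows the DSTC paper.

```python
def passes(code: str, test: str) -> bool:
    """Hypothetical execution oracle: run a self-generated test against a
    self-generated snippet and report success. Stubbed with exec() purely for
    illustration; a real pipeline would run this inside a sandbox."""
    env: dict = {}
    try:
        exec(code, env)   # define the candidate function(s)
        exec(test, env)   # run the self-generated assertion(s)
        return True
    except Exception:
        return False


def build_preference_pair(codes: list[str], tests: list[str]) -> tuple[str, str]:
    """Score each snippet by the number of self-generated tests it passes,
    take the best as 'chosen' and the worst as 'rejected', and concatenate
    each with the test block to form a preference pair."""
    scores = [sum(passes(c, t) for t in tests) for c in codes]
    chosen = codes[max(range(len(codes)), key=scores.__getitem__)]
    rejected = codes[min(range(len(codes)), key=scores.__getitem__)]
    test_block = "\n".join(tests)
    return chosen + "\n" + test_block, rejected + "\n" + test_block


if __name__ == "__main__":
    codes = ["def inc(x):\n    return x + 1", "def inc(x):\n    return x - 1"]
    tests = ["assert inc(1) == 2", "assert inc(0) == 1"]
    chosen, rejected = build_preference_pair(codes, tests)
    print("chosen sample:\n" + chosen)
```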
Just Say What You Want: Only-prompting Self-rewarding Online Preference Optimization
Xu, Ruijie, Liu, Zhihan, Liu, Yongfei, Yan, Shipeng, Wang, Zhaoran, Zhang, Zhi, He, Xuming
We address the challenge of online Reinforcement Learning from Human Feedback (RLHF), focusing on self-rewarding alignment methods. In online RLHF, obtaining feedback requires interaction with the environment, which can be costly when additional reward models or the GPT-4 API are used. Current self-rewarding approaches rely heavily on the discriminator's judgment capabilities, which are effective for large-scale models but difficult to transfer to smaller ones. To address these limitations, we propose a novel, only-prompting self-rewarding online algorithm that generates preference datasets without relying on judgment capabilities. Additionally, we employ fine-grained arithmetic control over the optimality gap between positive and negative examples, generating more hard negatives in the later stages of training to help the model better capture subtle human preferences. Finally, we conduct extensive experiments on two base models, Mistral-7B and Mistral-Instruct-7B, which significantly bootstrap the performance of the reference model, achieving 34.5% in the Length-controlled Win Rates of AlpacaEval 2.0.
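A hedged sketch of how an only-prompting self-rewarding loop might construct preference pairs without a separate judge, assuming a generic `generate` callable standing in for the policy model. The prompt wording and the gap schedule below are illustrative assumptions, not the paper's exact templates or control formula.

```python
from typing import Callable

PolicyLM = Callable[[str], str]  # prompt -> completion; stands in for the policy model


def make_preference_pair(generate: PolicyLM, instruction: str, gap: float) -> dict:
    """Build a (chosen, rejected) pair purely by prompting the model itself,
    with no reward model or external judge. `gap` in [0, 1] controls how far
    the negative should fall below the positive: a large gap asks for a clearly
    worse answer, a small gap asks for a subtly flawed (harder) negative.
    The prompt phrasings here are illustrative."""
    chosen = generate(f"Respond as helpfully and accurately as possible.\n\n{instruction}")
    if gap > 0.5:
        neg_hint = "Give a brief, low-effort answer that misses key details."
    else:
        neg_hint = "Give an answer that looks plausible but contains a subtle flaw."
    rejected = generate(f"{neg_hint}\n\n{instruction}")
    return {"prompt": instruction, "chosen": chosen, "rejected": rejected}


def gap_schedule(step: int, total_steps: int) -> float:
    """Anneal the optimality gap from large to small so that harder negatives
    appear in the later stages of training."""
    return max(0.1, 1.0 - step / max(total_steps, 1))


if __name__ == "__main__":
    echo: PolicyLM = lambda p: f"<completion for: {p.splitlines()[-1]}>"  # placeholder model
    pair = make_preference_pair(echo, "Explain what DPO optimizes.", gap_schedule(90, 100))
    print(pair["rejected"])
```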
Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning
Wang, Yiqi, Chen, Wentao, Han, Xiaotian, Lin, Xudong, Zhao, Haiteng, Liu, Yongfei, Zhai, Bohan, Yuan, Jianbo, You, Quanzeng, Yang, Hongxia
Strong Artificial Intelligence (Strong AI) or Artificial General Intelligence (AGI) with abstract reasoning ability is the goal of next-generation AI. Recent advancements in Large Language Models (LLMs), along with the emerging field of Multimodal Large Language Models (MLLMs), have demonstrated impressive capabilities across a wide range of multimodal tasks and applications. In particular, various MLLMs, each with distinct model architectures, training data, and training stages, have been evaluated across a broad range of MLLM benchmarks. These studies have, to varying degrees, revealed different aspects of the current capabilities of MLLMs. However, the reasoning abilities of MLLMs have not been systematically investigated. In this survey, we comprehensively review the existing evaluation protocols for multimodal reasoning, categorize and illustrate the frontiers of MLLMs, introduce recent trends in the application of MLLMs to reasoning-intensive tasks, and finally discuss current practices and future directions. We believe our survey establishes a solid base and sheds light on this important topic of multimodal reasoning.
Reason out Your Layout: Evoking the Layout Master from Large Language Models for Text-to-Image Synthesis
Chen, Xiaohui, Liu, Yongfei, Yang, Yingxiang, Yuan, Jianbo, You, Quanzeng, Liu, Li-Ping, Yang, Hongxia
Recent advancements in text-to-image (T2I) generative models have shown remarkable capabilities in producing diverse and imaginative visuals from text prompts. Despite these advances, diffusion models sometimes struggle to fully translate the semantic content of the text into images. While conditioning on layout has been shown to improve the compositional ability of T2I diffusion models, it typically requires manual layout input. In this work, we introduce a novel approach that improves T2I diffusion models by using Large Language Models (LLMs) as layout generators. Our method leverages Chain-of-Thought prompting of LLMs to interpret the text and generate spatially reasonable object layouts. The generated layout is then used to improve the composition and spatial accuracy of the generated images. Moreover, we propose an efficient adapter based on a cross-attention mechanism that explicitly integrates the layout information into stable diffusion models. Our experiments demonstrate significant improvements in image quality and layout accuracy, showcasing the potential of LLMs for augmenting generative image models.
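A minimal sketch of the layout-as-intermediate idea the abstract describes: prompt an LLM to reason about a caption and emit object bounding boxes, then hand the parsed layout to a layout-conditioned generator. The `call_llm` callable and the prompt template are hypothetical; the paper's cross-attention adapter operates inside the diffusion model and is not reproduced here.

```python
import json
from typing import Callable

LLM = Callable[[str], str]  # prompt -> text; stands in for any chat-completion API

LAYOUT_PROMPT = """You are a layout planner. For the caption below, reason step by step
about which objects appear and where they should go, then output ONLY a JSON list of
{{"object": name, "bbox": [x0, y0, x1, y1]}} with coordinates normalized to [0, 1].

Caption: {caption}
"""


def plan_layout(call_llm: LLM, caption: str) -> list[dict]:
    """Ask the LLM for a chain-of-thought layout plan and parse the JSON tail.
    Assumes the model ends its reply with the JSON list; a real system would add
    retries and validation (boxes inside the canvas, non-empty list, etc.)."""
    reply = call_llm(LAYOUT_PROMPT.format(caption=caption))
    boxes = json.loads(reply[reply.find("["):])
    return [b for b in boxes if len(b.get("bbox", [])) == 4]


if __name__ == "__main__":
    # Placeholder model that returns a canned chain-of-thought plus a JSON layout.
    fake_llm: LLM = lambda p: (
        'The cat sits left of the dog. '
        '[{"object": "cat", "bbox": [0.05, 0.4, 0.45, 0.9]}, '
        '{"object": "dog", "bbox": [0.55, 0.35, 0.95, 0.9]}]'
    )
    layout = plan_layout(fake_llm, "a cat and a dog sitting on a lawn")
    # A layout-conditioned generator (e.g., a GLIGEN-style pipeline or the paper's
    # adapter) would consume `layout` together with the caption at this point.
    print(layout)
```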
Grounded Image Text Matching with Mismatched Relation Reasoning
Wu, Yu, Wei, Yana, Wang, Haozhe, Liu, Yongfei, Yang, Sibei, He, Xuming
This paper introduces Grounded Image Text Matching with Mismatched Relation (GITM-MR), a novel visual-linguistic joint task that evaluates the relation understanding capabilities of transformer-based pre-trained models. GITM-MR requires a model to first determine if an expression describes an image, then localize referred objects or ground the mismatched parts of the text. We provide a benchmark for evaluating pre-trained models on this task, with a focus on the challenging settings of limited data and out-of-distribution sentence lengths. Our evaluation demonstrates that pre-trained models lack data efficiency and length generalization ability. To address this, we propose the Relation-sensitive Correspondence Reasoning Network (RCRN), which incorporates relation-aware reasoning via bi-directional message propagation guided by language structure. RCRN can be interpreted as a modular program and delivers strong performance in both length generalization and data efficiency.
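A hedged sketch of bi-directional message propagation over a language-structure graph, the general mechanism the abstract attributes to RCRN. The graph construction, feature dimensions, and gated update rule below are illustrative stand-ins, not the paper's architecture.

```python
import torch
import torch.nn as nn


class BiDirectionalMessagePassing(nn.Module):
    """Toy relation-aware message passing: nodes are phrase embeddings, edges
    follow a (hypothetical) language parse, and messages flow both along and
    against edge direction before a gated node update."""

    def __init__(self, dim: int = 256, steps: int = 3):
        super().__init__()
        self.steps = steps
        self.fwd_msg = nn.Linear(dim, dim)   # messages along parse edges
        self.bwd_msg = nn.Linear(dim, dim)   # messages against parse edges
        self.update = nn.GRUCell(dim, dim)   # gated node update

    def forward(self, nodes: torch.Tensor, edges: list[tuple[int, int]]) -> torch.Tensor:
        # nodes: (num_nodes, dim); edges: (src, dst) pairs from the language structure
        for _ in range(self.steps):
            agg = torch.zeros_like(nodes)
            for src, dst in edges:
                agg[dst] = agg[dst] + self.fwd_msg(nodes[src])  # forward direction
                agg[src] = agg[src] + self.bwd_msg(nodes[dst])  # backward direction
            nodes = self.update(agg, nodes)
        return nodes


if __name__ == "__main__":
    phrases = torch.randn(4, 256)            # e.g., "man", "holding", "umbrella", "red"
    parse_edges = [(0, 1), (1, 2), (3, 2)]   # subject->relation->object, modifier->object
    refined = BiDirectionalMessagePassing()(phrases, parse_edges)
    print(refined.shape)  # torch.Size([4, 256])
```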