AITopics | Chen, Yushuo

Collaborating Authors

Chen, Yushuo

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Data-Juicer 2.0: Cloud-Scale Adaptive Data Processing for Foundation Models

Chen, Daoyuan, Huang, Yilun, Pan, Xuchen, Jiang, Nana, Wang, Haibin, Ge, Ce, Chen, Yushuo, Zhang, Wenhao, Ma, Zhijian, Zhang, Yilei, Huang, Jun, Lin, Wei, Li, Yaliang, Ding, Bolin, Zhou, Jingren

arXiv.org Artificial IntelligenceDec-23-2024

The burgeoning field of foundation models necessitates advanced data processing mechanisms capable of harnessing vast valuable data with varied types utilized by these models. Nevertheless, the current landscape presents unique challenges that traditional data processing frameworks cannot handle effectively, especially with multimodal intricacies. In response, we present Data-Juicer 2.0, a new system offering fruitful data processing capabilities backed by over a hundred operators spanning various modalities like text, image, audio, and video. With seamless compatibility and dedicated optimization to popular dataset hubs like Hugging Face and computing engines like Ray, Data-Juicer 2.0 enhances its predecessor in both usability, efficiency, and programmability. It features an easily accessible user interface layer that supports decoupled Python interactions, RESTful APIs, and conversational commands. Alongside this, it contains a core runtime layer optimized for adaptive execution and management across different dataset scales, processing demands, and computational environments, while shielding unnecessary system details. Extensive empirical evaluations demonstrate Data-Juicer 2.0's remarkable performance and scalability, highlighting its capability to efficiently process tens of billions of data samples with tens of thousands of CPU cores. The system is publicly available, actively maintained, and broadly adopted in diverse research endeavors, practical applications, and real-world products such as Alibaba Cloud PAI.

data mining, large language model, machine learning, (23 more...)

arXiv.org Artificial Intelligence

2501.14755

Country:

Asia (0.46)
Europe > Romania (0.14)

Genre: Research Report (0.82)

Industry: Information Technology > Software (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(7 more...)

Add feedback

GenSim: A General Social Simulation Platform with Large Language Model based Agents

Tang, Jiakai, Gao, Heyang, Pan, Xuchen, Wang, Lei, Tan, Haoran, Gao, Dawei, Chen, Yushuo, Chen, Xu, Lin, Yankai, Li, Yaliang, Ding, Bolin, Zhou, Jingren, Wang, Jun, Wen, Ji-Rong

arXiv.org Artificial IntelligenceOct-9-2024

With the rapid advancement of large language models (LLMs), recent years have witnessed many promising studies on leveraging LLM-based agents to simulate human social behavior. While prior work has demonstrated significant potential across various domains, much of it has focused on specific scenarios involving a limited number of agents and has lacked the ability to adapt when errors occur during simulation. To overcome these limitations, we propose a novel LLM-agent-based simulation platform called \textit{GenSim}, which: (1) \textbf{Abstracts a set of general functions} to simplify the simulation of customized social scenarios; (2) \textbf{Supports one hundred thousand agents} to better simulate large-scale populations in real-world contexts; (3) \textbf{Incorporates error-correction mechanisms} to ensure more reliable and long-term simulations. To evaluate our platform, we assess both the efficiency of large-scale agent simulations and the effectiveness of the error-correction mechanisms. To our knowledge, GenSim represents an initial step toward a general, large-scale, and correctable social simulation platform based on LLM agents, promising to further advance the field of social science.

artificial intelligence, large language model, natural language, (14 more...)

arXiv.org Artificial Intelligence

2410.0436

Country: Asia (0.14)

Genre:

Research Report > Experimental Study (0.69)
Research Report > Strength High (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.94)

Add feedback

LLMBox: A Comprehensive Library for Large Language Models

Tang, Tianyi, Hu, Yiwen, Li, Bingqian, Luo, Wenyang, Qin, Zijing, Sun, Haoxiang, Wang, Jiapeng, Xu, Shiyi, Cheng, Xiaoxue, Guo, Geyang, Peng, Han, Zheng, Bowen, Tang, Yiru, Min, Yingqian, Chen, Yushuo, Chen, Jie, Zhao, Yuanqian, Ding, Luran, Wang, Yuhao, Dong, Zican, Xia, Chunxuan, Li, Junyi, Zhou, Kun, Zhao, Wayne Xin, Wen, Ji-Rong

arXiv.org Artificial IntelligenceJul-7-2024

To facilitate the research on large language models (LLMs), this paper presents a comprehensive and unified library, LLMBox, to ease the development, use, and evaluation of LLMs. This library is featured with three main merits: (1) a unified data interface that supports the flexible implementation of various training strategies, (2) a comprehensive evaluation that covers extensive tasks, datasets, and models, and (3) more practical consideration, especially on user-friendliness and efficiency. With our library, users can easily reproduce existing methods, train new models, and conduct comprehensive performance comparisons. To rigorously test LLMBox, we conduct extensive experiments in a diverse coverage of evaluation settings, and experimental results demonstrate the effectiveness and efficiency of our library in supporting various implementations related to LLMs. The detailed introduction and usage guidance can be found at https://github.com/RUCAIBox/LLMBox.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2407.05563

Country:

Europe (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.69)

Industry: Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Towards Coarse-to-Fine Evaluation of Inference Efficiency for Large Language Models

Chen, Yushuo, Tang, Tianyi, Xiang, Erge, Li, Linjiang, Zhao, Wayne Xin, Wang, Jing, Chai, Yunpeng, Wen, Ji-Rong

arXiv.org Artificial IntelligenceApr-17-2024

In real world, large language models (LLMs) can serve as the assistant to help users accomplish their jobs, and also support the development of advanced applications. For the wide application of LLMs, the inference efficiency is an essential concern, which has been widely studied in existing work, and numerous optimization algorithms and code libraries have been proposed to improve it. Nonetheless, users still find it challenging to compare the effectiveness of all the above methods and understand the underlying mechanisms. In this work, we perform a detailed coarse-to-fine analysis of the inference performance of various code libraries. To evaluate the overall effectiveness, we examine four usage scenarios within two practical applications. We further provide both theoretical and empirical fine-grained analyses of each module in the Transformer architecture. Our experiments yield comprehensive results that are invaluable for researchers to evaluate code libraries and improve inference strategies.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2404.11502

Country: North America > United States > California (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

A Survey of Large Language Models

Zhao, Wayne Xin, Zhou, Kun, Li, Junyi, Tang, Tianyi, Wang, Xiaolei, Hou, Yupeng, Min, Yingqian, Zhang, Beichen, Zhang, Junjie, Dong, Zican, Du, Yifan, Yang, Chen, Chen, Yushuo, Chen, Zhipeng, Jiang, Jinhao, Ren, Ruiyang, Li, Yifan, Tang, Xinyu, Liu, Zikang, Liu, Peiyu, Nie, Jian-Yun, Wen, Ji-Rong

arXiv.org Artificial IntelligenceNov-24-2023

Language is essentially a complex, intricate system of human expressions governed by grammatical rules. It poses a significant challenge to develop capable AI algorithms for comprehending and grasping a language. As a major approach, language modeling has been widely studied for language understanding and generation in the past two decades, evolving from statistical language models to neural language models. Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale corpora, showing strong capabilities in solving various NLP tasks. Since researchers have found that model scaling can lead to performance improvement, they further study the scaling effect by increasing the model size to an even larger size. Interestingly, when the parameter scale exceeds a certain level, these enlarged language models not only achieve a significant performance improvement but also show some special abilities that are not present in small-scale language models. To discriminate the difference in parameter scale, the research community has coined the term large language models (LLM) for the PLMs of significant size. Recently, the research on LLMs has been largely advanced by both academia and industry, and a remarkable progress is the launch of ChatGPT, which has attracted widespread attention from society. The technical evolution of LLMs has been making an important impact on the entire AI community, which would revolutionize the way how we develop and use AI algorithms. In this survey, we review the recent advances of LLMs by introducing the background, key findings, and mainstream techniques. In particular, we focus on four major aspects of LLMs, namely pre-training, adaptation tuning, utilization, and capacity evaluation. Besides, we also summarize the available resources for developing LLMs and discuss the remaining issues for future directions.

large language model, machine learning, natural language, (23 more...)

arXiv.org Artificial Intelligence

2303.18223

Country:

Europe (1.00)
Asia (1.00)
South America (0.67)
(5 more...)

Genre:

Research Report > Promising Solution (1.00)
Research Report > New Finding (1.00)
Overview (1.00)
Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Education > Educational Setting (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Learning to Imagine: Visually-Augmented Natural Language Generation

Tang, Tianyi, Chen, Yushuo, Du, Yifan, Li, Junyi, Zhao, Wayne Xin, Wen, Ji-Rong

arXiv.org Artificial IntelligenceJun-15-2023

People often imagine relevant scenes to aid in the writing process. In this work, we aim to utilize visual information for composition in the same manner as humans. We propose a method, LIVE, that makes pre-trained language models (PLMs) Learn to Imagine for Visuallyaugmented natural language gEneration. First, we imagine the scene based on the text: we use a diffusion model to synthesize high-quality images conditioned on the input texts. Second, we use CLIP to determine whether the text can evoke the imagination in a posterior way. Finally, our imagination is dynamic, and we conduct synthesis for each sentence rather than generate only one image for an entire paragraph. Technically, we propose a novel plug-and-play fusion layer to obtain visually-augmented representations for each text. Our vision-text fusion layer is compatible with Transformerbased architecture. We have conducted extensive experiments on four generation tasks using BART and T5, and the automatic results and human evaluation demonstrate the effectiveness of our proposed method. We will release the code, model, and data at the link: https://github.com/RUCAIBox/LIVE.

artificial intelligence, natural language, proceedings, (17 more...)

arXiv.org Artificial Intelligence

2305.16944

Country:

Europe (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)

Genre: Research Report (0.82)

Industry: Information Technology (0.48)

Technology: Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)

Add feedback

Scaling Pre-trained Language Models to Deeper via Parameter-efficient Architecture

Liu, Peiyu, Gao, Ze-Feng, Chen, Yushuo, Zhao, Wayne Xin, Wen, Ji-Rong

arXiv.org Artificial IntelligenceApr-10-2023

In this paper, we propose a highly parameter-efficient approach to scaling pre-trained language models (PLMs) to a deeper model depth. Unlike prior work that shares all parameters or uses extra blocks, we design a more capable parameter-sharing architecture based on matrix product operator (MPO). MPO decomposition can reorganize and factorize the information of a parameter matrix into two parts: the major part that contains the major information (central tensor) and the supplementary part that only has a small proportion of parameters (auxiliary tensors). Based on such a decomposition, our architecture shares the central tensor across all layers for reducing the model size and meanwhile keeps layer-specific auxiliary tensors (also using adapters) for enhancing the adaptation flexibility. To improve the model training, we further propose a stable initialization algorithm tailored for the MPO-based architecture. Extensive experiments have demonstrated the effectiveness of our proposed model in reducing the model size and achieving highly competitive performance.

machine learning, natural language, tensor, (18 more...)

arXiv.org Artificial Intelligence

2303.16753

Country: North America > United States (0.93)

Genre: Research Report > New Finding (0.66)

Industry: Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

Add feedback