Wang, Zili
GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory
Wu, Haoze, Qiu, Zihan, Wang, Zili, Zhao, Hang, Fu, Jie
Mixture-of-Experts (MoE) has been demonstrated as an efficient method to scale up models. By dynamically and sparsely selecting activated experts, MoE can effectively reduce computational costs. Despite this success, we observe that many tokens in MoE models have uncertain routing results: they have nearly equal scores for choosing each expert, and we demonstrate that this uncertainty can lead to incorrect selections. Inspired by the Global Workspace Theory (GWT), we propose a new fine-tuning method, GW-MoE, to address this issue. The core idea is to broadcast the uncertain tokens across experts during fine-tuning. These tokens can therefore acquire the necessary knowledge from any expert during inference and become less sensitive to the choice. GW-MoE does not introduce additional inference overhead. We validate that GW-MoE can mitigate the uncertainty problem and consistently improves performance across different tasks (text classification, question answering, summarization, code generation, and mathematical problem solving) and model sizes (650M and 8B parameters).
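To make the routing idea concrete, here is a minimal PyTorch sketch under our own reading of the abstract: uncertainty is measured by the normalized entropy of the router distribution (the threshold `tau` and the entropy criterion are our assumptions, not the paper's stated method), and uncertain tokens are broadcast to all experts during fine-tuning while confident tokens keep standard top-k routing.

```python
import math
import torch
import torch.nn.functional as F

def route_with_broadcast(router_logits, hidden, experts, top_k=2, tau=0.9):
    # hidden: (tokens, d); router_logits: (tokens, n_experts);
    # experts: list of callables mapping (m, d) -> (m, d).
    probs = F.softmax(router_logits, dim=-1)
    # Normalized entropy approaches 1.0 when all expert scores are nearly equal.
    entropy = -(probs * probs.clamp_min(1e-9).log()).sum(-1)
    uncertain = entropy / math.log(probs.size(-1)) > tau

    topi = probs.topk(top_k, dim=-1).indices
    out = torch.zeros_like(hidden)
    for e, expert in enumerate(experts):
        # Confident tokens go only to their top-k experts; uncertain
        # tokens are broadcast to every expert (fine-tuning only, so
        # inference keeps the usual sparse top-k cost).
        sel = (topi == e).any(-1) | uncertain
        if sel.any():
            out[sel] += probs[sel][:, e:e + 1] * expert(hidden[sel])
    return out
```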
HyperMoE: Towards Better Mixture of Experts via Transferring Among Experts
Zhao, Hao, Qiu, Zihan, Wu, Huijia, Wang, Zili, He, Zhaofeng, Fu, Jie
The Mixture of Experts (MoE) for language models has been proven effective in augmenting model capacity by dynamically routing each input token to a specific subset of experts for processing. Despite this success, most existing methods face a trade-off between sparsity and the availability of expert knowledge: enhancing performance through increased use of expert knowledge often diminishes sparsity during expert selection. To mitigate this contradiction, we propose HyperMoE, a novel MoE framework built upon hypernetworks. This framework integrates the computational processes of MoE with the concept of knowledge transfer in multi-task learning. Specific modules generated from the information of unselected experts serve as supplementary information, allowing the knowledge of unselected experts to be used while maintaining selection sparsity. Our comprehensive empirical evaluations across multiple datasets and backbones establish that HyperMoE significantly outperforms existing MoE methods under identical conditions with the same number of experts.
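The abstract's core mechanism, generating a supplementary module from the unselected experts, can be sketched with a simple hypernetwork. The class name, the low-rank bottleneck design, and the mean-pooled expert embedding below are our assumptions for illustration; the paper's actual conditioning and generated module may differ.

```python
import torch
import torch.nn as nn

class HyperExpertSketch(nn.Module):
    """Hypothetical sketch: a hypernetwork maps a summary of the
    *unselected* experts to the weights of a small bottleneck module,
    whose output supplements the selected experts' output."""

    def __init__(self, d_model, d_bottleneck, n_experts):
        super().__init__()
        self.expert_emb = nn.Embedding(n_experts, d_model)
        # Generates down- and up-projection weights for the bottleneck.
        self.hyper = nn.Linear(d_model, 2 * d_model * d_bottleneck)
        self.d_model, self.d_b = d_model, d_bottleneck

    def forward(self, hidden, unselected_ids):
        # Average embedding of the experts the router did NOT pick.
        ctx = self.expert_emb(unselected_ids).mean(dim=0)
        w = self.hyper(ctx)
        w_down = w[: self.d_model * self.d_b].view(self.d_model, self.d_b)
        w_up = w[self.d_model * self.d_b:].view(self.d_b, self.d_model)
        # Supplementary output, added alongside the selected experts.
        return (hidden @ w_down).relu() @ w_up
```

Because the bottleneck is generated rather than stored per expert, the unselected experts' knowledge is injected without densifying the top-k selection itself.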
Beyond Language Models: Byte Models are Digital World Simulators
Wu, Shangda, Tan, Xu, Wang, Zili, Wang, Rui, Li, Xiaobing, Sun, Maosong
Traditional deep learning often overlooks bytes, the basic units of the digital world, in which all forms of information and operations are encoded and manipulated in binary format. Inspired by the success of next token prediction in natural language processing, we introduce bGPT, a model that uses next byte prediction to simulate the digital world. bGPT matches specialized models in performance across various modalities, including text, audio, and images, and offers new possibilities for predicting, simulating, and diagnosing algorithm or hardware behaviour. It almost flawlessly replicates the process of converting symbolic music data, achieving a low error rate of 0.0011 bits per byte when converting ABC notation to MIDI format. In addition, bGPT demonstrates exceptional capabilities in simulating CPU behaviour, with an accuracy exceeding 99.99% in executing various operations. Leveraging next byte prediction, models like bGPT can learn directly from vast amounts of binary data, effectively simulating the intricate patterns of the digital world.
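Next byte prediction is ordinary autoregressive language modeling with a fixed 256-symbol vocabulary. The sketch below (helper names are ours, not bGPT's API) shows how any binary file becomes training pairs, and how the bits-per-byte figure quoted above relates to next-byte cross-entropy.

```python
import math
import torch
import torch.nn.functional as F

VOCAB = 256  # one token per possible byte value

def byte_pairs(path, seq_len=512):
    """Turn any binary file into (input, target) next-byte training pairs."""
    raw = open(path, "rb").read()
    data = torch.tensor(list(raw), dtype=torch.long)
    for i in range(0, len(data) - seq_len - 1, seq_len):
        yield data[i:i + seq_len], data[i + 1:i + seq_len + 1]

def bits_per_byte(logits, targets):
    """The metric quoted above: next-byte cross-entropy converted to base 2."""
    ce = F.cross_entropy(logits.reshape(-1, VOCAB), targets.reshape(-1))
    return ce / math.log(2)
```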
ChatMusician: Understanding and Generating Music Intrinsically with LLM
Yuan, Ruibin, Lin, Hanfeng, Wang, Yi, Tian, Zeyue, Wu, Shangda, Shen, Tianhao, Zhang, Ge, Wu, Yuhang, Liu, Cong, Zhou, Ziya, Ma, Ziyang, Xue, Liumeng, Wang, Ziyu, Liu, Qin, Zheng, Tianyu, Li, Yizhi, Ma, Yinghao, Liang, Yiming, Chi, Xiaowei, Liu, Ruibo, Wang, Zili, Li, Pengfei, Wu, Jingcheng, Lin, Chenghua, Liu, Qifeng, Jiang, Tao, Huang, Wenhao, Chen, Wenhu, Benetos, Emmanouil, Fu, Jie, Xia, Gus, Dannenberg, Roger, Xue, Wei, Kang, Shiyin, Guo, Yike
While Large Language Models (LLMs) demonstrate impressive capabilities in text generation, we find that their ability has yet to be generalized to music, humanity's creative language. We introduce ChatMusician, an open-source LLM that integrates intrinsic musical abilities. It is based on continual pre-training and finetuning of LLaMA2 on a text-compatible music representation, ABC notation, treating music as a second language. ChatMusician can understand and generate music with a pure text tokenizer, without any external multi-modal neural structures or tokenizers. Interestingly, endowing the model with musical abilities does not harm its language abilities; it even achieves a slightly higher MMLU score. Our model is capable of composing well-structured, full-length music conditioned on texts, chords, melodies, motifs, musical forms, etc., surpassing the GPT-4 baseline. On our meticulously curated college-level music understanding benchmark, MusicTheoryBench, ChatMusician surpasses LLaMA2 and GPT-3.5 in the zero-shot setting by a noticeable margin. Our work reveals that LLMs can be excellent compressors for music, but significant territory remains to be conquered. We release our 4B-token music-language corpus MusicPile, the collected MusicTheoryBench, code, model, and demo on GitHub.
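ABC notation, the representation ChatMusician trains on, is plain text: a few header fields (index, title, meter, unit note length, key) followed by the tune body. That is why an ordinary text tokenizer suffices. A small illustrative tune of our own, not drawn from MusicPile:

```python
# ABC notation as a plain Python string; any text tokenizer can consume it.
abc_tune = """X:1
T:Example Reel
M:4/4
L:1/8
K:G
GABc dedB | dedB dedB | c2ec B2dB | c2A2 A2BA |"""
```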
Evolving Large Language Model Assistant with Long-Term Conditional Memory
Yuan, Ruifeng, Sun, Shichao, Wang, Zili, Cao, Ziqiang, Li, Wenjie
With the rapid development of large language models, AI assistants like ChatGPT have become widespread in people's work and lives. In this paper, we present an evolving large language model assistant that utilizes verbal long-term memory. It focuses on preserving the knowledge and experience from past dialogues between the user and the AI assistant, which can be applied to future dialogues to generate better responses. The model generates a set of records for each finished dialogue and stores them in the memory. Later, given a new user input, the model retrieves related memories to improve the quality of its response. To find the best form of memory, we explore different ways of constructing the memory and propose a new memorizing mechanism, called conditional memory, to solve the problems in previous methods. We also investigate the retrieval and usage of memory in the generation process. The assistant uses GPT-4 as the backbone, and we evaluate it on three constructed test datasets, each focusing on a different ability required by an AI assistant with long-term memory.
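The memorize-then-retrieve loop described here is easy to sketch. The following is a minimal illustration under our assumptions: the `embed` and `llm` callables, the record format, and dot-product retrieval are hypothetical stand-ins, not the paper's actual components or its conditional-memory construction.

```python
import numpy as np

class MemoryAssistantSketch:
    """Hypothetical sketch of the long-term memory loop described above."""

    def __init__(self, embed, llm):
        self.embed, self.llm = embed, llm  # user-supplied callables
        self.records = []                  # [(vector, record_text)]

    def memorize(self, dialogue):
        # After each finished dialogue, distill it into memory records.
        summary = self.llm(f"Summarize this dialogue as memory records:\n{dialogue}")
        for rec in summary.splitlines():
            if rec.strip():
                self.records.append((self.embed(rec), rec))

    def respond(self, user_input, k=3):
        # Retrieve the k records most related to the new input.
        q = self.embed(user_input)
        scored = sorted(self.records, key=lambda r: -float(np.dot(q, r[0])))
        context = "\n".join(rec for _, rec in scored[:k])
        return self.llm(f"Memory:\n{context}\n\nUser: {user_input}\nAssistant:")
```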
MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework
Hong, Sirui, Zhuge, Mingchen, Chen, Jonathan, Zheng, Xiawu, Cheng, Yuheng, Zhang, Ceyao, Wang, Jinlin, Wang, Zili, Yau, Steven Ka Shing, Lin, Zijuan, Zhou, Liyang, Ran, Chenyu, Xiao, Lingfeng, Wu, Chenglin, Schmidhuber, Jürgen
Remarkable progress has been made on automated problem solving through societies of agents based on large language models (LLMs). Existing LLM-based multi-agent systems can already solve simple dialogue tasks. Solutions to more complex tasks, however, are complicated by logical inconsistencies due to cascading hallucinations caused by naively chaining LLMs. Here we introduce MetaGPT, an innovative meta-programming framework incorporating efficient human workflows into LLM-based multi-agent collaborations. MetaGPT encodes Standardized Operating Procedures (SOPs) into prompt sequences for more streamlined workflows, allowing agents with human-like domain expertise to verify intermediate results and reduce errors. MetaGPT utilizes an assembly-line paradigm to assign diverse roles to various agents, efficiently breaking down complex tasks into subtasks involving many agents working together. On collaborative software engineering benchmarks, MetaGPT generates more coherent solutions than previous chat-based multi-agent systems. Our project can be found at https://github.com/geekan/MetaGPT.
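An SOP encoded as a prompt sequence can be sketched as a fixed role pipeline. The roles below mirror the software-company workflow MetaGPT describes, but the templates and the single-function driver are our simplification for illustration, not the project's actual API.

```python
# Hypothetical sketch of an SOP as an assembly line of roles; each
# role consumes the previous role's output and can flag its errors.
SOP = [
    ("ProductManager", "Write a PRD for: {task}"),
    ("Architect",      "Design the system for this PRD:\n{prev}"),
    ("Engineer",       "Implement code for this design:\n{prev}"),
    ("QAEngineer",     "Review and test this code; report issues:\n{prev}"),
]

def run_sop(task, llm):
    prev = task
    for role, template in SOP:
        prompt = template.format(task=task, prev=prev)
        prev = llm(f"You are a {role}.\n{prompt}")
    return prev  # final artifact after every role has verified upstream work
```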
RefGPT: Dialogue Generation of GPT, by GPT, and for GPT
Yang, Dongjie, Yuan, Ruifeng, Fan, Yuantao, Yang, Yifei, Wang, Zili, Wang, Shusen, Zhao, Hai
Large Language Models (LLMs) have attained the impressive capability to resolve a wide range of NLP tasks by fine-tuning on high-quality instruction data. However, collecting high-quality human-written data, especially multi-turn dialogues, is expensive and unattainable for most people. Though previous studies have used powerful LLMs to generate dialogues automatically, they all suffer from generating untruthful dialogues because of model hallucination. Therefore, we propose a method called RefGPT to generate truthful and customized dialogues at scale without worrying about factual errors caused by model hallucination. RefGPT mitigates hallucination in dialogue generation by restricting the LLMs to leverage a given reference instead of reciting their own knowledge. Additionally, RefGPT adds detailed controls on every utterance to enable high customizability, which previous studies have ignored. On the basis of RefGPT, we also propose two high-quality dialogue datasets generated by GPT-4, namely RefGPT-Fact and RefGPT-Code. RefGPT-Fact is a dataset of 100k multi-turn dialogues based on factual knowledge, and RefGPT-Code has 76k multi-turn dialogues covering a wide range of coding scenarios. Our code and datasets are released at https://github.com/mutonix/RefGPT.
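The reference-restriction idea amounts to careful prompt construction. A hypothetical sketch follows; the wording, function name, and control format are ours, not RefGPT's actual prompts.

```python
def refgpt_style_prompt(reference, controls, history):
    """Hypothetical sketch: ground every utterance in the given reference."""
    return (
        "Generate the next dialogue turn using ONLY facts from the "
        "reference below. Do not add information that is not in it.\n"
        f"Reference:\n{reference}\n"
        f"Per-utterance controls (style, length, format): {controls}\n"
        f"Dialogue so far:\n{history}\n"
        "Next turn:"
    )
```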
Go Beyond The Obvious: Probing the gap of INFORMAL reasoning ability between Humanity and LLMs by Detective Reasoning Puzzle Benchmark
Gu, Zhouhong, Li, Zihan, Zhang, Lin, Xiong, Zhuozhi, Ye, Haoning, Zhang, Yikai, Huang, Wenhao, Zhu, Xiaoxuan, He, Qianyu, Xu, Rui, Jiang, Sihang, Wang, Shusen, Wang, Zili, Feng, Hongwei, Li, Zhixu, Xiao, Yanghua
Informal reasoning is the ability to reason based on common sense, experience, and intuition. Humans use informal reasoning every day to extract the most influential elements for decision-making from a large amount of real-life information. With the rapid development of language models, hope has emerged for realizing general artificial intelligence. Yet, despite the outstanding informal reasoning ability of humans, how much informal reasoning ability language models possess has not been well studied. To explore the gap between humans and language models in informal reasoning, this paper constructs the Detective Reasoning Benchmark, an assembly of 1,200 questions gathered from accessible online resources, aimed at evaluating a model's informal reasoning ability in real-life contexts. Since improvement of models' informal reasoning ability has been restricted by the lack of such a benchmark, we further propose a Self-Question Prompt Framework that mimics human thinking to enhance it. The goals of Self-Question are to find the key elements, deeply investigate the connections between these elements, examine the relationship between each element and the problem, and finally require the model to answer the problem with sound reasoning. The experimental results show that human performance greatly exceeds that of SoTA language models on the Detective Reasoning Benchmark. Moreover, Self-Question proves to be the most effective prompt engineering technique for improving GPT-4's informal reasoning ability, yet the improved model still does not surpass even the lowest score achieved by human participants. Upon acceptance of the paper, the source code for the benchmark will be made publicly accessible.
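The Self-Question stages listed above map naturally onto a staged prompting loop. A hypothetical sketch, with step wordings paraphrased from the abstract rather than taken from the paper:

```python
# Hypothetical rendering of the Self-Question stages described above.
SELF_QUESTION_STEPS = [
    "List the key elements in the puzzle.",
    "Deeply investigate the connections between these elements.",
    "Examine how each element relates to the question.",
    "Now answer the question with your reasoning.",
]

def self_question(puzzle, llm):
    transcript = puzzle
    for step in SELF_QUESTION_STEPS:
        # Each stage sees the puzzle plus all previous self-answers.
        answer = llm(f"{transcript}\n\n{step}")
        transcript += f"\n\n{step}\n{answer}"
    return transcript  # the final segment contains the answer
```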
Xiezhi: An Ever-Updating Benchmark for Holistic Domain Knowledge Evaluation
Gu, Zhouhong, Zhu, Xiaoxuan, Ye, Haoning, Zhang, Lin, Wang, Jianchen, Jiang, Sihang, Xiong, Zhuozhi, Li, Zihan, He, Qianyu, Xu, Rui, Huang, Wenhao, Wang, Zili, Wang, Shusen, Zheng, Weiguo, Feng, Hongwei, Xiao, Yanghua
New Natural Language Processing (NLP) benchmarks are urgently needed to keep pace with the rapid development of large language models (LLMs). We present Xiezhi, the most comprehensive evaluation suite designed to assess holistic domain knowledge. Xiezhi comprises 249,587 multiple-choice questions spanning 516 diverse disciplines across 13 subjects, accompanied by Xiezhi-Specialty and Xiezhi-Interdiscipline, each with 15k questions. We evaluate 47 cutting-edge LLMs on Xiezhi. Results indicate that LLMs exceed the average performance of humans in science, engineering, agronomy, medicine, and art, but fall short in economics, jurisprudence, pedagogy, literature, history, and management.
Basic syntax from speech: Spontaneous concatenation in unsupervised deep neural networks
Beguš, Gašper, Lu, Thomas, Wang, Zili
Computational models of syntax are predominantly text-based. Here we propose that basic syntax can be modeled directly from raw speech in a fully unsupervised way. We focus on one of the most ubiquitous and basic properties of syntax: concatenation. We introduce spontaneous concatenation, a phenomenon whereby convolutional neural networks (CNNs) trained on acoustic recordings of individual words begin generating outputs with two or even three words concatenated, without ever accessing data with multiple words in the input. Additionally, networks trained on two words learn to embed words into novel, unobserved word combinations. To our knowledge, this is a previously unreported property of CNNs trained on raw speech in the Generative Adversarial Network setting, with implications both for our understanding of how these architectures learn and for modeling syntax and its evolution from raw acoustic inputs.