AITopics | Li, Biye

Collaborating Authors

Li, Biye

V2Flow: Unifying Visual Tokenization and Large Language Model Vocabularies for Autoregressive Image Generation

Zhang, Guiwei, Zhang, Tianyu, Zhou, Mohan, Bai, Yalong, Li, Biye

arXiv.org Artificial Intelligence, Mar-10-2025

We propose V2Flow, a novel tokenizer that produces discrete visual tokens capable of high-fidelity reconstruction, while ensuring structural and latent distribution alignment with the vocabulary space of large language models (LLMs). Leveraging this tight visual-vocabulary coupling, V2Flow enables autoregressive visual generation on top of existing LLMs. Our approach formulates visual tokenization as a flow-matching problem, aiming to learn a mapping from a standard normal prior to the continuous image distribution, conditioned on token sequences embedded within the LLM's vocabulary space. The effectiveness of V2Flow stems from two core designs. First, we propose a Visual Vocabulary resampler, which compresses visual data into compact token sequences, each represented as a soft categorical distribution over the LLM's vocabulary. This allows seamless integration of visual tokens into existing LLMs for autoregressive visual generation. Second, we present a masked autoregressive Rectified-Flow decoder, employing a masked transformer encoder-decoder to refine visual tokens into contextually enriched embeddings. These embeddings then condition a dedicated velocity field for precise reconstruction. Additionally, an autoregressive rectified-flow sampling strategy is incorporated, ensuring flexible sequence lengths while preserving competitive reconstruction quality. Extensive experiments show that V2Flow outperforms mainstream VQ-based tokenizers and facilitates autoregressive visual generation on top of existing LLMs. https://github.com/zhangguiwei610/V2Flow
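The two ingredients named in this abstract can be illustrated with a minimal sketch. It is not the V2Flow implementation: the function names, the toy vocabulary, and the noise-to-data time convention (`t = 0` is noise, `t = 1` is data) are assumptions chosen for illustration. The first function shows how a soft categorical distribution over a vocabulary yields an embedding as a probability-weighted average of vocabulary embeddings; the second shows the straight-line interpolation and constant velocity target commonly used in rectified flow.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_token_embedding(logits, vocab_embeddings):
    """Expected embedding under a soft categorical distribution.

    logits: one score per vocabulary entry (hypothetical resampler output).
    vocab_embeddings: list of embedding vectors, one per vocabulary entry.
    """
    probs = softmax(logits)
    dim = len(vocab_embeddings[0])
    return [
        sum(p * row[d] for p, row in zip(probs, vocab_embeddings))
        for d in range(dim)
    ]


def rectified_flow_pair(x0, x1, t):
    """Point on the straight path from noise x0 to data x1 at time t,
    together with the constant velocity target x1 - x0 (assumed convention)."""
    x_t = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
    velocity = [b - a for a, b in zip(x0, x1)]
    return x_t, velocity


if __name__ == "__main__":
    toy_vocab = [[1.0, 0.0], [0.0, 1.0]]  # two-entry toy vocabulary
    print(soft_token_embedding([0.0, 0.0], toy_vocab))
    print(rectified_flow_pair([0.0, 0.0], [2.0, 4.0], 0.5))
```

Because the token is a weighted average of existing vocabulary embeddings, it stays in the same space the LLM already consumes, which is the alignment property the abstract emphasizes.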

arxiv preprint arxiv, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2503.07493

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Optimal Brain Iterative Merging: Mitigating Interference in LLM Merging

Wang, Zhixiang, Mao, Zhenyu, Qiao, Yixuan, Wu, Yunfang, Li, Biye

arXiv.org Artificial Intelligence, Feb-17-2025

Large Language Models (LLMs) have demonstrated impressive capabilities, but their high computational costs pose challenges for customization. Model merging offers a cost-effective alternative, yet existing methods suffer from interference among parameters, leading to performance degradation. In this work, we propose Optimal Brain Iterative Merging (OBIM), a novel method designed to mitigate both intra-model and inter-model interference. OBIM consists of two key components: (1) A saliency measurement mechanism that evaluates parameter importance based on loss changes induced by individual weight alterations, reducing intra-model interference by …

Figure 1: Illustration of inter-model interference. The dotted box highlights cases where TIES fails to resolve interference. Approximately 46% of parameters deviate from the original models due to task vector averaging in the absence of sign conflicts.
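The idea of scoring a parameter by the loss change its alteration induces can be sketched as follows. The abstract excerpt is truncated and does not give OBIM's formula, so this sketch substitutes the classic Optimal Brain Damage style second-order estimate, 0.5 * H_ii * delta_i^2, as an assumption. The function names, the diagonal Hessian input, and the top-k retention step are likewise hypothetical.

```python
def second_order_saliency(delta, hessian_diag):
    """Approximate loss change from altering each weight by delta[i].

    Assumption: a diagonal second-order Taylor estimate
    0.5 * H_ii * delta_i**2, not necessarily the formula used by OBIM.
    """
    return [0.5 * h * d * d for d, h in zip(delta, hessian_diag)]


def keep_top_k(delta, saliency, k):
    """Zero out all but the k most salient entries of a task vector."""
    ranked = sorted(range(len(delta)), key=lambda i: saliency[i], reverse=True)
    keep = set(ranked[:k])
    return [d if i in keep else 0.0 for i, d in enumerate(delta)]


if __name__ == "__main__":
    # delta is a toy task vector (fine-tuned weights minus base weights).
    delta = [0.1, -2.0, 0.5]
    hessian_diag = [1.0, 1.0, 10.0]
    scores = second_order_saliency(delta, hessian_diag)
    print(keep_top_k(delta, scores, k=2))
```

Note that the third entry outranks the first despite a modest magnitude, because its curvature is high: ranking by estimated loss change differs from ranking by magnitude alone.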

artificial intelligence, large language model, natural language, (16 more...)

arXiv.org Artificial Intelligence

2502.12217

Genre:

Research Report > New Finding (1.00)
Research Report > Promising Solution (0.68)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models

Wei, Tianwen, Zhu, Bo, Zhao, Liang, Cheng, Cheng, Li, Biye, Lü, Weiwei, Cheng, Peng, Zhang, Jianhao, Zhang, Xiaoyu, Zeng, Liang, Wang, Xiaokun, Ma, Yutuan, Hu, Rui, Yan, Shuicheng, Fang, Han, Zhou, Yahui

arXiv.org Artificial Intelligence, Jun-2-2024

In this technical report, we introduce the training methodologies implemented in the development of Skywork-MoE, a high-performance mixture-of-experts (MoE) large language model (LLM) with 146 billion parameters and 16 experts. It is initialized from the pre-existing dense checkpoints of our Skywork-13B model. We explore the comparative effectiveness of upcycling versus training from scratch initializations. Our findings suggest that the choice between these two approaches should consider both the performance of the existing dense checkpoints and the MoE training budget. We highlight two innovative techniques: gating logit normalization, which improves expert diversification, and adaptive auxiliary loss coefficients, allowing for layer-specific adjustment of auxiliary loss coefficients. Our experimental results validate the effectiveness of these methods. Leveraging these techniques and insights, we trained our upcycled Skywork-MoE on a condensed subset of our SkyPile corpus. The evaluation results demonstrate that our model delivers strong performance across a wide range of benchmarks.
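Gating logit normalization is only named in this abstract, not specified. One plausible reading, assumed here, is to standardize the router logits for each token to zero mean and unit variance, multiply by a scale factor, and then apply softmax. The function name, the `scale` parameter, and the `eps` stabilizer below are illustrative assumptions rather than the report's definitions.

```python
import math


def normalized_gate(logits, scale=1.0, eps=1e-6):
    """Softmax over standardized gating logits.

    Assumed form: softmax(scale * (z - mean) / std). A larger scale
    gives a sharper distribution over experts.
    """
    n = len(logits)
    mean = sum(logits) / n
    var = sum((z - mean) ** 2 for z in logits) / n
    std = math.sqrt(var)
    normed = [scale * (z - mean) / (std + eps) for z in logits]
    m = max(normed)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in normed]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    router_logits = [0.10, 0.12, 0.11, 0.09]  # nearly flat raw logits
    print(normalized_gate(router_logits, scale=1.0))
    print(normalized_gate(router_logits, scale=2.0))
```

Under this reading, the gate output no longer depends on the raw magnitude of the logits, so nearly flat logits still produce a distinguishable preference among experts, with sharpness controlled by `scale`.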

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2406.06563

Country: North America (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Skywork: A More Open Bilingual Foundation Model

Wei, Tianwen, Zhao, Liang, Zhang, Lichang, Zhu, Bo, Wang, Lijie, Yang, Haihua, Li, Biye, Cheng, Cheng, Lü, Weiwei, Hu, Rui, Li, Chenxia, Yang, Liu, Luo, Xilin, Wu, Xuejie, Liu, Lunan, Cheng, Wenjun, Cheng, Peng, Zhang, Jianhao, Zhang, Xiaoyu, Lin, Lei, Wang, Xiaokun, Ma, Yutuan, Dong, Chuanhai, Sun, Yanqi, Chen, Yifu, Peng, Yongyi, Liang, Xiaojuan, Yan, Shuicheng, Fang, Han, Zhou, Yahui

arXiv.org Artificial Intelligence, Oct-30-2023

In this technical report, we present Skywork-13B, a family of large language models (LLMs) trained on a corpus of over 3.2 trillion tokens drawn from both English and Chinese texts. This bilingual foundation model is the most extensively trained and openly published LLM of comparable size to date. We introduce a two-stage training methodology using a segmented corpus, targeting general-purpose training and then domain-specific enhancement training, respectively. We show that our model not only excels on popular benchmarks, but also achieves state-of-the-art performance in Chinese language modeling on diverse domains. Furthermore, we propose a novel leakage detection method, demonstrating that test data contamination is a pressing issue warranting further investigation by the LLM community. To spur future research, we release Skywork-13B along with checkpoints obtained during intermediate stages of the training process. We are also releasing part of our SkyPile corpus, a collection of over 150 billion tokens of web text, which is the largest high-quality open Chinese pre-training corpus to date. We hope Skywork-13B and our open corpus will serve as a valuable open-source resource to democratize access to high-quality LLMs.

large language model, machine learning, skywork-13b, (22 more...)

arXiv.org Artificial Intelligence

2310.19341

Country:

Europe (1.00)
North America > United States > Minnesota (0.14)
North America > United States > Louisiana (0.14)

Genre: Research Report (1.00)

Industry: Education > Curriculum > Subject-Specific Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

SkyMath: Technical Report

Yang, Liu, Yang, Haihua, Cheng, Wenjun, Lin, Lei, Li, Chenxia, Chen, Yifu, Liu, Lunan, Pan, Jianfei, Wei, Tianwen, Li, Biye, Zhao, Liang, Wang, Lijie, Zhu, Bo, Li, Guoliang, Wu, Xuejie, Luo, Xilin, Hu, Rui

arXiv.org Artificial Intelligence, Oct-26-2023

Large language models (LLMs) have shown great potential to solve a variety of natural language processing (NLP) tasks, including mathematical reasoning. By applying self-compare fine-tuning, we have substantially enhanced the mathematical reasoning abilities of Skywork-13B-Base. On GSM8K, SkyMath outperforms all known open-source models of similar size and establishes a new SOTA performance. On the MATH dataset and the out-of-domain CMath dataset, SkyMath also achieves a high accuracy rate, showing remarkable generalizability to a variety of math problems. Moreover, compared to traditional AI methods, LLMs gain unparalleled advantages in this area.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2310.16713

Country: North America > Canada (0.14)

Genre: Research Report (0.50)

Industry: Education (0.94)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
