AITopics | Liu, Shaowei

Collaborating Authors

Liu, Shaowei

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Muon is Scalable for LLM Training

Liu, Jingyuan, Su, Jianlin, Yao, Xingcheng, Jiang, Zhejun, Lai, Guokun, Du, Yulun, Qin, Yidao, Xu, Weixin, Lu, Enzhe, Yan, Junjie, Chen, Yanru, Zheng, Huabin, Liu, Yibo, Liu, Shaowei, Yin, Bohong, He, Weiran, Zhu, Han, Wang, Yuzhi, Wang, Jianzhou, Dong, Mengnan, Zhang, Zheng, Kang, Yongsheng, Zhang, Hao, Xu, Xinran, Zhang, Yutao, Wu, Yuxin, Zhou, Xinyu, Yang, Zhilin

arXiv.org Artificial IntelligenceFeb-24-2025

Recently, the Muon optimizer based on matrix orthogonalization has demonstrated strong results in training small-scale language models, but the scalability to larger models has not been proven. We identify two crucial techniques for scaling up Muon: (1) adding weight decay and (2) carefully adjusting the per-parameter update scale. These techniques allow Muon to work out-of-the-box on large-scale training without the need of hyper-parameter tuning. Scaling law experiments indicate that Muon achieves $\sim\!2\times$ computational efficiency compared to AdamW with compute optimal training. Based on these improvements, we introduce Moonlight, a 3B/16B-parameter Mixture-of-Expert (MoE) model trained with 5.7T tokens using Muon. Our model improves the current Pareto frontier, achieving better performance with much fewer training FLOPs compared to prior models. We open-source our distributed Muon implementation that is memory optimal and communication efficient. We also release the pretrained, instruction-tuned, and intermediate checkpoints to support future research.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2502.16982

Country: North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

MoBA: Mixture of Block Attention for Long-Context LLMs

Lu, Enzhe, Jiang, Zhejun, Liu, Jingyuan, Du, Yulun, Jiang, Tao, Hong, Chao, Liu, Shaowei, He, Weiran, Yuan, Enming, Wang, Yuzhi, Huang, Zhiqi, Yuan, Huan, Xu, Suting, Xu, Xinran, Lai, Guokun, Chen, Yanru, Zheng, Huabin, Yan, Junjie, Su, Jianlin, Wu, Yuxin, Zhang, Neo Y., Yang, Zhilin, Zhou, Xinyu, Zhang, Mingxing, Qiu, Jiezhong

arXiv.org Artificial IntelligenceFeb-18-2025

Scaling the effective context length is essential for advancing large language models (LLMs) toward artificial general intelligence (AGI). However, the quadratic increase in computational complexity inherent in traditional attention mechanisms presents a prohibitive overhead. Existing approaches either impose strongly biased structures, such as sink or window attention which are task-specific, or radically modify the attention mechanism into linear approximations, whose performance in complex reasoning tasks remains inadequately explored. In this work, we propose a solution that adheres to the ``less structure'' principle, allowing the model to determine where to attend autonomously, rather than introducing predefined biases. We introduce Mixture of Block Attention (MoBA), an innovative approach that applies the principles of Mixture of Experts (MoE) to the attention mechanism. This novel architecture demonstrates superior performance on long-context tasks while offering a key advantage: the ability to seamlessly transition between full and sparse attention, enhancing efficiency without the risk of compromising performance. MoBA has already been deployed to support Kimi's long-context requests and demonstrates significant advancements in efficient attention computation for LLMs. Our code is available at https://github.com/MoonshotAI/MoBA.

arxiv preprint arxiv, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2502.13189

Genre: Research Report > Promising Solution (0.87)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Kimi k1.5: Scaling Reinforcement Learning with LLMs

Kimi Team, null, Du, Angang, Gao, Bofei, Xing, Bowei, Jiang, Changjiu, Chen, Cheng, Li, Cheng, Xiao, Chenjun, Du, Chenzhuang, Liao, Chonghua, Tang, Chuning, Wang, Congcong, Zhang, Dehao, Yuan, Enming, Lu, Enzhe, Tang, Fengxiang, Sung, Flood, Wei, Guangda, Lai, Guokun, Guo, Haiqing, Zhu, Han, Ding, Hao, Hu, Hao, Yang, Hao, Zhang, Hao, Yao, Haotian, Zhao, Haotian, Lu, Haoyu, Li, Haoze, Yu, Haozhen, Gao, Hongcheng, Zheng, Huabin, Yuan, Huan, Chen, Jia, Guo, Jianhang, Su, Jianlin, Wang, Jianzhou, Zhao, Jie, Zhang, Jin, Liu, Jingyuan, Yan, Junjie, Wu, Junyan, Shi, Lidong, Ye, Ling, Yu, Longhui, Dong, Mengnan, Zhang, Neo, Ma, Ningchen, Pan, Qiwei, Gong, Qucheng, Liu, Shaowei, Ma, Shengling, Wei, Shupeng, Cao, Sihan, Huang, Siying, Jiang, Tao, Gao, Weihao, Xiong, Weimin, He, Weiran, Huang, Weixiao, Wu, Wenhao, He, Wenyang, Wei, Xianghui, Jia, Xianqing, Wu, Xingzhe, Xu, Xinran, Zu, Xinxing, Zhou, Xinyu, Pan, Xuehai, Charles, Y., Li, Yang, Hu, Yangyang, Liu, Yangyang, Chen, Yanru, Wang, Yejie, Liu, Yibo, Qin, Yidao, Liu, Yifeng, Yang, Ying, Bao, Yiping, Du, Yulun, Wu, Yuxin, Wang, Yuzhi, Zhou, Zaida, Wang, Zhaoji, Li, Zhaowei, Zhu, Zhen, Zhang, Zheng, Wang, Zhexu, Yang, Zhilin, Huang, Zhiqi, Huang, Zihao, Xu, Ziyao, Yang, Zonghan

arXiv.org Artificial IntelligenceJan-21-2025

Language model pretraining with next token prediction has proved effective for scaling compute but is limited to the amount of available training data. Scaling reinforcement learning (RL) unlocks a new axis for the continued improvement of artificial intelligence, with the promise that large language models (LLMs) can scale their training data by learning to explore with rewards. However, prior published work has not produced competitive results. In light of this, we report on the training practice of Kimi k1.5, our latest multi-modal LLM trained with RL, including its RL training techniques, multi-modal data recipes, and infrastructure optimization. Long context scaling and improved policy optimization methods are key ingredients of our approach, which establishes a simplistic, effective RL framework without relying on more complex techniques such as Monte Carlo tree search, value functions, and process reward models. Notably, our system achieves state-of-the-art reasoning performance across multiple benchmarks and modalities -- e.g., 77.5 on AIME, 96.2 on MATH 500, 94-th percentile on Codeforces, 74.9 on MathVista -- matching OpenAI's o1. Moreover, we present effective long2short methods that use long-CoT techniques to improve short-CoT models, yielding state-of-the-art short-CoT reasoning results -- e.g., 60.8 on AIME, 94.6 on MATH500, 47.3 on LiveCodeBench -- outperforming existing short-CoT models such as GPT-4o and Claude Sonnet 3.5 by a large margin (up to +550%).

arxiv preprint arxiv, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2501.12599

Genre: Research Report > New Finding (0.46)

Industry:

Leisure & Entertainment > Games (0.46)
Education > Educational Setting > K-12 Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

Add feedback

PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation

Liu, Shaowei, Ren, Zhongzheng, Gupta, Saurabh, Wang, Shenlong

arXiv.org Artificial IntelligenceSep-27-2024

We present PhysGen, a novel image-to-video generation method that converts a single image and an input condition (e.g., force and torque applied to an object in the image) to produce a realistic, physically plausible, and temporally consistent video. Our key insight is to integrate model-based physical simulation with a data-driven video generation process, enabling plausible image-space dynamics. At the heart of our system are three core components: (i) an image understanding module that effectively captures the geometry, materials, and physical parameters of the image; (ii) an image-space dynamics simulation model that utilizes rigid-body physics and inferred parameters to simulate realistic behaviors; and (iii) an image-based rendering and refinement module that leverages generative video diffusion to produce realistic video footage featuring the simulated motion. The resulting videos are realistic in both physics and appearance and are even precisely controllable, showcasing superior results over existing data-driven image-to-video generation works through quantitative comparison and comprehensive user study. PhysGen's resulting videos can be used for various downstream applications, such as turning an image into a realistic animation or allowing users to interact with the image and create various dynamics. Project page: https://stevenlsw.github.io/physgen/

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2409.18964

Genre:

Questionnaire & Opinion Survey (0.68)
Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback