
Collaborating Authors: Cheng, Weiyu


MiniMax-01: Scaling Foundation Models with Lightning Attention

arXiv.org Artificial Intelligence

We introduce the MiniMax-01 series, including MiniMax-Text-01 and MiniMax-VL-01, which are comparable to top-tier models while offering superior capabilities in processing longer contexts. At the core are lightning attention and its efficient scaling. To maximize computational capacity, we integrate it with Mixture of Experts (MoE), creating a model with 32 experts and 456 billion total parameters, of which 45.9 billion are activated for each token. We develop an optimized parallel strategy and highly efficient computation-communication overlap techniques for MoE and lightning attention. This approach enables efficient training and inference on models with hundreds of billions of parameters across contexts spanning millions of tokens. The context window of MiniMax-Text-01 can reach up to 1 million tokens during training and extrapolate to 4 million tokens during inference at an affordable cost. Our vision-language model, MiniMax-VL-01, is built through continued training with 512 billion vision-language tokens. Experiments on both standard and in-house benchmarks show that our models match the performance of state-of-the-art models such as GPT-4o and Claude-3.5-Sonnet while offering a context window 20-32 times longer. We publicly release MiniMax-01 at https://github.com/MiniMax-AI.
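
As a rough illustration of why linear-attention mechanisms such as lightning attention scale to million-token contexts, here is a minimal non-causal linear-attention sketch: by reordering the computation, cost grows as O(n·d²) in sequence length n rather than the O(n²·d) of softmax attention. The elu-based feature map and every other detail here are assumptions for illustration, not MiniMax-01's actual kernel.

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    """Minimal non-causal linear-attention sketch (illustrative only).

    q, k: (batch, n, d); v: (batch, n, e).
    Softmax attention materializes an (n, n) score matrix; here we instead
    accumulate a (d, e) summary of keys/values, so cost is linear in n.
    """
    # Non-negative feature map; the kernel used in practice may differ.
    q, k = F.elu(q) + 1, F.elu(k) + 1
    kv = torch.einsum("bnd,bne->bde", k, v)                # sum_n k_n v_n^T
    z = 1.0 / (torch.einsum("bnd,bd->bn", q, k.sum(dim=1)) + eps)  # normalizer
    return torch.einsum("bnd,bde,bn->bne", q, kv, z)
```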


RESUS: Warm-Up Cold Users via Meta-Learning Residual User Preferences in CTR Prediction

arXiv.org Artificial Intelligence

Click-Through Rate (CTR) prediction for cold users is a challenging task in recommender systems. Recent research has resorted to meta-learning to tackle the cold-user challenge, either performing few-shot user representation learning or adopting optimization-based meta-learning. However, existing methods suffer from information loss or an inefficient optimization process, and they fail to explicitly model global user preference knowledge, which is crucial to complement the sparse and insufficient preference information of cold users. In this paper, we propose a novel and efficient approach named RESUS, which decouples the learning of global preference knowledge contributed by collective users from the learning of residual preferences for individual users. Specifically, we employ a shared predictor to infer basis user preferences, acquiring global preference knowledge from the interactions of different users. Meanwhile, we develop two efficient algorithms based on nearest neighbor and ridge regression predictors, which infer residual user preferences by learning quickly from a few user-specific interactions. Extensive experiments on three public datasets demonstrate that our RESUS approach is efficient and effective in improving CTR prediction accuracy for cold users compared with various state-of-the-art methods.
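
To make the residual idea concrete, here is a minimal sketch of the ridge-regression flavor the abstract describes: a shared base predictor supplies global preferences, and a closed-form ridge fit to the residuals on a cold user's few interactions supplies the user-specific correction. All names and the feature representation are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def residual_ridge_preference(support_x, support_y, base_pred, query_x, lam=1.0):
    """Illustrative residual-preference sketch.

    support_x: (n, d) features of a cold user's few observed interactions.
    support_y: (n,) observed labels (e.g., clicks).
    base_pred: shared global predictor mapping features to CTR scores.
    query_x:  (m, d) features of items to score for this user.
    """
    # Residuals: what the globally shared predictor misses for this user.
    residuals = support_y - base_pred(support_x)
    d = support_x.shape[1]
    # Closed-form ridge solution: w = (X^T X + lam * I)^{-1} X^T r,
    # cheap because n and d are small for a cold user.
    w = np.linalg.solve(support_x.T @ support_x + lam * np.eye(d),
                        support_x.T @ residuals)
    # Final score = global preference + learned residual correction.
    return base_pred(query_x) + query_x @ w
```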


Adaptive Factorization Network: Learning Adaptive-Order Feature Interactions

arXiv.org Artificial Intelligence

Various factorization-based methods have been proposed to leverage second-order or higher-order cross features for boosting the performance of predictive models. They generally enumerate all the cross features under a predefined maximum order and then identify useful feature interactions through model training, which suffers from two drawbacks. First, they have to make a tradeoff between the expressiveness of higher-order cross features and the computational cost, resulting in suboptimal predictions. Second, enumerating all the cross features, including irrelevant ones, may introduce noisy feature combinations that degrade model performance. In this work, we propose the Adaptive Factorization Network (AFN), a new model that learns arbitrary-order cross features adaptively from data. The core of AFN is a logarithmic transformation layer that converts the power of each feature in a feature combination into a coefficient to be learned. Experimental results on four real datasets demonstrate the superior predictive performance of AFN against the state of the art.

Feature engineering is typically recognized as central to successful machine learning tasks, such as recommender systems (Lian et al. 2017), computational advertising (He et al. 2014), and search ranking (Lian and Xie 2016). Beyond exploiting raw features, it is usually crucial to find effective transformations of raw features to boost the performance of predictive models. Cross features are a major type of feature transformation, where multiplication is performed over sparse raw features to form new features (Cheng et al. 2016). However, handcrafting useful cross features is inevitably expensive and time-consuming, and the results may not generalize to unseen feature interactions.
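
The logarithmic trick is compact enough to sketch: since ln(∏ᵢ xᵢ^{wᵢ}) = Σᵢ wᵢ ln xᵢ, a linear layer applied in log space turns the powers of features into ordinary learnable weights, so each output neuron represents a cross feature of learned (possibly fractional) order. The minimal layer below illustrates the idea; the clamping and initialization are assumptions, not AFN's exact design.

```python
import torch
import torch.nn as nn

class LogTransformLayer(nn.Module):
    """Illustrative sketch of a logarithmic transformation layer.

    Each output neuron j computes prod_i x_i^{w_ij} = exp(sum_i w_ij * ln x_i),
    so the weight matrix learns per-feature powers, i.e., adaptive-order
    feature interactions.
    """

    def __init__(self, num_fields, num_neurons):
        super().__init__()
        self.weights = nn.Parameter(torch.randn(num_fields, num_neurons) * 0.1)

    def forward(self, x):
        # x: (batch, num_fields). Clamp to keep the logarithm well-defined;
        # this handling of zeros/negatives is an assumption.
        x = torch.clamp(torch.abs(x), min=1e-4)
        return torch.exp(torch.log(x) @ self.weights)  # (batch, num_neurons)
```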


Explaining Latent Factor Models for Recommendation with Influence Functions

arXiv.org Artificial Intelligence

Latent factor models (LFMs) such as matrix factorization achieve state-of-the-art performance among the various Collaborative Filtering (CF) approaches to recommendation. Despite the high recommendation accuracy of LFMs, a critical unresolved issue is their lack of explainability. Extensive efforts have been made in the literature to incorporate explainability into LFMs. However, they either rely on auxiliary information that may not be available in practice or fail to provide easy-to-understand explanations. In this paper, we propose a fast influence analysis method named FIA, which equips LFMs with explicit neighbor-style explanations using influence functions, a technique stemming from robust statistics. We first describe how to apply influence functions to LFMs to deliver neighbor-style explanations. Then we develop a novel, highly efficient influence computation algorithm for matrix factorization. We further extend it to the more general neural collaborative filtering setting and introduce an approximation algorithm to accelerate influence analysis over neural network models. Experimental results on real datasets demonstrate the correctness, efficiency, and usefulness of our proposed method.
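
For orientation, the standard influence-function estimate (in the Koh and Liang style) of how up-weighting one training interaction changes a test prediction is -∇θ f(x_test)ᵀ H⁻¹ ∇θ L(z_train), where H is the Hessian of the training loss. The dense solve below is a generic sketch of that formula only; FIA's contribution is computing such quantities efficiently for matrix factorization, which this toy version does not attempt.

```python
import numpy as np

def influence_on_prediction(grad_test, grad_train, hessian, damping=1e-3):
    """Generic influence-function sketch (illustrative, not FIA's algorithm).

    grad_test:  (p,) gradient of the test prediction w.r.t. parameters.
    grad_train: (p,) gradient of the loss on one training interaction.
    hessian:    (p, p) Hessian of the total training loss.
    Returns the estimated change in the test prediction from up-weighting
    that training interaction; large-magnitude values mark influential
    "neighbors" usable in neighbor-style explanations.
    """
    # Damping keeps the (possibly singular) Hessian invertible.
    h = hessian + damping * np.eye(hessian.shape[0])
    return -grad_test @ np.linalg.solve(h, grad_train)
```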


A Neural Attention Model for Urban Air Quality Inference: Learning the Weights of Monitoring Stations

AAAI Conferences

Urban air pollution has attracted much attention in recent years for its adverse impacts on human health. While monitoring stations have been established to collect pollutant statistics, the number of stations is very limited due to their high cost. Thus, inferring fine-grained urban air quality information is becoming an essential issue for both governments and the public. In this paper, we propose a generic neural approach, named ADAIN, for urban air quality inference. We leverage both the information from monitoring stations and urban data closely related to air quality, including POIs, road networks, and meteorology. ADAIN combines feedforward and recurrent neural networks to model static and sequential features as well as to capture deep feature interactions effectively. A novel feature of ADAIN is an attention-based pooling layer that automatically learns the weights of features from different monitoring stations to boost performance. We conduct experiments on a real-world air quality dataset, and our approach achieves the highest performance compared with various state-of-the-art solutions.
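
The attention-based pooling can be sketched as scoring each station's feature vector against the target location's features and combining stations by their softmax weights. The scoring MLP and all dimensions below are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class StationAttentionPooling(nn.Module):
    """Illustrative attention pooling over monitoring stations.

    Scores each station against the target location, normalizes the scores
    with a softmax, and returns the weighted sum of station features, so the
    model learns which stations matter most for each unmonitored location.
    """

    def __init__(self, station_dim, target_dim, hidden=64):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(station_dim + target_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, station_feats, target_feats):
        # station_feats: (batch, n_stations, station_dim)
        # target_feats:  (batch, target_dim)
        t = target_feats.unsqueeze(1).expand(-1, station_feats.size(1), -1)
        logits = self.score(torch.cat([station_feats, t], dim=-1)).squeeze(-1)
        weights = torch.softmax(logits, dim=1)   # learned per-station weights
        return (weights.unsqueeze(-1) * station_feats).sum(dim=1)
```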