Kuaishou
BlendGAN: Implicitly GAN Blending for Arbitrary Stylized Face Generation (Supplementary Materials)
For the generator and the three discriminators, we use the FFHQ [2] and AAHQ datasets at 1024×1024 resolution. Hence, cooperating with GAN inversion methods, our framework is able to achieve arbitrary style transfer for a given face image. When i = 0, all layers of the generator are influenced by the style latent code. Result images of the direct-concatenation method share similar face identities and head poses with their reference images, which means this method leaks content information from the reference images into the style latent codes. However, for a reference image whose style differs significantly from the styles in AAHQ, directly feeding it into BlendGAN may produce images whose style is not similar to the reference.
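The per-layer role of the blending index i described above can be sketched as follows. This is an illustrative helper, not the paper's actual code; the function name `blend_codes` and the layer count are assumptions (a StyleGAN-style generator at 1024×1024 typically takes 18 per-layer style inputs).

```python
NUM_LAYERS = 18  # assumed layer count for a 1024x1024 StyleGAN-style generator

def blend_codes(face_code, style_code, i, num_layers=NUM_LAYERS):
    """Return a per-layer list of latent codes: layers before index i keep
    the face (content) code, layers from i onward take the style code.
    With i = 0, every layer is influenced by the style latent code."""
    return [face_code if layer < i else style_code
            for layer in range(num_layers)]

codes = blend_codes("w_face", "w_style", i=8)
assert codes[:8] == ["w_face"] * 8
assert codes[8:] == ["w_style"] * 10
```

Smaller i lets the style code control more layers (including coarse ones), so the output drifts further from the content image's identity and pose.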
QARM: Quantitative Alignment Multi-Modal Recommendation at Kuaishou
Luo, Xinchen, Cao, Jiangxia, Sun, Tianyu, Yu, Jinkai, Huang, Rui, Yuan, Wei, Lin, Hezheng, Zheng, Yichen, Wang, Shiyao, Hu, Qigen, Qiu, Changqing, Zhang, Jiaqi, Zhang, Xu, Yan, Zhiheng, Zhang, Jingming, Zhang, Simin, Wen, Mingxing, Liu, Zhaojie, Gai, Kun, Zhou, Guorui
In recent years, with the significant evolution of multi-modal large models, many recommender-system researchers have realized the potential of multi-modal information for user interest modeling. In industry, a widely-used modeling architecture is a cascading paradigm: (1) first pre-train a multi-modal model to provide omnipotent representations for downstream services; (2) the downstream recommendation model then takes the multi-modal representation as additional input to fit real user-item behaviours. Although this paradigm achieves remarkable improvements, two problems still limit model performance: (1) Representation Unmatching: the pre-trained multi-modal model is supervised by classic NLP/CV tasks, while the recommendation model is supervised by real user-item interactions. As a result, the goals of the two fundamentally different tasks remain relatively separate, and there is no consistent objective on their representations. (2) Representation Unlearning: the generated multi-modal representations are stored in a cache and serve as extra fixed inputs to the recommendation model, so they cannot be updated by the recommendation model's gradients, which is further unfriendly for downstream training. Motivated by these two challenges in downstream usage, we introduce a quantitative multi-modal framework to customize specialized and trainable multi-modal information for different downstream models.
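The quantization idea the abstract hints at can be sketched as follows: a frozen multi-modal embedding is mapped to a discrete codebook ID, and the downstream model then learns a trainable embedding for that ID instead of consuming the fixed vector. The codebook size, dimensions, and the `quantize` helper are illustrative assumptions, not QARM's actual design.

```python
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(256, 64))   # 256 hypothetical code vectors

def quantize(embedding, codebook):
    """Return the ID of the nearest codebook vector (L2 distance)."""
    dists = np.linalg.norm(codebook - embedding, axis=1)
    return int(np.argmin(dists))

frozen_emb = rng.normal(size=64)        # from the pre-trained multi-modal model
code_id = quantize(frozen_emb, codebook)
# `code_id` now indexes a trainable embedding table inside the recommendation
# model, so gradients can update that representation end-to-end.
assert 0 <= code_id < 256
```

This addresses both stated problems at once: the ID embedding is trainable (unlearning), and it is optimized directly against user-item interactions (unmatching).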
KuaiFormer: Transformer-Based Retrieval at Kuaishou
Liu, Chi, Cao, Jiangxia, Huang, Rui, Zheng, Kai, Luo, Qiang, Gai, Kun, Zhou, Guorui
In large-scale content recommendation systems, retrieval serves as the initial stage in the pipeline, responsible for selecting thousands of candidate items from billions of options to pass on to ranking modules. Traditionally, the dominant retrieval method has been Embedding-Based Retrieval (EBR) using a Deep Neural Network (DNN) dual-tower structure. Applying transformers to retrieval tasks has been the focus of recent research, though real-world industrial deployment still presents significant challenges. In this paper, we introduce KuaiFormer, a novel transformer-based retrieval framework deployed in a large-scale content recommendation system. KuaiFormer fundamentally redefines the retrieval process by shifting from conventional score estimation tasks (such as click-through rate estimation) to a transformer-driven Next Action Prediction paradigm. This shift enables more effective real-time interest acquisition and multi-interest extraction, significantly enhancing retrieval performance. KuaiFormer has been successfully integrated into the Kuaishou App's short-video recommendation system since May 2024, serving over 400 million daily active users and producing a marked increase in the average daily usage time of Kuaishou users. We provide insights into both the technical and business aspects of deploying transformers in large-scale recommendation systems, addressing practical challenges encountered during industrial implementation. Our findings offer valuable guidance for engineers and researchers aiming to leverage transformer models to optimize large-scale content recommendation systems.
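The Next Action Prediction objective can be sketched in miniature: given a user's item sequence, each position is trained to predict the following item. The tiny "encoder" below (a causal mean over prefix embeddings) and all shapes are illustrative stand-ins, not KuaiFormer's transformer architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, dim = 1000, 32
item_emb = rng.normal(size=(vocab, dim))   # hypothetical item embedding table

def next_action_logits(seq_ids):
    """Score every candidate item as the next action after each prefix.
    A real system would use causal self-attention; here the prefix state
    is just the running mean of the embeddings seen so far."""
    seq = item_emb[seq_ids]                                        # (T, dim)
    prefix = np.cumsum(seq, axis=0) / np.arange(1, len(seq_ids) + 1)[:, None]
    return prefix @ item_emb.T                                     # (T, vocab)

seq = [5, 17, 42, 7]
logits = next_action_logits(seq)
targets = seq[1:]   # position t is trained to predict the item at t+1
assert logits.shape == (4, 1000) and targets == [17, 42, 7]
```

At serving time, only the logits at the final position matter: the top-scoring items become the retrieved candidates, replacing a dual-tower nearest-neighbour lookup with sequence-conditioned next-item scores.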
HoME: Hierarchy of Multi-Gate Experts for Multi-Task Learning at Kuaishou
Wang, Xu, Cao, Jiangxia, Fu, Zhiyi, Gai, Kun, Zhou, Guorui
In this paper, we present the practical problems and the lessons learned at Kuaishou's short-video services. In industry, a widely-used multi-task framework is the Mixture-of-Experts (MoE) paradigm, which introduces shared and task-specific experts and then uses gate networks to weigh each expert's contribution. Although MoE achieves remarkable improvements, we still observe three anomalies that seriously affect model performance in our iteration: (1) Expert Collapse: experts' output distributions differ significantly, and some experts have over 90% zero activations under ReLU, making it hard for gate networks to assign fair weights that balance the experts. (2) Expert Degradation: ideally, a shared-expert provides predictive information for all tasks simultaneously; nevertheless, we find that some shared-experts are occupied by only one task, indicating that they have lost their shared role and degenerated into specific-experts. (3) Expert Underfitting: our services predict dozens of behavior tasks, and we find that some data-sparse prediction tasks tend to ignore their specific-experts and assign large weights to shared-experts. The reason might be that the shared-experts receive more gradient updates and knowledge from dense tasks, while specific-experts easily fall into underfitting due to their sparse behaviors. Motivated by these observations, we propose HoME to achieve a simple, efficient and balanced MoE system for multi-task learning.
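The MoE paradigm described above can be sketched as a toy forward pass: several ReLU experts whose outputs are mixed by a per-task softmax gate. The sizes and weight layout are illustrative assumptions, not HoME's actual design; the sketch also shows why Expert Collapse matters, since an expert whose ReLU output is mostly zero contributes little regardless of its gate weight.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_exp, n_experts = 16, 8, 4
W_experts = rng.normal(size=(n_experts, d_in, d_exp))  # one weight per expert
W_gate = rng.normal(size=(d_in, n_experts))            # gate for one task

def moe_forward(x):
    """Mix expert outputs with a softmax gate (single-task slice of MMoE)."""
    expert_out = np.maximum(0.0, np.einsum('i,eio->eo', x, W_experts))  # ReLU
    gate = np.exp(x @ W_gate)
    gate /= gate.sum()                                  # softmax over experts
    return gate @ expert_out                            # (d_exp,)

x = rng.normal(size=d_in)
y = moe_forward(x)
assert y.shape == (d_exp,)
```

In a full MMoE, each task has its own `W_gate` over the same expert pool; the anomalies in the abstract are all failure modes of how those gates distribute weight across shared and specific experts.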
I tested out a buzzy new text-to-video AI model from China
The short-video platform, which has over 600 million active users, announced the new tool on June 6. Like OpenAI's Sora model, Kling is able to generate videos "up to two minutes long with a frame rate of 30fps and video resolution up to 1080p," the company says on its website. But unlike Sora, which still remains inaccessible to the public four months after OpenAI trialed it, Kling soon started letting people try the model themselves. I got access to it after downloading Kuaishou's video-editing tool, signing up with a Chinese number, getting on a waitlist, and filling out an additional form through Kuaishou's user feedback groups. The model can't process prompts written entirely in English, but you can get around that by either translating the phrase you want to use into Chinese or including one or two Chinese words.
Billion-user Customer Lifetime Value Prediction: An Industrial-scale Solution from Kuaishou
Li, Kunpeng, Shao, Guangcui, Yang, Naijun, Fang, Xiao, Song, Yang
Customer Lifetime Value (LTV) is the expected total revenue that a single user can bring to a business. It is widely used in a variety of business scenarios to make operational decisions when acquiring new customers. Modeling LTV is a challenging problem due to its complex and mutable data distribution. Existing approaches either learn directly from posterior feature distributions or leverage statistical models that make strong assumptions about prior distributions, both of which fail to capture those mutable distributions. In this paper, we propose a complete set of industrial-level LTV modeling solutions. Specifically, we introduce an Order Dependency Monotonic Network (ODMN) that models the ordered dependencies between LTVs of different time spans, which greatly improves model performance. We further introduce a Multi Distribution Multi Experts (MDME) module based on the divide-and-conquer idea, which transforms the severely imbalanced distribution modeling problem into a series of relatively balanced sub-distribution modeling problems and hence greatly reduces modeling complexity. In addition, a novel evaluation metric, Mutual Gini, is introduced to better measure the distribution difference between the estimated value and the ground-truth label based on the Lorenz curve. The ODMN framework has been successfully deployed in many business scenarios at Kuaishou and has achieved strong performance. Extensive experiments on real-world industrial data demonstrate the superiority of the proposed methods compared to state-of-the-art baselines, including ZILN and two-stage XGBoost models.
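The ordered-dependency constraint can be sketched as follows: predicted LTV over a longer time span can never be smaller than over a shorter one, so one monotonic parameterization accumulates non-negative increments. This is an illustrative construction under that assumption, not ODMN's exact network; the head outputs and time spans are made up.

```python
import numpy as np

def softplus(z):
    """Numerically simple non-negativity transform, log(1 + e^z) > 0."""
    return np.log1p(np.exp(z))

def monotonic_ltv(raw_outputs):
    """raw_outputs[k] is an unconstrained head for time span k (e.g.
    hypothetical 7/30/90-day windows); cumulative softplus increments
    enforce LTV_7 <= LTV_30 <= LTV_90 by construction."""
    increments = softplus(np.asarray(raw_outputs, dtype=float))
    return np.cumsum(increments)

ltv = monotonic_ltv([-1.0, 0.5, 2.0])
assert np.all(np.diff(ltv) >= 0)   # the ordered dependency always holds
```

Because monotonicity is structural rather than learned, the model cannot produce the contradictory prediction of a 30-day LTV below the 7-day LTV, whatever the raw head outputs are.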
How AI (and Mushrooms) Are Helping Fight Poverty in China's Most Remote Villages
The last thing on Geru Drolma's mind was becoming an internet celebrity. All she wanted was to make rent. But the steamed buns Drolma rose at 5 a.m. each morning to make in her village in western China's Sichuan province just weren't selling fast enough. So with the bills mounting up, Drolma set off to hunt for wild fungi she hoped to sell at the local market, following the same azalea-strewn mountain paths carved by generations of her fellow ethnic Tibetans before her. Finding the best fungi varieties--like the sought-after matsutake, or pine mushroom--is not easy.