Kuaishou
BlendGAN: Implicitly GAN Blending for Arbitrary Stylized Face Generation (Supplementary Materials)
For the generator and the three discriminators, we use the FFHQ [2] and AAHQ datasets at 1024×1024 resolution. Hence, cooperating with GAN inversion methods, our framework is able to achieve arbitrary style transfer for a given face image. When i = 0, all layers of the generator are influenced by the style latent code. Result images of the direct-concatenation method share similar face identities and head poses with their reference images, which means this method leaks content information from the reference images into the style latent codes. However, for a reference image whose style differs significantly from the styles in AAHQ, directly feeding it into BlendGAN may produce images whose style is not similar to the reference.
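The per-layer role of the blending index i described above can be sketched as follows. This is an illustrative helper, not the paper's actual code; the function name `blend_codes` and the layer count are assumptions (a StyleGAN-style generator at 1024×1024 typically takes 18 per-layer style inputs).

```python
NUM_LAYERS = 18  # assumed layer count for a 1024x1024 StyleGAN-style generator

def blend_codes(face_code, style_code, i, num_layers=NUM_LAYERS):
    """Return a per-layer list of latent codes: layers before index i keep
    the face (content) code, layers from i onward take the style code.
    With i = 0, every layer is influenced by the style latent code."""
    return [face_code if layer < i else style_code
            for layer in range(num_layers)]

codes = blend_codes("w_face", "w_style", i=8)
assert codes[:8] == ["w_face"] * 8
assert codes[8:] == ["w_style"] * 10
```

Smaller i lets the style code control more layers (including coarse ones), so the output drifts further from the content image's identity and pose.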
QARM: Quantitative Alignment Multi-Modal Recommendation at Kuaishou
Luo, Xinchen, Cao, Jiangxia, Sun, Tianyu, Yu, Jinkai, Huang, Rui, Yuan, Wei, Lin, Hezheng, Zheng, Yichen, Wang, Shiyao, Hu, Qigen, Qiu, Changqing, Zhang, Jiaqi, Zhang, Xu, Yan, Zhiheng, Zhang, Jingming, Zhang, Simin, Wen, Mingxing, Liu, Zhaojie, Gai, Kun, Zhou, Guorui
In recent years, with the significant evolution of multi-modal large models, many recommender-system researchers have realized the potential of multi-modal information for user interest modeling. In industry, a widely-used modeling architecture is a cascading paradigm: (1) first pre-train a multi-modal model to provide omnipotent representations for downstream services; (2) the downstream recommendation model then takes the multi-modal representation as additional input to fit real user-item behaviours. Although this paradigm achieves remarkable improvements, two problems still limit model performance: (1) Representation Unmatching: the pre-trained multi-modal model is supervised by classic NLP/CV tasks, while the recommendation model is supervised by real user-item interactions. As a result, the goals of the two fundamentally different tasks remain relatively separate, and there is no consistent objective on their representations. (2) Representation Unlearning: the generated multi-modal representations are stored in a cache and serve as extra fixed inputs to the recommendation model, so they cannot be updated by the recommendation model's gradients, which is further unfriendly for downstream training. Motivated by these two challenges in downstream usage, we introduce a quantitative multi-modal framework to customize specialized and trainable multi-modal information for different downstream models.
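The quantization idea the abstract hints at can be sketched as follows: a frozen multi-modal embedding is mapped to a discrete codebook ID, and the downstream model then learns a trainable embedding for that ID instead of consuming the fixed vector. The codebook size, dimensions, and the `quantize` helper are illustrative assumptions, not QARM's actual design.

```python
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(256, 64))   # 256 hypothetical code vectors

def quantize(embedding, codebook):
    """Return the ID of the nearest codebook vector (L2 distance)."""
    dists = np.linalg.norm(codebook - embedding, axis=1)
    return int(np.argmin(dists))

frozen_emb = rng.normal(size=64)        # from the pre-trained multi-modal model
code_id = quantize(frozen_emb, codebook)
# `code_id` now indexes a trainable embedding table inside the recommendation
# model, so gradients can update that representation end-to-end.
assert 0 <= code_id < 256
```

This addresses both stated problems at once: the ID embedding is trainable (unlearning), and it is optimized directly against user-item interactions (unmatching).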
KuaiFormer: Transformer-Based Retrieval at Kuaishou
Liu, Chi, Cao, Jiangxia, Huang, Rui, Zheng, Kai, Luo, Qiang, Gai, Kun, Zhou, Guorui
In large-scale content recommendation systems, retrieval serves as the initial stage in the pipeline, responsible for selecting thousands of candidate items from billions of options to pass on to ranking modules. Traditionally, the dominant retrieval method has been Embedding-Based Retrieval (EBR) using a Deep Neural Network (DNN) dual-tower structure. Applying transformers to retrieval tasks has been the focus of recent research, though real-world industrial deployment still presents significant challenges. In this paper, we introduce KuaiFormer, a novel transformer-based retrieval framework deployed in a large-scale content recommendation system. KuaiFormer fundamentally redefines the retrieval process by shifting from conventional score estimation tasks (such as click-through rate estimation) to a transformer-driven Next Action Prediction paradigm. This shift enables more effective real-time interest acquisition and multi-interest extraction, significantly enhancing retrieval performance. KuaiFormer has been successfully integrated into the Kuaishou App's short-video recommendation system since May 2024, serving over 400 million daily active users and producing a marked increase in the average daily usage time of Kuaishou users. We provide insights into both the technical and business aspects of deploying transformers in large-scale recommendation systems, addressing practical challenges encountered during industrial implementation. Our findings offer valuable guidance for engineers and researchers aiming to leverage transformer models to optimize large-scale content recommendation systems.
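The Next Action Prediction objective can be sketched in miniature: given a user's item sequence, each position is trained to predict the following item. The tiny "encoder" below (a causal mean over prefix embeddings) and all shapes are illustrative stand-ins, not KuaiFormer's transformer architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, dim = 1000, 32
item_emb = rng.normal(size=(vocab, dim))   # hypothetical item embedding table

def next_action_logits(seq_ids):
    """Score every candidate item as the next action after each prefix.
    A real system would use causal self-attention; here the prefix state
    is just the running mean of the embeddings seen so far."""
    seq = item_emb[seq_ids]                                        # (T, dim)
    prefix = np.cumsum(seq, axis=0) / np.arange(1, len(seq_ids) + 1)[:, None]
    return prefix @ item_emb.T                                     # (T, vocab)

seq = [5, 17, 42, 7]
logits = next_action_logits(seq)
targets = seq[1:]   # position t is trained to predict the item at t+1
assert logits.shape == (4, 1000) and targets == [17, 42, 7]
```

At serving time, only the logits at the final position matter: the top-scoring items become the retrieved candidates, replacing a dual-tower nearest-neighbour lookup with sequence-conditioned next-item scores.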
HoME: Hierarchy of Multi-Gate Experts for Multi-Task Learning at Kuaishou
Wang, Xu, Cao, Jiangxia, Fu, Zhiyi, Gai, Kun, Zhou, Guorui
In this paper, we present the practical problems and the lessons learned at Kuaishou's short-video services. In industry, a widely-used multi-task framework is the Mixture-of-Experts (MoE) paradigm, which introduces shared and task-specific experts and then uses gate networks to weigh each expert's contribution. Although MoE achieves remarkable improvements, we still observe three anomalies that seriously affect model performance in our iteration: (1) Expert Collapse: experts' output distributions differ significantly, and some experts have over 90% zero activations under ReLU, making it hard for gate networks to assign fair weights that balance the experts. (2) Expert Degradation: ideally, a shared-expert provides predictive information for all tasks simultaneously; nevertheless, we find that some shared-experts are occupied by only one task, indicating that they have lost their shared role and degenerated into specific-experts. (3) Expert Underfitting: our services predict dozens of behavior tasks, and we find that some data-sparse prediction tasks tend to ignore their specific-experts and assign large weights to shared-experts. The reason might be that the shared-experts receive more gradient updates and knowledge from dense tasks, while specific-experts easily fall into underfitting due to their sparse behaviors. Motivated by these observations, we propose HoME to achieve a simple, efficient and balanced MoE system for multi-task learning.
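The MoE paradigm described above can be sketched as a toy forward pass: several ReLU experts whose outputs are mixed by a per-task softmax gate. The sizes and weight layout are illustrative assumptions, not HoME's actual design; the sketch also shows why Expert Collapse matters, since an expert whose ReLU output is mostly zero contributes little regardless of its gate weight.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_exp, n_experts = 16, 8, 4
W_experts = rng.normal(size=(n_experts, d_in, d_exp))  # one weight per expert
W_gate = rng.normal(size=(d_in, n_experts))            # gate for one task

def moe_forward(x):
    """Mix expert outputs with a softmax gate (single-task slice of MMoE)."""
    expert_out = np.maximum(0.0, np.einsum('i,eio->eo', x, W_experts))  # ReLU
    gate = np.exp(x @ W_gate)
    gate /= gate.sum()                                  # softmax over experts
    return gate @ expert_out                            # (d_exp,)

x = rng.normal(size=d_in)
y = moe_forward(x)
assert y.shape == (d_exp,)
```

In a full MMoE, each task has its own `W_gate` over the same expert pool; the anomalies in the abstract are all failure modes of how those gates distribute weight across shared and specific experts.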
I tested out a buzzy new text-to-video AI model from China
The short-video platform, which has over 600 million active users, announced the new tool on June 6. Like OpenAI's Sora model, Kling is able to generate videos "up to two minutes long with a frame rate of 30fps and video resolution up to 1080p," the company says on its website. But unlike Sora, which still remains inaccessible to the public four months after OpenAI trialed it, Kling soon started letting people try the model themselves. I got access to it after downloading Kuaishou's video-editing tool, signing up with a Chinese number, getting on a waitlist, and filling out an additional form through Kuaishou's user feedback groups. The model can't process prompts written entirely in English, but you can get around that by either translating the phrase you want to use into Chinese or including one or two Chinese words.
Billion-user Customer Lifetime Value Prediction: An Industrial-scale Solution from Kuaishou
Li, Kunpeng, Shao, Guangcui, Yang, Naijun, Fang, Xiao, Song, Yang
Customer Lifetime Value (LTV) is the expected total revenue that a single user can bring to a business. It is widely used in a variety of business scenarios to make operational decisions when acquiring new customers. Modeling LTV is a challenging problem due to its complex and mutable data distribution. Existing approaches either learn directly from posterior feature distributions or leverage statistical models that make strong assumptions about prior distributions, both of which fail to capture those mutable distributions. In this paper, we propose a complete set of industrial-level LTV modeling solutions. Specifically, we introduce an Order Dependency Monotonic Network (ODMN) that models the ordered dependencies between LTVs of different time spans, which greatly improves model performance. We further introduce a Multi Distribution Multi Experts (MDME) module based on the divide-and-conquer idea, which transforms the severely imbalanced distribution modeling problem into a series of relatively balanced sub-distribution modeling problems and hence greatly reduces modeling complexity. In addition, a novel evaluation metric, Mutual Gini, is introduced to better measure the distribution difference between the estimated value and the ground-truth label based on the Lorenz curve. The ODMN framework has been successfully deployed in many business scenarios at Kuaishou and has achieved strong performance. Extensive experiments on real-world industrial data demonstrate the superiority of the proposed methods compared to state-of-the-art baselines, including ZILN and two-stage XGBoost models.
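The ordered-dependency constraint can be sketched as follows: predicted LTV over a longer time span can never be smaller than over a shorter one, so one monotonic parameterization accumulates non-negative increments. This is an illustrative construction under that assumption, not ODMN's exact network; the head outputs and time spans are made up.

```python
import numpy as np

def softplus(z):
    """Numerically simple non-negativity transform, log(1 + e^z) > 0."""
    return np.log1p(np.exp(z))

def monotonic_ltv(raw_outputs):
    """raw_outputs[k] is an unconstrained head for time span k (e.g.
    hypothetical 7/30/90-day windows); cumulative softplus increments
    enforce LTV_7 <= LTV_30 <= LTV_90 by construction."""
    increments = softplus(np.asarray(raw_outputs, dtype=float))
    return np.cumsum(increments)

ltv = monotonic_ltv([-1.0, 0.5, 2.0])
assert np.all(np.diff(ltv) >= 0)   # the ordered dependency always holds
```

Because monotonicity is structural rather than learned, the model cannot produce the contradictory prediction of a 30-day LTV below the 7-day LTV, whatever the raw head outputs are.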
How AI (and Mushrooms) Are Helping Fight Poverty in China's Most Remote Villages
The last thing on Geru Drolma's mind was becoming an internet celebrity. All she wanted was to make rent. But the steamed buns Drolma rose at 5 a.m. each morning to make in her village in western China's Sichuan province just weren't selling fast enough. So with the bills mounting up, Drolma set off to hunt for wild fungi she hoped to sell at the local market, following the same azalea-strewn mountain paths carved by generations of her fellow ethnic Tibetans before her. Finding the best fungi varieties--like the sought-after matsutake, or pine mushroom--is not easy.