Jiang, Peng
Enhancing Instruction-Following Capability of Visual-Language Models by Reducing Image Redundancy
Yang, Te, Jia, Jian, Zhu, Xiangyu, Zhao, Weisong, Wang, Bo, Cheng, Yanhua, Li, Yan, Liu, Shengyuan, Chen, Quan, Jiang, Peng, Gai, Kun, Lei, Zhen
Large Language Models (LLMs) have a strong instruction-following capability to interpret and execute tasks as directed by human commands. However, Multimodal Large Language Models (MLLMs) still lag significantly behind LLMs in instruction following. In this study, we conduct a pilot experiment which demonstrates that spatially down-sampling visual tokens significantly enhances the instruction-following capability of MLLMs; we attribute this to the substantial redundancy in the visual modality. However, this intuitive method severely impairs the MLLM's multimodal understanding capability. In this paper, we propose Visual-Modality Token Compression (VMTC) and Cross-Modality Attention Inhibition (CMAI) strategies to alleviate this gap between MLLMs and LLMs by inhibiting the influence of irrelevant visual tokens during content generation, increasing the instruction-following ability of MLLMs while retaining their multimodal understanding capacity. In the VMTC module, primary tokens are retained and redundant tokens are condensed by token clustering and merging. In the CMAI process, we aggregate text-to-image attentions with text-to-text attentions to obtain a text-to-image focus score, and attention inhibition is performed on the text-image token pairs with low scores. Comprehensive experiments on instruction-following capability and five benchmarks (VQA-V2, GQA, TextVQA, MME and MMBench) demonstrate that the proposed strategies significantly enhance the instruction-following capability of MLLMs while preserving their ability to understand and process multimodal inputs.
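A minimal sketch of the CMAI idea as described in the abstract, assuming toy tensor shapes and a simple keep-ratio threshold (neither is from the paper): text-to-image attentions are aggregated with text-to-text attentions into a focus score, and low-score text-image pairs are suppressed.

    import torch

    def cmai_inhibit(attn_t2i, attn_t2t, keep_ratio=0.5):
        # attn_t2i: [T, V] text-to-image attention; attn_t2t: [T, T] text-to-text attention.
        # A text token's focus on an image token is re-weighted by how much the other
        # text tokens attend to that text token (illustrative aggregation).
        focus = attn_t2t @ attn_t2i                          # [T, V] text-to-image focus scores
        k = max(1, int(keep_ratio * focus.numel()))
        thresh = focus.flatten().kthvalue(focus.numel() - k + 1).values
        mask = focus >= thresh                               # keep only high-focus pairs
        return attn_t2i * mask                               # inhibit low-focus attention

    T, V = 8, 64                                             # toy numbers of text / visual tokens
    inhibited = cmai_inhibit(torch.softmax(torch.randn(T, V), -1),
                             torch.softmax(torch.randn(T, T), -1))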
Scaling Laws for Online Advertisement Retrieval
Wang, Yunli, Yang, Zixuan, Zhang, Zhen, Wang, Zhiqiang, Yang, Jian, Wen, Shiyang, Jiang, Peng, Gai, Kun
The scaling law is a notable property of neural network models and has significantly propelled the development of large language models. Scaling laws hold great promise in guiding model design and resource allocation. Recent research increasingly shows that scaling laws are not limited to NLP tasks or Transformer architectures; they also apply to domains such as recommendation. However, there is still a lack of literature on scaling-law research in online advertisement retrieval systems. This may be because 1) identifying the scaling law for resource cost and online revenue is often expensive in both time and training resources for large-scale industrial applications, and 2) varying settings across systems prevent the scaling law from being applied to different scenarios. To address these issues, we propose a lightweight paradigm to identify the scaling law of online revenue and machine cost for a given online advertisement retrieval scenario at low experimental cost. Specifically, we focus on a single factor (FLOPs) and propose an offline metric, R/R*, that exhibits a high linear correlation with online revenue for retrieval models. We estimate the machine cost offline via a simulation algorithm. Thus, we can transform most online experiments into low-cost offline experiments. We conduct comprehensive experiments to verify the effectiveness of the proposed metric R/R* and to identify the scaling law in the online advertisement retrieval system of Kuaishou. With this scaling law, we demonstrate practical applications for ROI-constrained model design and multi-scenario resource allocation in the Kuaishou advertising system. To the best of our knowledge, this is the first work to study scaling laws for online advertisement retrieval in real-world systems, showing the great potential of scaling laws for advertising system optimization.
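A rough sketch of how a FLOPs-only scaling curve could be fitted to an offline metric such as R/R*, in the spirit of the abstract; the saturating power-law form and all numbers below are illustrative assumptions, not Kuaishou data.

    import numpy as np
    from scipy.optimize import curve_fit

    def scaling_curve(x, a, b, c):
        # Saturating power law in normalized compute: the metric approaches c as compute grows.
        return c - a * np.power(x, -b)

    flops = np.array([1e9, 1e10, 1e11, 1e12, 1e13])            # model FLOPs (illustrative)
    r_over_rstar = np.array([0.62, 0.71, 0.78, 0.83, 0.86])    # offline metric values (illustrative)

    x = flops / flops[0]                                       # normalize compute for a stable fit
    params, _ = curve_fit(scaling_curve, x, r_over_rstar, p0=[0.3, 0.2, 0.9], maxfev=10000)
    print("fitted (a, b, c):", params)
    print("extrapolated metric at 1e14 FLOPs:", scaling_curve(1e14 / flops[0], *params))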
Dual-Agent Deep Reinforcement Learning for Dynamic Pricing and Replenishment
Zheng, Yi, Li, Zehao, Jiang, Peng, Peng, Yijie
We study the dynamic pricing and replenishment problems under inconsistent decision frequencies. Different from the traditional demand assumption, the discreteness of demand and the parameterization of the Poisson distribution as a function of price introduce complexity into analyzing the problem properties. We demonstrate the concavity of the single-period profit function with respect to product price and inventory within their respective domains. The demand model is enhanced by integrating a decision tree-based machine learning approach, trained on comprehensive market data. Employing a two-timescale stochastic approximation scheme, we address the discrepancies in decision frequencies between pricing and replenishment, ensuring convergence to a local optimum. We further refine our methodology by incorporating ...
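A toy sketch of a two-timescale stochastic approximation loop of the kind the abstract mentions, assuming a stand-in Poisson demand model and finite-difference gradient estimates; which decision sits on the fast versus slow timescale, and all constants, are assumptions rather than the paper's setup.

    import numpy as np

    rng = np.random.default_rng(0)

    def profit(price, qty):
        # One-period profit with Poisson demand whose rate falls with price
        # (a stand-in for the paper's tree-based demand model); 2.0 is a unit cost.
        lam = max(30.0 - 2.0 * price, 0.1)
        demand = rng.poisson(lam)
        return price * min(demand, qty) - 2.0 * qty

    price, qty, delta = 5.0, 20.0, 0.5
    for t in range(1, 20001):
        a_t = t ** -0.6                  # fast timescale: pricing update
        b_t = t ** -0.9                  # slow timescale: replenishment update
        # Finite-difference gradient estimates from simulated profits.
        g_p = (profit(price + delta, qty) - profit(price - delta, qty)) / (2 * delta)
        g_q = (profit(price, qty + delta) - profit(price, qty - delta)) / (2 * delta)
        price = float(np.clip(price + a_t * g_p, 1.0, 14.0))
        qty = float(np.clip(qty + b_t * g_q, 0.0, 60.0))

    print(f"price ~ {price:.2f}, replenishment quantity ~ {qty:.1f}")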
GPR Full-Waveform Inversion through Adaptive Filtering of Model Parameters and Gradients Using CNN
Jiang, Peng, Wang, Kun, Wang, Jiaxing, Feng, Zeliang, Qiao, Shengjie, Deng, Runhuai, Zhang, Fengkai
GPR full-waveform inversion optimizes the subsurface property model iteratively to match the entire waveform information. However, the model gradients derived from wavefield continuation often contain errors, such as ghost values and excessively large values at transmitter and receiver points. Furthermore, models updated based on these gradients frequently exhibit unclear characterization of anomalous bodies or false anomalies, making it challenging to obtain accurate inversion results. To address these issues, we introduce a novel full-waveform inversion (FWI) framework that incorporates an embedded convolutional neural network (CNN) to adaptively filter model parameters and gradients. Specifically, we embed the CNN module before the forward-modeling process and ensure that the entire FWI process remains differentiable. This design leverages the auto-grad tool of the deep learning library, allowing model values to pass through the CNN module during forward computation and model gradients to pass through the CNN module during backpropagation. Experiments show that filtering the model parameters during forward computation and the model gradients during backpropagation ultimately yields high-quality inversion results.
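A minimal PyTorch sketch of placing a CNN between the model parameters and a differentiable forward operator, so that autograd routes values forward and gradients backward through the CNN; the forward operator here is a placeholder, not a GPR wavefield simulator.

    import torch
    import torch.nn as nn

    class ParamFilter(nn.Module):
        # Small CNN that adaptively filters the subsurface model (residual form).
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                                     nn.Conv2d(8, 1, 3, padding=1))

        def forward(self, m):
            return m + self.net(m)

    def forward_modeling(m):
        # Placeholder differentiable "simulator"; any differentiable torch ops would do here.
        return torch.nn.functional.avg_pool2d(m, 4)

    model = torch.randn(1, 1, 64, 64, requires_grad=True)      # subsurface property model
    observed = torch.randn(1, 1, 16, 16)                       # recorded data (placeholder)
    cnn = ParamFilter()
    opt = torch.optim.Adam([model, *cnn.parameters()], lr=1e-2)

    for _ in range(100):
        opt.zero_grad()
        filtered = cnn(model)                  # model values pass through the CNN (forward)
        loss = ((forward_modeling(filtered) - observed) ** 2).mean()
        loss.backward()                        # model gradients flow back through the CNN
        opt.step()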
Multi-Epoch learning with Data Augmentation for Deep Click-Through Rate Prediction
Fan, Zhongxiang, Liu, Zhaocheng, Liang, Jian, Kong, Dongying, Li, Han, Jiang, Peng, Li, Shuang, Gai, Kun
This paper investigates the one-epoch overfitting phenomenon in Click-Through Rate (CTR) models, where performance notably declines at the start of the second epoch. Despite extensive research, the efficacy of multi-epoch training over the conventional one-epoch approach remains unclear. We identify the overfitting of the embedding layer, caused by high-dimensional data sparsity, as the primary issue. To address this, we introduce a novel and simple Multi-Epoch learning with Data Augmentation (MEDA) framework, suitable for both non-continual and continual learning scenarios, which can be seamlessly integrated into existing deep CTR models and may also help address the "forgetting or overfitting" dilemma in retraining as well as the well-known catastrophic forgetting problem. MEDA minimizes overfitting by reducing the dependency of the embedding layer on subsequent training data or the Multi-Layer Perceptron (MLP) layers, and achieves data augmentation by training the MLP with varied embedding spaces. Our findings confirm that pre-trained MLP layers can adapt to new embedding spaces, enhancing performance without overfitting. This adaptability underscores the MLP layers' role in learning a matching function focused on the relative relationships among embeddings rather than their absolute positions. To our knowledge, MEDA represents the first multi-epoch training strategy tailored for deep CTR prediction models. We conduct extensive experiments on several public and business datasets, fully demonstrating the effectiveness of the data augmentation and its superiority over conventional single-epoch training. Moreover, MEDA has exhibited significant benefits in a real-world online advertising system.
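A rough sketch of one reading of MEDA's non-continual variant: the MLP keeps training across epochs while the embedding layer is re-initialized each epoch so the MLP sees varied embedding spaces; model sizes and data below are placeholders, not the paper's configuration.

    import torch
    import torch.nn as nn

    n_ids, emb_dim = 10000, 16
    mlp = nn.Sequential(nn.Linear(emb_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def batches():
        # Placeholder CTR data: (sparse feature id, click label) pairs.
        for _ in range(200):
            yield torch.randint(0, n_ids, (256,)), torch.rand(256, 1).round()

    for epoch in range(3):
        embedding = nn.Embedding(n_ids, emb_dim)    # fresh embedding space each epoch
        opt = torch.optim.Adam([*embedding.parameters(), *mlp.parameters()], lr=1e-3)
        for ids, y in batches():
            opt.zero_grad()
            loss = nn.functional.binary_cross_entropy_with_logits(mlp(embedding(ids)), y)
            loss.backward()
            opt.step()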
Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application
Jia, Jian, Wang, Yipei, Li, Yan, Chen, Honggang, Bai, Xuehan, Liu, Zhaocheng, Liang, Jian, Chen, Quan, Li, Han, Jiang, Peng, Gai, Kun
Contemporary recommender systems predominantly rely on collaborative filtering techniques, employing ID embeddings to capture latent associations among users and items. However, this approach overlooks the wealth of semantic information embedded within textual descriptions of items, leading to suboptimal performance in cold-start scenarios and long-tail user recommendations. Leveraging the capabilities of Large Language Models (LLMs) pretrained on massive text corpora presents a promising avenue for enhancing recommender systems by integrating open-world domain knowledge. In this paper, we propose an Llm-driven knowlEdge Adaptive RecommeNdation (LEARN) framework that synergizes open-world knowledge with collaborative knowledge. We address computational-complexity concerns by using pretrained LLMs as item encoders and freezing the LLM parameters to avoid catastrophic forgetting and preserve open-world knowledge. To bridge the gap between the open-world and collaborative domains, we design a twin-tower structure supervised by the recommendation task and tailored for practical industrial application. Through offline experiments on a large-scale industrial dataset and online A/B tests, we demonstrate the efficacy of our approach.
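A schematic twin-tower setup in the spirit of the abstract, where a frozen pretrained encoder stands in for the frozen LLM item encoder and small trainable towers project user and item features into a shared space; all dimensions and modules are illustrative assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TwinTower(nn.Module):
        def __init__(self, llm_dim=768, user_dim=64, proj_dim=128):
            super().__init__()
            self.item_tower = nn.Sequential(nn.Linear(llm_dim, proj_dim), nn.ReLU(),
                                            nn.Linear(proj_dim, proj_dim))
            self.user_tower = nn.Sequential(nn.Linear(user_dim, proj_dim), nn.ReLU(),
                                            nn.Linear(proj_dim, proj_dim))

        def forward(self, frozen_item_emb, user_feat):
            i = F.normalize(self.item_tower(frozen_item_emb), dim=-1)
            u = F.normalize(self.user_tower(user_feat), dim=-1)
            return (u * i).sum(-1)               # matching score for the recommendation loss

    # The LLM item encoder is frozen: item embeddings are computed once, with no gradients.
    with torch.no_grad():
        frozen_item_emb = torch.randn(32, 768)   # stand-in for LLM(item text)
    scores = TwinTower()(frozen_item_emb, torch.randn(32, 64))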
A Model-based Multi-Agent Personalized Short-Video Recommender System
Zhou, Peilun, Xu, Xiaoxiao, Hu, Lantao, Li, Han, Jiang, Peng
A recommender selects and presents top-K items to the user at each online request, and a recommendation session consists of several sequential requests. Formulating a recommendation session as a Markov decision process and solving it with a reinforcement learning (RL) framework has attracted increasing attention from both the academic and industry communities. In this paper, we propose an RL-based industrial short-video recommender ranking framework, which models and maximizes user watch time in an environment of multi-aspect user preferences through a collaborative multi-agent formulation. Moreover, our framework adopts a model-based learning approach to alleviate sample selection bias, a crucial but intractable problem in industrial recommender systems. Extensive offline evaluations and live experiments confirm the effectiveness of our proposed method over alternatives. Our approach has been deployed on our real large-scale short-video sharing platform, successfully serving hundreds of millions of users.
RecGPT: Generative Personalized Prompts for Sequential Recommendation via ChatGPT Training Paradigm
Zhang, Yabin, Yu, Wenhui, Zhang, Erhan, Chen, Xu, Hu, Lantao, Jiang, Peng, Gai, Kun
ChatGPT has achieved remarkable success in natural language understanding. Considering that recommendation is essentially a conversation between users and the system with items as words, which shares a similar underlying pattern with ChatGPT, we design a new chat framework at the item-index level for the recommendation task. Our novelty mainly contains three parts: model, training and inference. For the model part, we adopt the Generative Pre-training Transformer (GPT) as the sequential recommendation model and design a user module to capture personalized information. For the training part, we adopt the two-stage paradigm of ChatGPT, including pre-training and fine-tuning. In the pre-training stage, we train the GPT model by auto-regression. In the fine-tuning stage, we train the model with prompts, which include both the newly generated results from the model and the user's feedback. For the inference part, we predict several user interests as user representations in an autoregressive manner. For each interest vector, we recall several items with the highest similarity and merge the items recalled by all interest vectors into the final result. We conduct experiments with both offline public datasets and an online A/B test to demonstrate the effectiveness of our proposed method.
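A small sketch of the inference step described: several predicted interest vectors each recall their most similar items, and the recalled lists are merged; the interest vectors and item embeddings here are random placeholders, not outputs of the actual model.

    import torch
    import torch.nn.functional as F

    def recall_and_merge(interests, item_emb, k=5):
        # interests: [n_interest, d] predicted interest vectors; item_emb: [n_items, d].
        sims = F.normalize(interests, dim=-1) @ F.normalize(item_emb, dim=-1).T
        topk = sims.topk(k, dim=-1).indices      # top-k most similar items per interest
        merged = []
        for idx in topk.flatten().tolist():      # merge all recalled lists, de-duplicated
            if idx not in merged:
                merged.append(idx)
        return merged

    item_emb = torch.randn(1000, 32)             # placeholder item embeddings
    interests = torch.randn(4, 32)               # e.g. four autoregressively predicted interests
    print(recall_and_merge(interests, item_emb))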
Reflectivity Is All You Need!: Advancing LiDAR Semantic Segmentation
Viswanath, Kasi, Jiang, Peng, Saripalli, Srikanth
LiDAR semantic segmentation frameworks predominantly leverage geometry-based features to differentiate objects within a scan. While these methods excel in scenarios with clear boundaries and distinct shapes, their performance declines in environments where boundaries are blurred, particularly in off-road contexts. To address this, recent strides in 3D segmentation algorithms have focused on harnessing raw LiDAR intensity measurements to improve prediction accuracy. Despite these efforts, current learning-based models struggle to correlate the intricate connections between raw intensity and factors such as distance, incidence angle, material reflectivity, and atmospheric conditions. Building upon our prior work, this paper delves into the advantages of employing calibrated intensity (also referred to as reflectivity) within learning-based LiDAR semantic segmentation frameworks. We first establish that incorporating reflectivity as an input enhances an existing LiDAR semantic segmentation model. Furthermore, we show that enabling the model to learn the intensity calibration itself can further boost its performance. Through extensive experimentation on the off-road dataset Rellis-3D, we demonstrate notable improvements. Specifically, converting intensity to reflectivity results in a 4% increase in mean Intersection over Union (mIoU) compared to using raw intensity in off-road scenarios. Additionally, we investigate the possible benefits of using calibrated intensity for semantic segmentation in urban environments (SemanticKITTI) and for cross-sensor domain adaptation.
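A hedged sketch of one common range and incidence-angle normalization used to turn raw LiDAR intensity into a reflectivity-like quantity; the paper's actual calibration procedure may differ, and all inputs below are placeholders.

    import numpy as np

    def intensity_to_reflectivity(intensity, ranges, incidence_angle, ref_range=10.0):
        # Compensate the 1/R^2 range falloff and the cos(incidence) factor of a
        # Lambertian surface; ref_range only sets the scale of the output.
        cos_a = np.clip(np.cos(incidence_angle), 1e-3, 1.0)
        return intensity * (ranges / ref_range) ** 2 / cos_a

    intensity = np.random.rand(1000)                       # raw returns (placeholder)
    ranges = np.random.uniform(2.0, 80.0, 1000)            # meters
    angles = np.random.uniform(0.0, np.pi / 3, 1000)       # radians
    reflectivity = intensity_to_reflectivity(intensity, ranges, angles)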
3DGS-ReLoc: 3D Gaussian Splatting for Map Representation and Visual ReLocalization
Jiang, Peng, Pandey, Gaurav, Saripalli, Srikanth
This paper presents a novel system designed for 3D mapping and visual relocalization using 3D Gaussian Splatting. Our proposed method uses LiDAR and camera data to create accurate and visually plausible representations of the environment. By leveraging LiDAR data to initiate the training of the 3D Gaussian Splatting map, our system constructs maps that are both detailed and geometrically accurate. To mitigate excessive GPU memory usage and facilitate rapid spatial queries, we employ a combination of a 2D voxel map and a KD-tree. This preparation makes our method well-suited for visual localization tasks, enabling efficient identification of correspondences between the query image and the rendered image from the Gaussian Splatting map via normalized cross-correlation (NCC). Additionally, we refine the camera pose of the query image using feature-based matching and the Perspective-n-Point (PnP) technique. The effectiveness, adaptability, and precision of our system are demonstrated through extensive evaluation on the KITTI360 dataset.
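A small sketch of normalized cross-correlation (NCC) between a query image and a rendered image, the similarity measure named in the abstract; the subsequent pose refinement would use feature matching and a PnP solver such as OpenCV's solvePnP. Inputs here are placeholders.

    import numpy as np

    def ncc(a, b):
        # Normalized cross-correlation between two same-sized images.
        a = (a - a.mean()) / (a.std() + 1e-8)
        b = (b - b.mean()) / (b.std() + 1e-8)
        return float((a * b).mean())

    query = np.random.rand(128, 128)       # query camera image (placeholder)
    rendered = np.random.rand(128, 128)    # image rendered from the Gaussian Splatting map
    print("NCC score:", ncc(query, rendered))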