Liu, Haiyang
GWQ: Gradient-Aware Weight Quantization for Large Language Models
Shao, Yihua, Liang, Siyu, Ling, Zijian, Yan, Minxi, Liu, Haiyang, Chen, Siyu, Yan, Ziyang, Zhang, Chenyu, Qin, Haotong, Magno, Michele, Yang, Yang, Lei, Zhen, Wang, Yan, Guo, Jingcai, Shao, Ling, Tang, Hao
Large language models (LLMs) show impressive performance in solving complex language tasks. However, its large number of parameters present significant challenges for the deployment and application of the model on edge devices. Compressing large language models to low bits can enable them to run on resource-constrained devices, often leading to performance degradation. To address this problem, we propose gradient-aware weight quantization (GWQ), the first quantization approach for low-bit weight quantization that leverages gradients to localize outliers, requiring only a minimal amount of calibration data for outlier detection. GWQ retains the weights corresponding to the top 1% outliers preferentially at FP16 precision, while the remaining non-outlier weights are stored in a low-bit format. GWQ found experimentally that utilizing the sensitive weights in the gradient localization model is more scientific compared to utilizing the sensitive weights in the Hessian matrix localization model. Compared to current quantization methods, GWQ can be applied to multiple language models and achieves lower PPL on the WikiText2 and C4 dataset. In the zero-shot task, GWQ quantized models have higher accuracy compared to other quantization methods. GWQ is also suitable for multimodal model quantization, and the quantized Qwen-VL family model is more accurate than other methods. Zero-shot target detection task dataset RefCOCO outperforms the current stat-of-the-arts method SPQR. GWQ achieves 1.2 times inference speedup in comparison to the original model, and effectively reduces the inference memory.
Global-Aware Enhanced Spatial-Temporal Graph Recurrent Networks: A New Framework For Traffic Flow Prediction
Liu, Haiyang, Zhu, Chunjiang, Zhang, Detian
Traffic flow prediction plays a crucial role in alleviating traffic congestion and enhancing transport efficiency. While combining graph convolution networks with recurrent neural networks for spatial-temporal modeling is a common strategy in this realm, the restricted structure of recurrent neural networks limits their ability to capture global information. For spatial modeling, many prior studies learn a graph structure that is assumed to be fixed and uniform at all time steps, which may not be true. This paper introduces a novel traffic prediction framework, Global-Aware Enhanced Spatial-Temporal Graph Recurrent Network (GA-STGRN), comprising two core components: a spatial-temporal graph recurrent neural network and a global awareness layer. Within this framework, three innovative prediction models are formulated. A sequence-aware graph neural network is proposed and integrated into the Gated Recurrent Unit (GRU) to learn non-fixed graphs at different time steps and capture local temporal relationships. To enhance the model's global perception, three distinct global spatial-temporal transformer-like architectures (GST^2) are devised for the global awareness layer. We conduct extensive experiments on four real traffic datasets and the results demonstrate the superiority of our framework and the three concrete models.
Multi-Scale Spatial-Temporal Recurrent Networks for Traffic Flow Prediction
Liu, Haiyang, Zhu, Chunjiang, Zhang, Detian, Li, Qing
Traffic flow prediction is one of the most fundamental tasks of intelligent transportation systems. The complex and dynamic spatial-temporal dependencies make the traffic flow prediction quite challenging. Although existing spatial-temporal graph neural networks hold prominent, they often encounter challenges such as (1) ignoring the fixed graph that limits the predictive performance of the model, (2) insufficiently capturing complex spatial-temporal dependencies simultaneously, and (3) lacking attention to spatial-temporal information at different time lengths. In this paper, we propose a Multi-Scale Spatial-Temporal Recurrent Network for traffic flow prediction, namely MSSTRN, which consists of two different recurrent neural networks: the single-step gate recurrent unit and the multi-step gate recurrent unit to fully capture the complex spatial-temporal information in the traffic data under different time windows. Moreover, we propose a spatial-temporal synchronous attention mechanism that integrates adaptive position graph convolutions into the self-attention mechanism to achieve synchronous capture of spatial-temporal dependencies. We conducted extensive experiments on four real traffic datasets and demonstrated that our model achieves the best prediction accuracy with non-trivial margins compared to all the twenty baseline methods.
Exploring the Mutual Influence between Self-Supervised Single-Frame and Multi-Frame Depth Estimation
Xiang, Jie, Wang, Yun, An, Lifeng, Liu, Haiyang, Liu, Jian
Although both self-supervised single-frame and multi-frame depth estimation methods only require unlabeled monocular videos for training, the information they leverage varies because single-frame methods mainly rely on appearance-based features while multi-frame methods focus on geometric cues. Considering the complementary information of single-frame and multi-frame methods, some works attempt to leverage single-frame depth to improve multi-frame depth. However, these methods can neither exploit the difference between single-frame depth and multi-frame depth to improve multi-frame depth nor leverage multi-frame depth to optimize single-frame depth models. To fully utilize the mutual influence between single-frame and multi-frame methods, we propose a novel self-supervised training framework. Specifically, we first introduce a pixel-wise adaptive depth sampling module guided by single-frame depth to train the multi-frame model. Then, we leverage the minimum reprojection based distillation loss to transfer the knowledge from the multi-frame depth network to the single-frame network to improve single-frame depth. Finally, we regard the improved single-frame depth as a prior to further boost the performance of multi-frame depth estimation. Experimental results on the KITTI and Cityscapes datasets show that our method outperforms existing approaches in the self-supervised monocular setting.
Attention-based Spatial-Temporal Graph Convolutional Recurrent Networks for Traffic Forecasting
Liu, Haiyang, Zhu, Chunjiang, Zhang, Detian, Li, Qing
Traffic forecasting is one of the most fundamental problems in transportation science and artificial intelligence. The key challenge is to effectively model complex spatial-temporal dependencies and correlations in modern traffic data. Existing methods, however, cannot accurately model both long-term and short-term temporal correlations simultaneously, limiting their expressive power on complex spatial-temporal patterns. In this paper, we propose a novel spatial-temporal neural network framework: Attention-based Spatial-Temporal Graph Convolutional Recurrent Network (ASTGCRN), which consists of a graph convolutional recurrent module (GCRN) and a global attention module. In particular, GCRN integrates gated recurrent units and adaptive graph convolutional networks for dynamically learning graph structures and capturing spatial dependencies and local temporal relationships. To effectively extract global temporal dependencies, we design a temporal attention layer and implement it as three independent modules based on multi-head self-attention, transformer, and informer respectively. Extensive experiments on five real traffic datasets have demonstrated the excellent predictive performance of all our three models with all their average MAE, RMSE and MAPE across the test datasets lower than the baseline methods.
Towards User Friendly Medication Mapping Using Entity-Boosted Two-Tower Neural Network
Yuan, Shaoqing, Bhatia, Parminder, Celikkaya, Busra, Liu, Haiyang, Choi, Kyunghwan
Recent advancements in medical entity linking have been applied in the area of scientific literature and social media data. However, with the adoption of telemedicine and conversational agents such as Alexa in healthcare settings, medical name inference has become an important task. Medication name inference is the task of mapping user friendly medication names from a free-form text to a concept in a normalized medication list. This is challenging due to the differences in the use of medical terminology from health care professionals and user conversations coming from the lay public. We begin with mapping descriptive medication phrases (DMP) to standard medication names (SMN). Given the prescriptions of each patient, we want to provide them with the flexibility of referring to the medication in their preferred ways. We approach this as a ranking problem which maps SMN to DMP by ordering the list of medications in the patient's prescription list obtained from pharmacies. Furthermore, we leveraged the output of intermediate layers and performed medication clustering. We present the Medication Inference Model (MIM) achieving state-of-the-art results. By incorporating medical entities based attention, we have obtained further improvement for ranking models.
Random Pairwise Shapelets Forest
Shi, Mohan, Wang, Zhihai, Yuan, Jodong, Liu, Haiyang
Shapelet is a discriminative subsequence of time series. An advanced shapelet-based method is to embed shapelet into accurate and fast random forest. However, it shows several limitations. First, random shapelet forest requires a large training cost for split threshold searching. Second, a single shapelet provides limited information for only one branch of the decision tree, resulting in insufficient accuracy and interpretability. Third, randomized ensemble causes interpretability declining. For that, this paper presents Random Pairwise Shapelets Forest (RPSF). RPSF combines a pair of shapelets from different classes to construct random forest. It omits threshold searching to be more efficient, includes more information for each node of the forest to be more effective. Moreover, a discriminability metric, Decomposed Mean Decrease Impurity (DMDI), is proposed to identify influential region for every class. Extensive experiments show RPSF improves the accuracy and training speed of shapelet-based forest. Case studies demonstrate the interpretability of our method.