AITopics | Liu, Haiyang

Collaborating Authors

Liu, Haiyang

GWQ: Gradient-Aware Weight Quantization for Large Language Models

Shao, Yihua, Liang, Siyu, Ling, Zijian, Yan, Minxi, Liu, Haiyang, Chen, Siyu, Yan, Ziyang, Zhang, Chenyu, Qin, Haotong, Magno, Michele, Yang, Yang, Lei, Zhen, Wang, Yan, Guo, Jingcai, Shao, Ling, Tang, Hao

arXiv.org Artificial Intelligence, Dec-4-2024

Large language models (LLMs) show impressive performance in solving complex language tasks. However, their large number of parameters presents significant challenges for deploying and applying these models on edge devices. Compressing large language models to low bits can enable them to run on resource-constrained devices, but it often leads to performance degradation. To address this problem, we propose gradient-aware weight quantization (GWQ), the first quantization approach for low-bit weight quantization that leverages gradients to localize outliers, requiring only a minimal amount of calibration data for outlier detection. GWQ preferentially retains the weights corresponding to the top 1% of outliers at FP16 precision, while the remaining non-outlier weights are stored in a low-bit format. We find experimentally that localizing sensitive weights with gradients is more principled than localizing them with the Hessian matrix. Compared to current quantization methods, GWQ can be applied to multiple language models and achieves lower perplexity (PPL) on the WikiText2 and C4 datasets. In zero-shot tasks, GWQ-quantized models achieve higher accuracy than models quantized with other methods. GWQ is also suitable for multimodal model quantization, and the quantized Qwen-VL family models are more accurate than those produced by other methods. On the RefCOCO dataset for zero-shot target detection, GWQ outperforms the current state-of-the-art method SPQR. GWQ achieves a 1.2x inference speedup over the original model and effectively reduces inference memory.
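The mixed-precision idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `quantize_with_outliers`, the uniform min-max rounding grid, and the precomputed `grads` list are all assumptions made for illustration. Unquantized Python floats stand in for the FP16 outliers.

```python
def quantize_with_outliers(weights, grads, outlier_frac=0.01, bits=4):
    """Keep the weights with the largest |gradient| unquantized and round
    the rest to a uniform low-bit grid (illustrative assumption only)."""
    n = len(weights)
    k = max(1, int(round(n * outlier_frac)))
    # Indices of the k weights with the largest gradient magnitude.
    order = sorted(range(n), key=lambda i: abs(grads[i]), reverse=True)
    outliers = set(order[:k])

    rest = [weights[i] for i in range(n) if i not in outliers]
    if not rest:
        return list(weights), outliers

    # Uniform min-max grid over the non-outlier weights.
    lo, hi = min(rest), max(rest)
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0

    out = []
    for i, w in enumerate(weights):
        if i in outliers:
            out.append(w)  # stands in for an FP16 copy of the weight
        else:
            q = round((w - lo) / scale)  # integer code in [0, levels]
            out.append(lo + q * scale)   # dequantized value
    return out, outliers


if __name__ == "__main__":
    w = [0.1, -0.2, 5.0, 0.3]
    g = [0.01, 0.02, 0.9, 0.03]
    print(quantize_with_outliers(w, g, outlier_frac=0.25, bits=2))
```

In this toy call the weight with the largest gradient magnitude (index 2) is kept exactly, and the other three are snapped to a four-level grid spanning their own range.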

large language model, machine learning, quantization, (16 more...)

arXiv.org Artificial Intelligence

2411.0085

Genre: Research Report > Promising Solution (0.35)

Industry: Energy (0.36)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

Global-Aware Enhanced Spatial-Temporal Graph Recurrent Networks: A New Framework For Traffic Flow Prediction

Liu, Haiyang, Zhu, Chunjiang, Zhang, Detian

arXiv.org Artificial Intelligence, Jan-7-2024

Traffic flow prediction plays a crucial role in alleviating traffic congestion and enhancing transport efficiency. While combining graph convolution networks with recurrent neural networks for spatial-temporal modeling is a common strategy in this realm, the restricted structure of recurrent neural networks limits their ability to capture global information. For spatial modeling, many prior studies learn a graph structure that is assumed to be fixed and uniform at all time steps, which may not be true. This paper introduces a novel traffic prediction framework, Global-Aware Enhanced Spatial-Temporal Graph Recurrent Network (GA-STGRN), comprising two core components: a spatial-temporal graph recurrent neural network and a global awareness layer. Within this framework, three innovative prediction models are formulated. A sequence-aware graph neural network is proposed and integrated into the Gated Recurrent Unit (GRU) to learn non-fixed graphs at different time steps and capture local temporal relationships. To enhance the model's global perception, three distinct global spatial-temporal transformer-like architectures (GST^2) are devised for the global awareness layer. We conduct extensive experiments on four real traffic datasets and the results demonstrate the superiority of our framework and the three concrete models.

artificial intelligence, machine learning, neural network, (15 more...)

arXiv.org Artificial Intelligence

2401.04135

Country: North America > United States > North Carolina (0.14)

Genre: Research Report > New Finding (0.48)

Industry:

Transportation (1.00)
Consumer Products & Services > Travel (0.73)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Multi-Scale Spatial-Temporal Recurrent Networks for Traffic Flow Prediction

Liu, Haiyang, Zhu, Chunjiang, Zhang, Detian, Li, Qing

arXiv.org Artificial Intelligence, Oct-12-2023

Traffic flow prediction is one of the most fundamental tasks of intelligent transportation systems. Complex and dynamic spatial-temporal dependencies make traffic flow prediction quite challenging. Although existing spatial-temporal graph neural networks are prominent, they often encounter challenges such as (1) relying on a fixed graph, which limits the predictive performance of the model, (2) insufficiently capturing complex spatial-temporal dependencies simultaneously, and (3) lacking attention to spatial-temporal information at different time lengths. In this paper, we propose a Multi-Scale Spatial-Temporal Recurrent Network for traffic flow prediction, namely MSSTRN, which consists of two different recurrent neural networks, the single-step gated recurrent unit and the multi-step gated recurrent unit, to fully capture the complex spatial-temporal information in the traffic data under different time windows. Moreover, we propose a spatial-temporal synchronous attention mechanism that integrates adaptive position graph convolutions into the self-attention mechanism to achieve synchronous capture of spatial-temporal dependencies. We conducted extensive experiments on four real traffic datasets and demonstrated that our model achieves the best prediction accuracy, with non-trivial margins, compared to all twenty baseline methods.

artificial intelligence, forecasting, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2310.08138

Country:

North America > United States > North Carolina (0.14)
North America > United States > California (0.14)

Genre:

Research Report (0.64)
Overview (0.46)

Industry:

Transportation (1.00)
Consumer Products & Services > Travel (1.00)
Education > Educational Setting > Higher Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Exploring the Mutual Influence between Self-Supervised Single-Frame and Multi-Frame Depth Estimation

Xiang, Jie, Wang, Yun, An, Lifeng, Liu, Haiyang, Liu, Jian

arXiv.org Artificial Intelligence, Aug-27-2023

Although both self-supervised single-frame and multi-frame depth estimation methods only require unlabeled monocular videos for training, the information they leverage varies because single-frame methods mainly rely on appearance-based features while multi-frame methods focus on geometric cues. Considering the complementary information of single-frame and multi-frame methods, some works attempt to leverage single-frame depth to improve multi-frame depth. However, these methods can neither exploit the difference between single-frame depth and multi-frame depth to improve multi-frame depth nor leverage multi-frame depth to optimize single-frame depth models. To fully utilize the mutual influence between single-frame and multi-frame methods, we propose a novel self-supervised training framework. Specifically, we first introduce a pixel-wise adaptive depth sampling module guided by single-frame depth to train the multi-frame model. Then, we leverage a minimum-reprojection-based distillation loss to transfer knowledge from the multi-frame depth network to the single-frame network to improve single-frame depth. Finally, we regard the improved single-frame depth as a prior to further boost the performance of multi-frame depth estimation. Experimental results on the KITTI and Cityscapes datasets show that our method outperforms existing approaches in the self-supervised monocular setting.

artificial intelligence, depth estimation, machine learning, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/LRA.2023.3309134

2304.12685

Country: Asia > China (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Graphics (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Attention-based Spatial-Temporal Graph Convolutional Recurrent Networks for Traffic Forecasting

Liu, Haiyang, Zhu, Chunjiang, Zhang, Detian, Li, Qing

arXiv.org Artificial Intelligence, Feb-24-2023

Traffic forecasting is one of the most fundamental problems in transportation science and artificial intelligence. The key challenge is to effectively model complex spatial-temporal dependencies and correlations in modern traffic data. Existing methods, however, cannot accurately model both long-term and short-term temporal correlations simultaneously, limiting their expressive power on complex spatial-temporal patterns. In this paper, we propose a novel spatial-temporal neural network framework: Attention-based Spatial-Temporal Graph Convolutional Recurrent Network (ASTGCRN), which consists of a graph convolutional recurrent module (GCRN) and a global attention module. In particular, GCRN integrates gated recurrent units and adaptive graph convolutional networks for dynamically learning graph structures and capturing spatial dependencies and local temporal relationships. To effectively extract global temporal dependencies, we design a temporal attention layer and implement it as three independent modules based on multi-head self-attention, transformer, and informer, respectively. Extensive experiments on five real traffic datasets demonstrate the excellent predictive performance of all three of our models: their average MAE, RMSE, and MAPE across the test datasets are all lower than those of the baseline methods.
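For readers unfamiliar with the three error metrics named in this abstract, the sketch below computes them with their standard definitions. The function names are illustrative, and skipping zero-valued targets in MAPE is an assumption made here to avoid division by zero; the paper's exact masking convention is not stated in the abstract.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.
    Assumption: targets equal to zero are skipped."""
    terms = [abs(t - p) / abs(t) for t, p in zip(y_true, y_pred) if t != 0]
    return 100.0 * sum(terms) / len(terms)


if __name__ == "__main__":
    flow = [100.0, 200.0, 400.0]   # hypothetical observed traffic flow
    pred = [110.0, 180.0, 400.0]   # hypothetical model forecast
    print(mae(flow, pred), rmse(flow, pred), mape(flow, pred))
```

Lower values are better for all three metrics, which is the sense in which the abstract compares the models against the baselines.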

artificial intelligence, machine learning, spatial reasoning, (19 more...)

arXiv.org Artificial Intelligence

2302.12973

Country: North America > United States (0.93)

Genre: Research Report (0.50)

Industry: Transportation > Ground > Road (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Towards User Friendly Medication Mapping Using Entity-Boosted Two-Tower Neural Network

Yuan, Shaoqing, Bhatia, Parminder, Celikkaya, Busra, Liu, Haiyang, Choi, Kyunghwan

arXiv.org Machine Learning, Oct-9-2020

Recent advancements in medical entity linking have been applied in the area of scientific literature and social media data. However, with the adoption of telemedicine and conversational agents such as Alexa in healthcare settings, medication name inference has become an important task. Medication name inference is the task of mapping user-friendly medication names from free-form text to a concept in a normalized medication list. This is challenging because of the differences between the medical terminology used by health care professionals and the conversational language of the lay public. We begin with mapping descriptive medication phrases (DMP) to standard medication names (SMN). Given the prescriptions of each patient, we want to provide them with the flexibility of referring to the medication in their preferred ways. We approach this as a ranking problem which maps SMN to DMP by ordering the list of medications in the patient's prescription list obtained from pharmacies. Furthermore, we leveraged the output of intermediate layers and performed medication clustering. We present the Medication Inference Model (MIM), which achieves state-of-the-art results. By incorporating medical-entity-based attention, we have obtained further improvement for ranking models.

deep learning, medication, vascular disease, (20 more...)

arXiv.org Machine Learning

2007.00492

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.93)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.88)

Random Pairwise Shapelets Forest

Shi, Mohan, Wang, Zhihai, Yuan, Jodong, Liu, Haiyang

arXiv.org Machine Learning, Apr-22-2019

A shapelet is a discriminative subsequence of a time series. An advanced shapelet-based method is to embed shapelets into an accurate and fast random forest. However, this approach has several limitations. First, the random shapelet forest incurs a large training cost for split threshold searching. Second, a single shapelet provides limited information for only one branch of the decision tree, resulting in insufficient accuracy and interpretability. Third, the randomized ensemble reduces interpretability. To address these limitations, this paper presents Random Pairwise Shapelets Forest (RPSF). RPSF combines a pair of shapelets from different classes to construct a random forest. It omits threshold searching to be more efficient and includes more information at each node of the forest to be more effective. Moreover, a discriminability metric, Decomposed Mean Decrease Impurity (DMDI), is proposed to identify influential regions for every class. Extensive experiments show that RPSF improves the accuracy and training speed of the shapelet-based forest. Case studies demonstrate the interpretability of our method.
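One way a pair of shapelets could replace a searched threshold at a tree node is sketched below. This is an assumption drawn only from the abstract, not the paper's algorithm: the series is routed toward whichever of the two shapelets it matches more closely. The names `min_dist` and `pairwise_split` are hypothetical.

```python
def min_dist(series, shapelet):
    """Smallest squared Euclidean distance between the shapelet and any
    same-length window of the series (series must be at least as long)."""
    m = len(shapelet)
    return min(
        sum((series[s + j] - shapelet[j]) ** 2 for j in range(m))
        for s in range(len(series) - m + 1)
    )


def pairwise_split(series, shapelet_a, shapelet_b):
    """Route the series by comparing its distances to two shapelets.
    No split threshold is searched: the comparison itself is the test."""
    if min_dist(series, shapelet_a) <= min_dist(series, shapelet_b):
        return "left"
    return "right"


if __name__ == "__main__":
    peak = [1, 2, 1]      # hypothetical shapelet from class A
    plateau = [3, 3, 3]   # hypothetical shapelet from class B
    print(pairwise_split([0, 0, 1, 2, 1, 0], peak, plateau))
    print(pairwise_split([3, 3, 3, 0], peak, plateau))
```

Because each node holds one shapelet per class, both branches carry a class-specific pattern, which is consistent with the abstract's claim that pairs provide more information per node than a single shapelet.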

random pairwise shapelet forest

arXiv.org Machine Learning

1903.07799

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.73)
