AITopics | Gao, Xiaowei

Collaborating Authors

Gao, Xiaowei

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning

Li, Pengxiang, Yin, Lu, Gao, Xiaowei, Liu, Shiwei

arXiv.org Artificial IntelligenceMay-28-2024

The rapid advancements in Large Language Models (LLMs) have revolutionized various natural language processing tasks. However, the substantial size of LLMs presents significant challenges in training or fine-tuning. While parameter-efficient approaches such as low-rank adaptation (LoRA) have gained popularity, they often compromise performance compared to full-rank fine-tuning. In this paper, we propose Outlier-weighed Layerwise Sampled Low-Rank Projection (OwLore), a new memory-efficient fine-tuning approach, inspired by the layerwise outlier distribution of LLMs, which dynamically samples pre-trained layers to fine-tune instead of adding additional adaptors. We first interpret the outlier phenomenon through the lens of Heavy-Tailed Self-Regularization theory (HT-SR), discovering that layers with more outliers tend to be more heavy-tailed and consequently better trained. Inspired by this finding, OwLore strategically assigns higher sampling probabilities to layers with more outliers to better leverage the knowledge stored in pre-trained LLMs. To further mitigate the memory demands of fine-tuning, we integrate gradient low-rank projection into our approach, which facilitates each layer to be efficiently trained in a low-rank manner. By incorporating the efficient characteristics of low-rank and optimal layerwise sampling, OwLore significantly improves the memory-performance trade-off in LLM pruning. Our extensive experiments across various architectures, including LLaMa2, LLaMa3, and Mistral, demonstrate that OwLore consistently outperforms baseline approaches, including full fine-tuning. Specifically, it achieves up to a 1.1% average accuracy gain on the Commonsense Reasoning benchmark, a 3.0% improvement on MMLU, and a notable 10% boost on MT-Bench, while being more memory efficient. OwLore allows us to fine-tune LLaMa2-7B with only 21GB of memory.

arxiv preprint arxiv, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2405.1838

Country: Europe (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Do Sentence Transformers Learn Quasi-Geospatial Concepts from General Text?

Ilyankou, Ilya, Lipani, Aldo, Cavazzi, Stefano, Gao, Xiaowei, Haworth, James

arXiv.org Artificial IntelligenceApr-5-2024

Sentence transformers are language models designed to perform semantic search. This study investigates the capacity of sentence transformers, fine-tuned on general question-answering datasets for asymmetric semantic search, to associate descriptions of human-generated routes across Great Britain with queries often used to describe hiking experiences. We find that sentence transformers have some zero-shot capabilities to understand quasi-geospatial concepts, such as route types and difficulty, suggesting their potential utility for routing recommendation systems.

artificial intelligence, large language model, natural language, (15 more...)

arXiv.org Artificial Intelligence

2404.04169

Country:

Europe > United Kingdom > Wales > Vale of Glamorgan (0.14)
Europe > United Kingdom > Wales > Caerphilly (0.14)

Genre: Research Report (0.51)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.34)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.34)

Add feedback

Time Series Supplier Allocation via Deep Black-Litterman Model

Luo, Jiayuan, Zhang, Wentao, Fang, Yuchen, Gao, Xiaowei, Zhuang, Dingyi, Chen, Hao, Jiang, Xinke

arXiv.org Artificial IntelligenceFeb-9-2024

Time Series Supplier Allocation (TSSA) poses a complex NP-hard challenge, aimed at refining future order dispatching strategies to satisfy order demands with maximum supply efficiency fully. Traditionally derived from financial portfolio management, the Black-Litterman (BL) model offers a new perspective for the TSSA scenario by balancing expected returns against insufficient supply risks. However, its application within TSSA is constrained by the reliance on manually constructed perspective matrices and spatio-temporal market dynamics, coupled with the absence of supervisory signals and data unreliability inherent to supplier information. To solve these limitations, we introduce the pioneering Deep Black-Litterman Model (DBLM), which innovatively adapts the BL model from financial roots to supply chain context. Leveraging the Spatio-Temporal Graph Neural Networks (STGNNS), DBLM automatically generates future perspective matrices for TSSA, by integrating spatio-temporal dependency. Moreover, a novel Spearman rank correlation distinctively supervises our approach to address the lack of supervisory signals, specifically designed to navigate through the complexities of supplier risks and interactions. This is further enhanced by a masking mechanism aimed at counteracting the biases from unreliable data, thereby improving the model's precision and reliability. Extensive experimentation on two datasets unequivocally demonstrates DBLM's enhanced performance in TSSA, setting new standards for the field. Our findings and methodology are made available for community access and further development.

data mining, machine learning, supplier, (18 more...)

arXiv.org Artificial Intelligence

2401.1735

Country:

Europe (0.47)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report > New Finding (0.48)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
(2 more...)

Add feedback

Spatiotemporal Graph Neural Networks with Uncertainty Quantification for Traffic Incident Risk Prediction

Gao, Xiaowei, Jiang, Xinke, Zhuang, Dingyi, Chen, Huanfa, Wang, Shenhao, Haworth, James

arXiv.org Artificial IntelligenceSep-10-2023

Predicting traffic incident risks at granular spatiotemporal levels is challenging. The datasets predominantly feature zero values, indicating no incidents, with sporadic high-risk values for severe incidents. Notably, a majority of current models, especially deep learning methods, focus solely on estimating risk values, overlooking the uncertainties arising from the inherently unpredictable nature of incidents. To tackle this challenge, we introduce the Spatiotemporal Zero-Inflated Tweedie Graph Neural Networks (STZITD-GNNs). Our model merges the reliability of traditional statistical models with the flexibility of graph neural networks, aiming to precisely quantify uncertainties associated with road-level traffic incident risks. This model strategically employs a compound model from the Tweedie family, as a Poisson distribution to model risk frequency and a Gamma distribution to account for incident severity. Furthermore, a zero-inflated component helps to identify the non-incident risk scenarios. As a result, the STZITD-GNNs effectively capture the dataset's skewed distribution, placing emphasis on infrequent but impactful severe incidents. Empirical tests using real-world traffic data from London, UK, demonstrate that our model excels beyond current benchmarks. The forte of STZITD-GNN resides not only in its accuracy but also in its adeptness at curtailing uncertainties, delivering robust predictions over short (7 days) and extended (14 days) timeframes.

artificial intelligence, machine learning, prediction, (21 more...)

arXiv.org Artificial Intelligence

2309.05072

Country:

North America > United States (1.00)
Europe > United Kingdom > England > Greater London > London (0.48)

Genre: Research Report > New Finding (0.92)

Industry:

Transportation > Ground > Road (1.00)
Health & Medicine (1.00)
Transportation > Infrastructure & Services (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.92)

Add feedback

Uncertainty Quantification via Spatial-Temporal Tweedie Model for Zero-inflated and Long-tail Travel Demand Prediction

Jiang, Xinke, Zhuang, Dingyi, Zhang, Xianghui, Chen, Hao, Luo, Jiayuan, Gao, Xiaowei

arXiv.org Artificial IntelligenceJun-16-2023

crucial for transportation management. However, traditional spatial-temporal deep learning models grapple with addressing the sparse and long-tail characteristics in high-resolution O-D matrices and quantifying prediction uncertainty. This dilemma arises from the numerous zeros and over-dispersed demand patterns within these matrices, which challenge the Gaussian assumption inherent to deterministic deep learning models. To address these challenges, we propose a novel approach: the Spatial-Temporal Tweedie Graph Neural Network (STTD). The STTD introduces the Tweedie distribution as a compelling alternative to the traditional 'zero-inflated' model and leverages spatial and temporal embeddings to parameterize travel demand distributions. Our evaluations using real-world datasets highlight STTD's superiority in providing accurate predictions and precise confidence intervals, particularly in high-resolution scenarios.

artificial intelligence, machine learning, travel demand, (16 more...)

arXiv.org Artificial Intelligence

2306.09882

Country:

Europe > United Kingdom > England > Greater London > London (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report (0.84)

Industry:

Transportation > Infrastructure & Services (0.68)
Transportation > Ground (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback