Goto

Collaborating Authors

 He, Cheng


General Time-series Model for Universal Knowledge Representation of Multivariate Time-Series data

arXiv.org Artificial Intelligence

Universal knowledge representation is a central problem for multivariate time series(MTS) foundation models and yet remains open. This paper investigates this problem from the first principle and it makes four folds of contributions. First, a new empirical finding is revealed: time series with different time granularities (or corresponding frequency resolutions) exhibit distinct joint distributions in the frequency domain. This implies a crucial aspect of learning universal knowledge, one that has been overlooked by previous studies. Second, a novel Fourier knowledge attention mechanism is proposed to enable learning time granularity-aware representations from both the temporal and frequency domains. Third, an autoregressive blank infilling pre-training framework is incorporated to time series analysis for the first time, leading to a generative tasks agnostic pre-training strategy. To this end, we develop the General Time-series Model (GTM), a unified MTS foundation model that addresses the limitation of contemporary time series models, which often require token, pre-training, or model-level customizations for downstream tasks adaption. Fourth, extensive experiments show that GTM outperforms state-of-the-art (SOTA) methods across all generative tasks, including long-term forecasting, anomaly detection, and imputation.


Evolutionary Multi-Objective Optimization Driven by Generative Adversarial Networks

arXiv.org Artificial Intelligence

Recently, more and more works have proposed to drive evolutionary algorithms using machine learning models.Usually, the performance of such model based evolutionary algorithms is highly dependent on the training qualities of the adopted models.Since it usually requires a certain amount of data (i.e. the candidate solutions generated by the algorithms) for model training, the performance deteriorates rapidly with the increase of the problem scales, due to the curse of dimensionality.To address this issue, we propose a multi-objective evolutionary algorithm driven by the generative adversarial networks (GANs).At each generation of the proposed algorithm, the parent solutions are first classified into \emph{real} and \emph{fake} samples to train the GANs; then the offspring solutions are sampled by the trained GANs.Thanks to the powerful generative ability of the GANs, our proposed algorithm is capable of generating promising offspring solutions in high-dimensional decision space with limited training data.The proposed algorithm is tested on 10 benchmark problems with up to 200 decision variables.Experimental results on these test problems demonstrate the effectiveness of the proposed algorithm.


POIReviewQA: A Semantically Enriched POI Retrieval and Question Answering Dataset

arXiv.org Artificial Intelligence

Many services that perform information retrieval for Points of Interest (POI) utilize a Lucene-based setup with spatial filtering. While this type of system is easy to implement it does not make use of semantics but relies on direct word matches between a query and reviews leading to a loss in both precision and recall. To study the challenging task of semantically enriching POIs from unstructured data in order to support open-domain search and question answering (QA), we introduce a new dataset POIReviewQA. It consists of 20k questions (e.g."is this restaurant dog friendly?") for 1022 Yelp business types. For each question we sampled 10 reviews, and annotated each sentence in the reviews whether it answers the question and what the corresponding answer is. To test a system's ability to understand the text we adopt an information retrieval evaluation by ranking all the review sentences for a question based on the likelihood that they answer this question. We build a Lucene-based baseline model, which achieves 77.0% AUC and 48.8% MAP. A sentence embedding-based model achieves 79.2% AUC and 41.8% MAP, indicating that the dataset presents a challenging problem for future research by the GIR community. The result technology can help exploit the thematic content of web documents and social media for characterisation of locations.