Goto

Collaborating Authors

PC-GAIN: Pseudo-label Conditional Generative Adversarial Imputation Networks for Incomplete Data

arXiv.org Machine Learning

Datasets with missing values are very common in real world applications. GAIN, a recently proposed deep generative model for missing data imputation, has been proved to outperform many state-of-the-art methods. But GAIN only uses a reconstruction loss in the generator to minimize the imputation error of the non-missing part, ignoring the potential category information which can reflect the relationship between samples. In this paper, we propose a novel unsupervised missing data imputation method named PC-GAIN, which utilizes potential category information to further enhance the imputation power. Specifically, we first propose a pre-training procedure to learn potential category information contained in a subset of low-missing-rate data. Then an auxiliary classifier is determined based on the synthetic pseudo-labels. Further, this classifier is incorporated into the generative adversarial framework to help the generator to yield higher quality imputation results. The proposed method can significantly improve the imputation quality of GAIN. Experimental results on various benchmark datasets show that our method is also superior to other baseline models.


Multivariate Time Series Imputation with Generative Adversarial Networks

Neural Information Processing Systems

Multivariate time series usually contain a large number of missing values, which hinders the application of advanced analysis methods on multivariate time series data. Conventional approaches to addressing the challenge of missing values, including mean/zero imputation, case deletion, and matrix factorization-based imputation, are all incapable of modeling the temporal dependencies and the nature of complex distribution in multivariate time series. In this paper, we treat the problem of missing value imputation as data generation. Inspired by the success of Generative Adversarial Networks (GAN) in image generation, we propose to learn the overall distribution of a multivariate time series dataset with GAN, which is further used to generate the missing values for each sample. Different from the image data, the time series data are usually incomplete due to the nature of data recording process. A modified Gate Recurrent Unit is employed in GAN to model the temporal irregularity of the incomplete time series. Experiments on two multivariate time series datasets show that the proposed model outperformed the baselines in terms of accuracy of imputation. Experimental results also showed that a simple model on the imputed data can achieve state-of-the-art results on the prediction tasks, demonstrating the benefits of our model in downstream applications.


Multivariate Time Series Imputation with Generative Adversarial Networks

Neural Information Processing Systems

Multivariate time series usually contain a large number of missing values, which hinders the application of advanced analysis methods on multivariate time series data. Conventional approaches to addressing the challenge of missing values, including mean/zero imputation, case deletion, and matrix factorization-based imputation, are all incapable of modeling the temporal dependencies and the nature of complex distribution in multivariate time series. In this paper, we treat the problem of missing value imputation as data generation. Inspired by the success of Generative Adversarial Networks (GAN) in image generation, we propose to learn the overall distribution of a multivariate time series dataset with GAN, which is further used to generate the missing values for each sample. Different from the image data, the time series data are usually incomplete due to the nature of data recording process. A modified Gate Recurrent Unit is employed in GAN to model the temporal irregularity of the incomplete time series. Experiments on two multivariate time series datasets show that the proposed model outperformed the baselines in terms of accuracy of imputation. Experimental results also showed that a simple model on the imputed data can achieve state-of-the-art results on the prediction tasks, demonstrating the benefits of our model in downstream applications.


RDIS: Random Drop Imputation with Self-Training for Incomplete Time Series Data

arXiv.org Machine Learning

It is common that time-series data with missing values are encountered in many fields such as in finance, meteorology, and robotics. Imputation is an intrinsic method to handle such missing values. In the previous research, most of imputation networks were trained implicitly for the incomplete time series data because missing values have no ground truth. This paper proposes Random Drop Imputation with Self-training (RDIS), a novel training method for imputation networks for the incomplete time-series data. In RDIS, there are extra missing values by applying a random drop on the given incomplete data such that the imputation network can explicitly learn by imputing the random drop values. Also, self-training is introduced to exploit the original missing values without ground truth. To verify the effectiveness of our RDIS on imputation tasks, we graft RDIS to a bidirectional GRU and achieve state-of-the-art results on two real-world datasets, an air quality dataset and a gas sensor dataset with 7.9% and 5.8% margin, respectively.


Time-series Imputation and Prediction with Bi-Directional Generative Adversarial Networks

arXiv.org Machine Learning

Multivariate time-series data are used in many classification and regression predictive tasks, and recurrent models have been widely used for such tasks. Most common recurrent models assume that time-series data elements are of equal length and the ordered observations are recorded at regular intervals. However, real-world time-series data have neither a similar length nor a same number of observations. They also have missing entries, which hinders the performance of predictive tasks. In this paper, we approach these issues by presenting a model for the combined task of imputing and predicting values for the irregularly observed and varying length time-series data with missing entries. Our proposed model (Bi-GAN) uses a bidirectional recurrent network in a generative adversarial setting. The generator is a bidirectional recurrent network that receives actual incomplete data and imputes the missing values. The discriminator attempts to discriminate between the actual and the imputed values in the output of the generator. Our model learns how to impute missing elements in-between (imputation) or outside of the input time steps (prediction), hence working as an effective any-time prediction tool for time-series data. Our method has three advantages to the state-of-the-art methods in the field: (a) single model can be used for both imputation and prediction tasks; (b) it can perform prediction task for time-series of varying length with missing data; (c) it does not require to know the observation and prediction time window during training which provides a flexible length of prediction window for both long-term and short-term predictions. We evaluate our model on two public datasets and on another large real-world electronic health records dataset to impute and predict body mass index (BMI) values in children and show its superior performance in both settings.