Textual Data for Time Series Forecasting

Obst, David, Ghattas, Badih, Claudel, Sandra, Cugliari, Jairo, Goude, Yannig, Oppenheim, Georges

arXiv.org Machine Learning 

David Obst a,b, Badih Ghattas b, Sandra Claudel a, Jairo Cugliari c, Yannig Goude a, Georges Oppenheim d

a EDF R&D, Palaiseau, France
b Institut de Mathématiques de Marseille, Aix-Marseille Université, France
c ERIC, Université de Lyon 2, France
d Laboratoire d'Analyse et de Mathématiques Appliquées, Université Paris-Est, Champs-sur-Marne, France

Abstract

While ubiquitous, textual sources of information such as company reports and social media posts are rarely included in prediction algorithms for time series, despite the relevant information they may contain. In this work, openly accessible daily weather reports from France and the United Kingdom are leveraged to predict time series of national electricity consumption, average temperature and wind speed with a single pipeline. Two methods of numerical representation of text are considered, namely traditional Term Frequency - Inverse Document Frequency (TF-IDF) as well as our own neural word embedding. Using exclusively text, we are able to predict the aforementioned time series with sufficient accuracy to be used to replace missing data. Furthermore, the proposed word embeddings display geometric properties relating to the behavior of the time series and context similarity between words.

Introduction

Whether it is in the field of energy, finance or meteorology, accurately predicting the behavior of time series is nowadays of paramount importance for optimal decision making or profit. While the field of time series forecasting is extremely prolific from a research point of view, up to now it has focused its efforts on exploiting regular numerical features extracted from sensors, databases or stock exchanges. Unstructured data such as text, on the other hand, remains underexploited for prediction tasks, despite its potentially valuable informative content.
Empirical studies have already shown that textual sources such as news articles or blog entries can be correlated with stock exchange time series and have explanatory power for their variations [1, 2]. This observation has motivated multiple extensive experiments to extract relevant features from textual documents in different ways and use them for prediction, notably in the field of finance. In Lavrenko et al. [3], language models (considering only the presence of a word) are used to estimate the probability of trends such as surges or falls of 127 different stock values using articles from Biz Yahoo!. Their results show that this text-driven approach could be used to make a profit on the market. One of the most conventional ways of representing text is the TF-IDF (Term Frequency - Inverse Document Frequency) approach.
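
To make the TF-IDF representation concrete, the following is a minimal sketch in plain Python. It assumes one common weighting, tf = count / document length and idf = log(N / df); the exact variant used in this work is not specified in this section and may differ. The function name `tf_idf` and the toy reports are hypothetical and serve only as illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Assumed weighting (one common variant among several):
    tf = count / document length, idf = log(N / df).
    """
    # Naive tokenization: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Toy weather-report snippets, invented for illustration only.
reports = [
    "cold and windy in the north",
    "mild and sunny in the south",
    "cold rain in the north",
]
vectors = tf_idf(reports)
```

Under this weighting, a term that occurs in every report (such as "in" above) receives a weight of zero, while a term specific to a single report (such as "windy") receives the largest weight in that report. This is the property that makes TF-IDF useful for singling out informative words.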
