December 5, 2019

Abstract

The notion of drift refers to the phenomenon that the distribution underlying the observed data changes over time. Although many attempts have been made to deal with drift, formal notions of drift are application-dependent and formulated at varying degrees of abstraction and mathematical rigour. In this contribution, we provide a probability-theoretic framework that allows a formalization of drift in continuous time and subsumes popular notions of drift. It gives rise to a new characterization of drift in terms of the stochastic dependency between data and time. This particularly intuitive formalization enables us to design a new, efficient drift detection method. Further, it induces a technique to decompose observed data into a drifting and a non-drifting part.

Keywords: Online learning, learning theory, stochastic processes, learning with drift, continuous time models, drift decomposition

1 INTRODUCTION

One fundamental assumption in classical machine learning is that the observed data are i.i.d. Yet this assumption is often violated as soon as machine learning faces real-world problems: models are subject to seasonal changes, changed demands of individual customers, ageing of sensors, etc. In such settings, lifelong model adaptation rather than classical batch learning is required for optimum performance. Since drift, i.e. the fact that data are no longer identically distributed, is a major issue in many real-world applications of machine learning, many attempts have been made to deal with this setting (Ditzler et al., 2015). Depending on the data domain and application, the presence of drift is modelled in different ways. As an example, covariate shift refers to the situation of training and test set having different marginal distributions (Gretton et al., 2009).
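The characterization of drift as stochastic dependency between data and time suggests a simple detection recipe: if a classifier can predict, from a data point alone, whether the point was observed early or late, then data and time cannot be independent. The following sketch illustrates this idea for one-dimensional data; all names and the toy threshold classifier are hypothetical and not the detection method proposed in the paper:

```python
import random

def drift_by_time_dependence(samples, times, threshold=0.6):
    """Flag drift if data and time are statistically dependent: a classifier
    that predicts 'early vs. late' from the data alone should do no better
    than chance when the distribution does not change over time."""
    median_t = sorted(times)[len(times) // 2]
    labeled = [(x, t >= median_t) for x, t in zip(samples, times)]
    random.shuffle(labeled)
    split = len(labeled) // 2
    train, test = labeled[:split], labeled[split:]
    mean = lambda vals: sum(vals) / len(vals)
    # toy 1-D "classifier": threshold at the midpoint of the class means
    m_early = mean([x for x, late in train if not late])
    m_late = mean([x for x, late in train if late])
    cut = (m_early + m_late) / 2
    predict = (lambda x: x >= cut) if m_late >= m_early else (lambda x: x < cut)
    acc = mean([float(predict(x) == late) for x, late in test])
    return acc > threshold, acc
```

On stationary data the hold-out accuracy hovers near 0.5; under a mean shift it rises well above the (arbitrarily chosen) threshold.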
Learning from data streams extends this setting to an unlimited (but usually countable) stream of observed data, mostly in supervised learning scenarios (Gama et al., 2014). Learning technologies for such situations often rely on windowing techniques and adapt the model based on the characteristics of the data in an observed time window. Active methods explicitly detect drift, usually referring to drift of the classification error, and trigger model adaptation this way, while passive methods adjust the model continuously (Ditzler et al., 2015).
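A minimal sketch of such a windowing technique (names and the tolerance value are hypothetical, not a specific method from the literature): maintain a reference window and a recent window over the stream of model errors, and raise an alarm, followed by resetting the reference, when the recent error rate exceeds the reference rate by a fixed tolerance:

```python
from collections import deque

def window_drift_monitor(stream_errors, window=50, tolerance=0.15):
    """Sliding-window monitor over a stream of 0/1 model errors: flag drift
    when the recent error rate exceeds the reference rate by `tolerance`."""
    reference = deque(maxlen=window)
    recent = deque(maxlen=window)
    alarms = []
    for i, err in enumerate(stream_errors):
        recent.append(err)
        if len(reference) < window:
            reference.append(err)  # warm-up: fill the reference window first
            continue
        if len(recent) == window:
            ref_rate = sum(reference) / window
            rec_rate = sum(recent) / window
            if rec_rate - ref_rate > tolerance:
                alarms.append(i)
                reference = deque(recent, maxlen=window)  # adapt the reference
                recent.clear()  # wait for a fresh window before re-testing
    return alarms
```

For example, on a stream whose error rate jumps from 0.1 to 0.6 at position 100, the first alarm fires shortly after the change point.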
To cope with concept drift, we place a probability distribution over the location of the most recent drift point. We use Bayesian model comparison to update this distribution from the predictions of models trained on blocks of consecutive observations, and we prune potential drift points with low probability. We compare our approach to a non-probabilistic method for drift detection and a probabilistic method for change-point detection. In our experiments, our approach generally yielded improved accuracy and/or speed over these other methods.
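A heavily simplified sketch of this family of approaches (all names are hypothetical; a running Gaussian fit and a fixed hazard rate stand in for the paper's Bayesian model comparison over blocks): each candidate "most recent drift point" carries a model of the data observed since that point, candidates are reweighted by predictive likelihood, and improbable candidates are pruned:

```python
import math

def gaussian_loglik(x, mu, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def track_drift_point(stream, hazard=0.05, prune=1e-4):
    """Return the MAP location of the most recent drift point under a toy
    Gaussian model: candidates are reweighted by predictive likelihood,
    a new candidate is spawned each step, and low-probability candidates
    are pruned (in the spirit of pruning unlikely drift points)."""
    # candidates: drift location -> [log weight, count, mean, sum sq. dev.]
    cands = {}
    for t, x in enumerate(stream):
        for s in cands.values():
            lw, n, mu, m2 = s
            var = (m2 / n if n > 1 else 1.0) + 1e-6
            lw += math.log(1 - hazard) + gaussian_loglik(x, mu, var)
            n += 1                      # Welford update of running mean/var
            d = x - mu
            mu += d / n
            m2 += d * (x - mu)
            s[:] = [lw, n, mu, m2]
        # spawn a candidate claiming drift happened just before x
        if cands:
            best = max(s[0] for s in cands.values())
            cands[t] = [best + math.log(hazard), 1, x, 0.0]
        else:
            cands[t] = [0.0, 1, x, 0.0]
        # prune candidates whose normalized probability is negligible
        z = max(s[0] for s in cands.values())
        total = sum(math.exp(s[0] - z) for s in cands.values())
        cands = {loc: s for loc, s in cands.items()
                 if math.exp(s[0] - z) / total > prune}
    return max(cands, key=lambda loc: cands[loc][0])
```

On a stream whose mean jumps at position 100, the MAP drift location settles on the change point; pruning keeps the number of live candidates small.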
Global physical event detection has traditionally relied on dense coverage of physical sensors around the world; while this is an expensive undertaking, there have been no alternatives until recently. The ubiquity of social networks and human sensors in the field provides a tremendous amount of real-time, live data about true physical events from around the world. However, while such human sensor data have been exploited for retrospective large-scale event detection, such as of hurricanes or earthquakes, there has been little to no success in exploiting this rich resource for general physical event detection. Prior implementations have suffered from the concept drift phenomenon, where real-world data exhibit constant, unknown, unbounded changes in their distribution, making static machine learning models ineffective in the long term. We propose and implement an end-to-end collaborative drift-adaptive system that integrates corroborative and probabilistic sources to deliver real-time predictions. Furthermore, our system is adaptive to concept drift and performs automated continuous learning to maintain high performance. We demonstrate our approach in a real-time demo, available online, for landslide disaster detection, with extensibility to other real-world physical events such as flooding, wildfires, hurricanes, and earthquakes. Physical event detection, such as of extreme weather events or traffic accidents, has long been the domain of static event processors operating on numeric sensor data, or of human actors manually identifying event types. However, the emergence of big data and of associated data processing and analytics tools and systems has led to several applications in large-scale event and trend detection in the streaming domain.
However, it is important to note that many of these works are a form of retrospective analysis, as opposed to true real-time event detection, since they perform analyses on cleaned and processed data within a short time frame in the past, under the assumption that their approaches are sustainable and will continue to function over time.
Very Fast Decision Trees (VFDT) as a Big Data approach to dealing with data streams for classification and regression problems have shown good performance in handling the challenges faced and in making anytime prediction possible. VFDTs are one of the most well-known approaches to dealing with high-speed data streams exhibiting concept drift. VFDTs overcome problems of other methods, such as being easily trapped in a local minimum (for ANN) and the difficulty of choosing proper kernel parameters and penalty (for SVR); in contrast, they suffer from high latency in the learning process due to the existence of equally discriminative attributes. In this article, we first address the cause of this latency and then introduce a method to diminish the initial delay as much as possible. One of the most realistic examples of data stream mining which illustrates the importance of this area is load forecasting or electricity price forecasting. Load forecasting is a crucial topic for electric power suppliers, and nowadays many data scientists are working on developing forecasting approaches to achieve good accuracy and robustness. Another good example is fault detection in industry, which helps companies better estimate product life cycles and so achieve better maintenance services for their critical equipment. Security is another field that needs data stream mining algorithms to maintain its reliability: sensor data and alarm notifications need to be analyzed to detect smart attack scenarios in the moment. In this article, we propose a new data stream mining algorithm that not only addresses all the weak points mentioned above but also improves overall performance.
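The latency mentioned above arises because a VFDT splits a node only once the Hoeffding bound separates the gain of the best attribute from that of the second best, which can take very long when two attributes are (nearly) equally discriminative. A sketch of the standard split rule, including the usual tie-breaking parameter (function and parameter names are hypothetical):

```python
import math

def hoeffding_bound(value_range, delta, n):
    """With probability 1 - delta, the observed mean of n samples of a
    variable spanning `value_range` is within this epsilon of the true mean."""
    return math.sqrt(value_range ** 2 * math.log(1 / delta) / (2 * n))

def should_split(gain_best, gain_second, n, value_range=1.0,
                 delta=1e-7, tau=0.05):
    """VFDT-style split decision: split when the observed gain difference
    beats the Hoeffding bound, or (tie-breaking) when the bound itself has
    shrunk below tau -- the standard remedy for the latency caused by
    near-equally discriminative attributes."""
    eps = hoeffding_bound(value_range, delta, n)
    return (gain_best - gain_second > eps) or (eps < tau)
```

For gains of 0.30 vs. 0.29, waiting for the difference to beat the bound would require on the order of 80,000 samples at these settings, whereas tie-breaking permits a split after a few thousand.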
Most predictive models assume that training and test data are generated from a stationary process. However, this assumption often does not hold in practice. In this paper, we consider the scenario of a gradual concept drift due to the underlying non-stationarity of the data source. While previous work has investigated this scenario under supervised learning and adaptation conditions, few works have addressed the common, real-world scenario in which labels are only available during training. We propose a novel, iterative algorithm for the unsupervised adaptation of predictive models. We show that the performance of our batch-adapted prediction algorithm is better than that of its corresponding unadapted version. The proposed algorithm provides similar (or, in most cases, better) performance with significantly less run time compared to other state-of-the-art methods.
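One common recipe for unsupervised adaptation (a generic self-training sketch with hypothetical names, not the algorithm proposed in the paper) is to pseudo-label each incoming unlabeled batch with the current model and then refit the model on those pseudo-labels, which lets the model track a gradual drift without ever seeing labels after initial training:

```python
def nearest_centroid_fit(xs, ys):
    """Fit per-class centroids for a toy 1-D nearest-centroid classifier."""
    cents = {}
    for c in set(ys):
        pts = [x for x, y in zip(xs, ys) if y == c]
        cents[c] = sum(pts) / len(pts)
    return cents

def nearest_centroid_predict(cents, x):
    return min(cents, key=lambda c: abs(x - cents[c]))

def adapt_batch(cents, batch):
    """One unsupervised adaptation step: pseudo-label the unlabeled batch
    with the current model, then refit the centroids on the pseudo-labels.
    True labels are never observed after initial training."""
    pseudo = [nearest_centroid_predict(cents, x) for x in batch]
    return nearest_centroid_fit(batch, pseudo)
```

Under a gradual drift that shifts both classes by the same amount per batch, the adapted centroids follow the classes, while a static copy of the initial model starts misclassifying points near the old boundary.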