This paper presents a novel approach for temporal modelling of long-term human activities based on wavelet transforms. The model is applied to binary smart-home sensors to forecast their signals, which are used then as temporal priors to infer anomalies in office and Active & Assisted Living (AAL) scenarios. Such inference is performed by a new extension of Hybrid Markov Logic Networks (HMLNs) that merges different anomaly indicators, including activity levels detected by sensors, expert rules and the new temporal models. The latter in particular allow the inference system to discover deviations from long-term activity patterns, which cannot by detected by simpler frequency-based models. Two new publicly available datasets were collected using several smart-sensors to evaluate the wavelet-based temporal models and their application to signal forecasting and anomaly detection. The experimental results show the effectiveness of the proposed techniques and their successful application to detect unexpected activities in office and AAL settings.
Outdated, inaccurate, or duplicated data won't drive optimal data driven solutions. When data is inaccurate, leads are harder to track and nurture, and insights may be flawed. The data on which you base your big data strategy must be accurate, up-to-date, as complete as possible, and should not contain duplicate entries. Cleaning data is the most time-consuming and least enjoyable data science task (until Optimus), but one of the most important ones. No one can start a data science, machine learning or data driven solution without being sure that the data that they'll be consuming is at its optimal stage.
In many domains, collecting sufficient labeled training data for supervised machine learning requires easily accessible but noisy sources, such as crowdsourcing services or tagged Web data. Noisy labels occur frequently in data sets harvested via these means, sometimes resulting in entire classes of data on which learned classifiers generalize poorly. For real world applications, we argue that it can be beneficial to avoid training on such classes entirely. In this work, we aim to explore the classes in a given data set, and guide supervised training to spend time on a class proportional to its learnability. By focusing the training process, we aim to improve model generalization on classes with a strong signal. To that end, we develop an online algorithm that works in conjunction with classifier and training algorithm, iteratively selecting training data for the classifier based on how well it appears to generalize on each class. Testing our approach on a variety of data sets, we show our algorithm learns to focus on classes for which the model has low generalization error relative to strong baselines, yielding a classifier with good performance on learnable classes.
"Data wrangling" was an interesting phrase to hear in the machine learning (ML) presentations at Microsoft Ignite. Interesting because data wrangling is from business intelligence (BI), not from artificial intelligence (AI). Microsoft understands ML incorporates concepts from both disciplines. Further discussions point to another key point: Microsoft understands that business-to-business (B2B) is just as fertile for ML as business-to-consumer (B2C). ML applications with the most press are voice, augmented reality and autonomous vehicles.