This paper presents a novel approach for temporal modelling of long-term human activities based on wavelet transforms. The model is applied to binary smart-home sensors to forecast their signals, which are used then as temporal priors to infer anomalies in office and Active & Assisted Living (AAL) scenarios. Such inference is performed by a new extension of Hybrid Markov Logic Networks (HMLNs) that merges different anomaly indicators, including activity levels detected by sensors, expert rules and the new temporal models. The latter in particular allow the inference system to discover deviations from long-term activity patterns, which cannot by detected by simpler frequency-based models. Two new publicly available datasets were collected using several smart-sensors to evaluate the wavelet-based temporal models and their application to signal forecasting and anomaly detection. The experimental results show the effectiveness of the proposed techniques and their successful application to detect unexpected activities in office and AAL settings.
In many domains, collecting sufficient labeled training data for supervised machine learning requires easily accessible but noisy sources, such as crowdsourcing services or tagged Web data. Noisy labels occur frequently in data sets harvested via these means, sometimes resulting in entire classes of data on which learned classifiers generalize poorly. For real world applications, we argue that it can be beneficial to avoid training on such classes entirely. In this work, we aim to explore the classes in a given data set, and guide supervised training to spend time on a class proportional to its learnability. By focusing the training process, we aim to improve model generalization on classes with a strong signal. To that end, we develop an online algorithm that works in conjunction with classifier and training algorithm, iteratively selecting training data for the classifier based on how well it appears to generalize on each class. Testing our approach on a variety of data sets, we show our algorithm learns to focus on classes for which the model has low generalization error relative to strong baselines, yielding a classifier with good performance on learnable classes.
Why do so many companies still struggle to build a smooth-running pipeline from data to insights? They invest in heavily hyped machine-learning algorithms to analyze data and make business predictions. Then, inevitably, they realize that algorithms aren't magic; if they're fed junk data, their insights won't be stellar. So they employ data scientists that spend 90% of their time washing and folding in a data-cleaning laundromat, leaving just 10% of their time to do the job for which they were hired. What is flawed about this process is that companies only get excited about machine learning for end-of-the-line algorithms; they should apply machine learning just as liberally in the early cleansing stages instead of relying on people to grapple with gargantuan data sets, according to Andy Palmer, co-founder and chief executive officer of Tamr Inc., which helps organizations use machine learning to unify their data silos.
Good data management practices are essential for ensuring that research data are of high quality, findable, accessible and have high validity. You can then share data ensuring their sustainability and accessibility in the long-term, for new research and policy or to replicate and validate existing research and policy. It is important that researchers extend these practices to their work with all types of data, be it big (large or complex) data or smaller, more'curatable' datasets. In this blog, we are going to understand about the data curation. Furthermore, we will be looking into many other advantages which data curation will bring to the big data table.