Call Detail Records Driven Anomaly Detection and Traffic Prediction in Mobile Cellular Networks Artificial Intelligence

Mobile networks hold information about both their users and the network itself. Such information is useful for making the network end-to-end visible and intelligent. Big data analytics can efficiently analyze user and network information and, with the help of machine learning tools, unearth meaningful insights. Utilizing big data analytics and machine learning, this work makes three contributions. First, we use call detail record (CDR) data to detect anomalies in the network. To verify the detected anomalies, we use k-means clustering, an unsupervised machine learning algorithm. Effective anomaly detection lets us proceed to suitable designs for resource distribution as well as fault detection and avoidance. Second, we prepare anomaly-free data by removing anomalous activities and train a neural network model. By passing both anomalous and anomaly-free data through this model, we observe the effect of anomalous activities on model training and compare the mean squared error on the two datasets. Lastly, we use an autoregressive integrated moving average (ARIMA) model to predict future traffic for a user. Through simple visualization, we show that learning models trained on anomaly-free data generalize better and perform better on the prediction task.
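The k-means verification step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the hourly activity counts, the function names (`kmeans_1d`, `flag_anomalies`) and the `max_share` rule (treat a cluster holding at most 10% of the points as anomalous) are all assumptions made for illustration.

```python
from statistics import mean

def kmeans_1d(values, k, iters=50):
    """Plain Lloyd's algorithm on scalar values (requires k >= 2)."""
    # Deterministic initialisation: spread centroids over the sorted data.
    ordered = sorted(values)
    centroids = [ordered[(len(ordered) - 1) * i // (k - 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c]))
                  for v in values]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centroids.append(mean(members) if members else centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels

def flag_anomalies(values, k=2, max_share=0.1):
    """Return indices of values that fall into sparsely populated clusters.

    Assumption: a cluster containing at most `max_share` of all points is
    treated as anomalous. This rule is illustrative, not from the paper.
    """
    _, labels = kmeans_1d(values, k)
    rare = {c for c in range(k)
            if labels.count(c) / len(values) <= max_share}
    return [i for i, lab in enumerate(labels) if lab in rare]

# Hypothetical hourly call activity for one cell over 24 hours.
activity = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13, 210, 12,
            14, 15, 13, 11, 14, 12, 13, 15, 14, 12, 13, 14]
anomalous_hours = flag_anomalies(activity)
print(anomalous_hours)
```

Removing the flagged indices from `activity` would give the kind of anomaly-free series that the abstract describes feeding to the neural network and ARIMA models.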

The Machine Learning Bazaar: Harnessing the ML Ecosystem for Effective System Development Machine Learning

As machine learning is applied more and more widely, data scientists often struggle to find or create end-to-end machine learning systems for specific tasks. The proliferation of libraries and frameworks and the complexity of the tasks have led to the emergence of "pipeline jungles" -- brittle, ad hoc ML systems. To address these problems, we introduce the Machine Learning Bazaar, a new approach to developing machine learning and AutoML software systems. First, we introduce ML primitives, a unified API and specification for data processing and ML components from different software libraries. Next, we compose primitives into usable ML programs, abstracting away glue code, data flow, and data storage. We further pair these programs with a hierarchy of search strategies -- Bayesian optimization and bandit learning. Finally, we create and describe a general-purpose, multi-task, end-to-end AutoML system that provides solutions to a variety of ML problem types (classification, regression, anomaly detection, graph matching, etc.) and data modalities (image, text, graph, tabular, relational, etc.). We evaluate our approach on a curated collection of 431 real-world ML tasks, searching millions of pipelines, and also demonstrate real-world use cases and case studies.
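The idea of wrapping components behind one interface and composing them into a program can be sketched as follows. This is a toy illustration only: the `Primitive` protocol, the two example primitives and the `Pipeline` class are hypothetical names and do not reproduce the Machine Learning Bazaar's actual API.

```python
from statistics import mean, pstdev
from typing import Protocol

class Primitive(Protocol):
    """Assumed uniform interface: every component can be fit, then applied."""
    def fit(self, X, y=None) -> None: ...
    def produce(self, X): ...

class Standardize:
    """Data-processing primitive: rescale to zero mean and unit variance."""
    def fit(self, X, y=None):
        self.mu = mean(X)
        self.sigma = pstdev(X) or 1.0  # avoid dividing by zero
    def produce(self, X):
        return [(x - self.mu) / self.sigma for x in X]

class ThresholdClassifier:
    """Learning primitive: cut halfway between the two class means."""
    def fit(self, X, y=None):
        pos = [x for x, t in zip(X, y) if t == 1]
        neg = [x for x, t in zip(X, y) if t == 0]
        self.cut = (mean(pos) + mean(neg)) / 2
    def produce(self, X):
        return [1 if x > self.cut else 0 for x in X]

class Pipeline:
    """Chains primitives so callers never write the glue code themselves."""
    def __init__(self, steps):
        self.steps = list(steps)
    def fit(self, X, y=None):
        data = X
        # Fit each intermediate step, then pass its output downstream.
        for step in self.steps[:-1]:
            step.fit(data, y)
            data = step.produce(data)
        self.steps[-1].fit(data, y)
        return self
    def produce(self, X):
        data = X
        for step in self.steps:
            data = step.produce(data)
        return data

# Hypothetical one-feature training set with binary labels.
X_train = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
y_train = [0, 0, 0, 1, 1, 1]
pipeline = Pipeline([Standardize(), ThresholdClassifier()]).fit(X_train, y_train)
predictions = pipeline.produce([2.5, 9.0])
print(predictions)
```

Because every step exposes the same two methods, a search procedure could swap primitives or tune their settings without touching the pipeline code, which is the property the abstract's search strategies rely on.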

Wavelet-based Temporal Forecasting Models of Human Activities for Anomaly Detection Artificial Intelligence

This paper presents a novel approach for temporal modelling of long-term human activities based on wavelet transforms. The model is applied to binary smart-home sensors to forecast their signals, which are then used as temporal priors to infer anomalies in office and Active & Assisted Living (AAL) scenarios. Such inference is performed by a new extension of Hybrid Markov Logic Networks (HMLNs) that merges different anomaly indicators, including activity levels detected by sensors, expert rules and the new temporal models. The latter in particular allow the inference system to discover deviations from long-term activity patterns, which cannot be detected by simpler frequency-based models. Two new publicly available datasets were collected using several smart sensors to evaluate the wavelet-based temporal models and their application to signal forecasting and anomaly detection. The experimental results show the effectiveness of the proposed techniques and their successful application to detecting unexpected activities in office and AAL settings.
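To give a feel for how a wavelet transform can turn sensor history into a temporal prior, here is a minimal sketch using Haar wavelet smoothing. It is not the paper's forecasting model or its HMLN inference: the choice of the Haar wavelet, the per-slot activity profile, the absolute-difference score and all function names are assumptions for illustration.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_step(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in pairs]
    return approx, detail

def haar_smooth(signal, levels):
    """Discard the detail coefficients of `levels` levels and reconstruct.

    The signal length must be divisible by 2 ** levels.
    """
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2 ** levels")
    approx = [float(v) for v in signal]
    for _ in range(levels):
        approx, _detail = haar_step(approx)
    # Inverse transform with zero details: each coefficient covers two samples.
    for _ in range(levels):
        approx = [a / SQRT2 for a in approx for _repeat in range(2)]
    return approx

def deviation_scores(observed, prior):
    """Assumed anomaly indicator: distance between observation and prior."""
    return [abs(o - p) for o, p in zip(observed, prior)]

# Hypothetical profile: fraction of past days on which a binary sensor
# fired in each of 8 time slots.
history_profile = [0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0]
prior = haar_smooth(history_profile, levels=2)

# Hypothetical new day in which the sensor fires in every slot.
today = [1, 1, 1, 1, 1, 1, 1, 1]
scores = deviation_scores(today, prior)
print(scores)
```

Dropping the fine-scale detail coefficients keeps only the coarse activity pattern, so slots where the sensor is rarely active in the long run receive higher deviation scores when it fires.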

Cybersecurity Data Science: Minding the Growing Gap - DATAVERSITY


By Scott Mongeau. Following cybersecurity Data Science best practices can help beleaguered and resource-strapped security teams transform Big Data into smart data for better anomaly detection and enterprise protection. The consequences of ignoring security challenges are rising. According to the Cisco 2018 Annual Cybersecurity Report, over half of cyberattacks resulted in damages greater than $500K, with nearly 20 percent costing more than $2.5M. Meanwhile, regulators seeking to spur heightened oversight have become more aggressive in levying fines and holding corporate boards accountable.