Goto

Collaborating Authors

Demystifying Information Security Using Data Science

#artificialintelligence

When you search for security data science on the internet, it's difficult to find resources with crisp and clear information about the use cases, methods and limitations in Information Security (hereby referred to as InfoSec). There's usually always some marketing material attached to it. So, I thought of summarising my knowledge and InfoSec experience in this article.


Fast and Scalable Machine Learning in R and Python with H2O

#artificialintelligence

The focus of this talk is scalable machine learning using the H2O R and Python packages. H2O is an open source distributed machine learning platform designed for big data, with the added benefit that it's easy to use on a laptop (in addition to a multi-node Hadoop or Spark cluster). The core machine learning algorithms of H2O are implemented in high-performance Java; however, fully featured APIs are available in R, Python, Scala, REST/JSON and also through a web interface. Since H2O's algorithm implementations are distributed, this allows the software to scale to very large datasets that may not fit into RAM on a single machine. H2O currently features distributed implementations of generalized linear models, gradient boosting machines, random forest, deep neural nets, dimensionality reduction methods (PCA, GLRM), clustering algorithms (K-means), and anomaly detection methods, among others.


Intro to Machine Learning in H2O

#artificialintelligence

The focus of this workshop is machine learning using the H2O R and Python packages. H2O is an open source distributed machine learning platform designed for big data, with the added benefit that it's easy to use on a laptop (in addition to a multi-node Hadoop or Spark cluster). The core machine learning algorithms of H2O are implemented in high-performance Java; however, fully featured APIs are available in R, Python, Scala, REST/JSON and also through a web interface. Since H2O's algorithm implementations are distributed, this allows the software to scale to very large datasets that may not fit into RAM on a single machine. H2O currently features distributed implementations of generalized linear models, gradient boosting machines, random forest, deep neural nets, dimensionality reduction methods (PCA, GLRM), clustering algorithms (K-means), and anomaly detection methods, among others.


Call Detail Records Driven Anomaly Detection and Traffic Prediction in Mobile Cellular Networks

arXiv.org Artificial Intelligence

Mobile networks possess information about the users as well as the network. Such information is useful for making the network end-to-end visible and intelligent. Big data analytics can efficiently analyze user and network information, unearth meaningful insights with the help of machine learning tools. Utilizing big data analytics and machine learning, this work contributes in three ways. First, we utilize the call detail records (CDR) data to detect anomalies in the network. For authentication and verification of anomalies, we use k-means clustering, an unsupervised machine learning algorithm. Through effective detection of anomalies, we can proceed to suitable design for resource distribution as well as fault detection and avoidance. Second, we prepare anomaly-free data by removing anomalous activities and train a neural network model. By passing anomaly and anomaly-free data through this model, we observe the effect of anomalous activities in training of the model and also observe mean square error of anomaly and anomaly free data. Lastly, we use an autoregressive integrated moving average (ARIMA) model to predict future traffic for a user. Through simple visualization, we show that anomaly free data better generalizes the learning models and performs better on prediction task.


Anomaly Detection in Telecommunications Using Complex Streaming Data Whiteboard Walkthrough

@machinelearnbot

The telecommunications industry is on the verge of a major transformation through the use of advanced analytics and big data technologies like the MapR Converged Data Platform. The MapR Guide to Big Data in Telecommunications is designed to help you understand the trends and technologies behind this data driven telecommunications revolution. In this week's Whiteboard Walkthrough Ted Dunning, Chief Application Architect at MapR, explains in detail how to use streaming IoT sensor data from handsets and devices as well as cell tower data to detect strange anomalies. He takes us from best practices for data architecture, including the advantages of multi-master writes with MapR Streams, through analysis of the telecom data using clustering methods to discover normal and anomalous behaviors. I'd like to talk a little bit about data processing in the context of telecom.