Demystifying Information Security Using Data Science

#artificialintelligence

When you search for security data science on the internet, it's difficult to find resources with crisp and clear information about the use cases, methods and limitations in Information Security (hereafter referred to as InfoSec). There's usually some marketing material attached to it. So, I thought of summarising my knowledge and InfoSec experience in this article.


Intro to Machine Learning in H2O

#artificialintelligence

The focus of this workshop is machine learning using the H2O R and Python packages. H2O is an open source distributed machine learning platform designed for big data, with the added benefit that it's easy to use on a laptop (in addition to a multi-node Hadoop or Spark cluster). The core machine learning algorithms of H2O are implemented in high-performance Java; however, fully featured APIs are available in R, Python, Scala and REST/JSON, as well as through a web interface. Because H2O's algorithm implementations are distributed, the software can scale to very large datasets that may not fit into RAM on a single machine. H2O currently features distributed implementations of generalized linear models, gradient boosting machines, random forest, deep neural nets, dimensionality reduction methods (PCA, GLRM), clustering algorithms (K-means), and anomaly detection methods, among others.


Fast and Scalable Machine Learning in R and Python with H2O

#artificialintelligence

The focus of this talk is scalable machine learning using the H2O R and Python packages. H2O is an open source distributed machine learning platform designed for big data, with the added benefit that it's easy to use on a laptop (in addition to a multi-node Hadoop or Spark cluster). The core machine learning algorithms of H2O are implemented in high-performance Java; however, fully featured APIs are available in R, Python, Scala and REST/JSON, as well as through a web interface. Because H2O's algorithm implementations are distributed, the software can scale to very large datasets that may not fit into RAM on a single machine. H2O currently features distributed implementations of generalized linear models, gradient boosting machines, random forest, deep neural nets, dimensionality reduction methods (PCA, GLRM), clustering algorithms (K-means), and anomaly detection methods, among others.


Deep Learning on the JVM - DZone Big Data

#artificialintelligence

DL4J is a pretty awesome open source project that works with Spark and Hadoop. Deep Learning 4J also works as a YARN app! It includes text and NLP tooling, the Canova vectorization library for ML, and scientific computing for the JVM; it can be distributed across clusters and works with CUDA GPU kernels. DL4J is used for anomaly detection (such as fraud detection), recommender systems, predictive analytics on logs, and image recognition. In a related open source project, Skymind built the numerical computing library ND4J (n-dimensional arrays for Java), essentially porting NumPy to the JVM.


Anomaly Detection in Telecommunications Using Complex Streaming Data Whiteboard Walkthrough

@machinelearnbot

The telecommunications industry is on the verge of a major transformation through the use of advanced analytics and big data technologies like the MapR Converged Data Platform. The MapR Guide to Big Data in Telecommunications is designed to help you understand the trends and technologies behind this data-driven telecommunications revolution. In this week's Whiteboard Walkthrough, Ted Dunning, Chief Application Architect at MapR, explains in detail how to use streaming IoT sensor data from handsets and devices, as well as cell tower data, to detect anomalies. He takes us from best practices for data architecture, including the advantages of multi-master writes with MapR Streams, through analysis of the telecom data using clustering methods to discover normal and anomalous behaviors. I'd like to talk a little bit about data processing in the context of telecom.
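
The idea of using clustering to separate normal from anomalous behavior can be illustrated with a minimal sketch. This is not the method from the walkthrough, which is not detailed here; it is a plain k-means example using only the Python standard library. The two-feature readings, the cluster count, the threshold rule, and all function names are assumptions made for illustration. Readings that sit far from every learned cluster centre are treated as anomalous.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct readings
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members;
        # keep the old centroid if a cluster ended up empty.
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids


def anomaly_score(point, centroids):
    """Distance from a reading to the closest cluster centre."""
    return min(math.dist(point, c) for c in centroids)


# Synthetic "normal" readings drawn around two hypothetical operating modes.
rng = random.Random(42)
normal = [(rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)) for _ in range(100)]
normal += [(rng.gauss(10.0, 0.5), rng.gauss(10.0, 0.5)) for _ in range(100)]

centroids = kmeans(normal, k=2)

# Assumed threshold rule: the largest score seen among the normal readings.
threshold = max(anomaly_score(p, centroids) for p in normal)


def is_anomalous(point):
    return anomaly_score(point, centroids) > threshold


outlier = (50.0, 50.0)  # a reading far from both modes
print(is_anomalous(outlier))
```

In a streaming setting the centroids would be learned from historical data and each incoming reading scored as it arrives; the threshold here is deliberately simple and would normally be tuned against known incidents.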