Data Science: Machine Learning and Statistical Modeling in R

@machinelearnbot

In this course, we will teach you advanced techniques in machine learning with the latest code in R. Now is the time to take control of your data and start producing superior statistical analysis with R. You will delve into statistical learning theory and supervised learning; design efficient algorithms; learn how to create recommendation engines; and apply multi-class classification, deep learning, and more. The course starts by teaching you how to set up the R environment, which includes installing RStudio and R packages. It aims to excite you with awesome projects focused on analysis, visualization, and machine learning. You will explore, in depth, topics such as data mining, classification, clustering, regression, predictive modeling, anomaly detection, and more.


Intro to Machine Learning in H2O

#artificialintelligence

The focus of this workshop is machine learning using the H2O R and Python packages. H2O is an open source distributed machine learning platform designed for big data, with the added benefit that it is easy to use on a laptop as well as on a multi-node Hadoop or Spark cluster. The core machine learning algorithms of H2O are implemented in high-performance Java; however, fully featured APIs are available in R, Python, Scala, and REST/JSON, as well as through a web interface. Because H2O's algorithm implementations are distributed, the software can scale to very large datasets that may not fit into RAM on a single machine. H2O currently features distributed implementations of generalized linear models, gradient boosting machines, random forests, deep neural nets, dimensionality reduction methods (PCA, GLRM), clustering algorithms (K-means), and anomaly detection methods, among others.
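To make the scaling claim a little more concrete, here is a minimal sketch of why a distributed K-means iteration does not need all the data on one machine: each worker reduces its own chunk to per-cluster sums and counts, and only those small summaries are combined. This is a plain-Python illustration of the map-reduce pattern, assuming the data is already split into in-memory chunks; it is not H2O's Java implementation, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to `point` by squared Euclidean distance."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])),
    )


def map_chunk(chunk: Sequence[Point], centroids: Sequence[Point]):
    """Map step: one worker summarizes its chunk as per-cluster sums and counts."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for point in chunk:
        k = nearest(point, centroids)
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += point[d]
    return sums, counts


def reduce_partials(partials, centroids: Sequence[Point]) -> List[Point]:
    """Reduce step: add up the per-chunk summaries and recompute the centroids."""
    dim = len(centroids[0])
    total_sums = [[0.0] * dim for _ in centroids]
    total_counts = [0] * len(centroids)
    for sums, counts in partials:
        for k in range(len(centroids)):
            total_counts[k] += counts[k]
            for d in range(dim):
                total_sums[k][d] += sums[k][d]
    # An empty cluster keeps its previous centroid.
    return [
        tuple(s / total_counts[k] for s in total_sums[k]) if total_counts[k] else centroids[k]
        for k in range(len(centroids))
    ]


def distributed_kmeans(chunks, centroids, iterations: int = 10) -> List[Point]:
    """Lloyd's K-means in which each iteration visits every chunk exactly once."""
    centroids = list(centroids)
    for _ in range(iterations):
        # On a real cluster these map calls would run in parallel, one per node.
        partials = [map_chunk(chunk, centroids) for chunk in chunks]
        centroids = reduce_partials(partials, centroids)
    return centroids


chunks = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 10.0), (10.0, 11.0)]]
print(distributed_kmeans(chunks, [(1.0, 1.0), (9.0, 9.0)]))  # [(0.0, 0.5), (10.0, 10.5)]
```

Because the reduce step only adds sums and counts, splitting the same points into different chunks yields the same centroids as processing them all in a single chunk.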


Fast and Scalable Machine Learning in R and Python with H2O

#artificialintelligence

The focus of this talk is scalable machine learning using the H2O R and Python packages. H2O is an open source distributed machine learning platform designed for big data, with the added benefit that it is easy to use on a laptop as well as on a multi-node Hadoop or Spark cluster. The core machine learning algorithms of H2O are implemented in high-performance Java; however, fully featured APIs are available in R, Python, Scala, and REST/JSON, as well as through a web interface. Because H2O's algorithm implementations are distributed, the software can scale to very large datasets that may not fit into RAM on a single machine. H2O currently features distributed implementations of generalized linear models, gradient boosting machines, random forests, deep neural nets, dimensionality reduction methods (PCA, GLRM), clustering algorithms (K-means), and anomaly detection methods, among others.
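The same decomposition applies to generalized linear models. The log-loss gradient of a logistic regression is a sum over rows, so each worker can compute a partial gradient on its chunk and a coordinator adds the partials together. The sketch below assumes binary labels, an intercept column already included in each feature vector, and plain batch gradient descent. All names are hypothetical, and H2O's GLM relies on its own solvers rather than this naive loop.

```python
import math
from typing import List, Sequence, Tuple

Row = Tuple[Sequence[float], int]  # (features including intercept, label in {0, 1})


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def partial_gradient(chunk: Sequence[Row], weights: Sequence[float]) -> List[float]:
    """Map step: log-loss gradient summed over the rows of one chunk."""
    grad = [0.0] * len(weights)
    for x, y in chunk:
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        for j, xj in enumerate(x):
            grad[j] += (p - y) * xj
    return grad


def full_gradient(chunks: Sequence[Sequence[Row]], weights: Sequence[float]) -> List[float]:
    """Reduce step: summing the chunk gradients gives the gradient over all rows."""
    total = [0.0] * len(weights)
    for chunk in chunks:
        for j, g in enumerate(partial_gradient(chunk, weights)):
            total[j] += g
    return total


def fit_logistic(chunks, n_features: int, lr: float = 0.5, steps: int = 500) -> List[float]:
    """Batch gradient descent on the mean log-loss (hypothetical helper)."""
    n = sum(len(chunk) for chunk in chunks)
    weights = [0.0] * n_features
    for _ in range(steps):
        grad = full_gradient(chunks, weights)
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights
```

Only a gradient vector the size of the weight vector crosses the network per step, which is why this pattern scales to data that does not fit in a single machine's memory.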


Diversifying Database Activity Monitoring with Bandits

arXiv.org Artificial Intelligence

Database activity monitoring (DAM) systems are commonly used by organizations to protect organizational data, knowledge, and intellectual property. To protect an organization's databases, DAM systems play two main roles: monitoring (documenting activity) and alerting on anomalous activity. Because of high-velocity streams and operating costs, such systems are restricted to examining only a sample of the activity. Current solutions use policies, manually crafted by experts, to decide which transactions to monitor and log, which limits the diversity of the data collected. Bandit algorithms, which use reward functions as the basis for optimization while adding diversity to the recommended set, have gained increased attention in recommendation systems as a way to improve diversity. In this work, we redefine the data sampling problem as a special case of the multi-armed bandit (MAB) problem and present a novel algorithm that combines expert knowledge with random exploration. We analyze the effect of diversity on coverage and on downstream event detection tasks using a simulated dataset. We find that adding diversity to the sampling through the bandit-based approach works well for this task, maximizing population coverage without decreasing the quality of the alerts issued about events.
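As a rough illustration of the bandit framing (not the paper's algorithm), the sketch below treats hypothetical transaction categories as the arms of an epsilon-greedy bandit. Expert knowledge enters as prior value estimates, random exploration happens with probability `epsilon`, and the reward is 1.0 only when a sampled transaction covers a key that has not been logged yet, so repeatedly sampling the same activity stops paying off. The category names, the `draw` callback, and the coverage reward are all assumptions made for this example.

```python
import itertools
import random
from typing import Callable, Dict, Hashable, Set


class EpsilonGreedySampler:
    """Epsilon-greedy bandit whose arms are (hypothetical) transaction categories."""

    def __init__(self, expert_priors: Dict[str, float], epsilon: float = 0.1, seed: int = 0):
        self.values = dict(expert_priors)                # reward estimate per arm, seeded by experts
        self.counts = {arm: 1 for arm in expert_priors}  # the prior counts as one pseudo-observation
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def choose(self) -> str:
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.values))    # explore: uniformly random arm
        return max(self.values, key=self.values.get)     # exploit: best current estimate

    def update(self, arm: str, reward: float) -> None:
        # Incremental mean of the prior and every reward observed for this arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def run_sampling(sampler: EpsilonGreedySampler,
                 draw: Callable[[str], Hashable],
                 budget: int) -> Set[Hashable]:
    """Log `budget` sampled transactions; reward 1.0 only for newly covered keys."""
    covered: Set[Hashable] = set()
    for _ in range(budget):
        arm = sampler.choose()
        key = draw(arm)  # hypothetical: next transaction from that category's stream
        reward = 0.0 if key in covered else 1.0
        covered.add(key)
        sampler.update(arm, reward)
    return covered


if __name__ == "__main__":
    counter = itertools.count()

    def draw(arm: str):
        # Toy streams: "admin" repeats one transaction; other arms always yield new ones.
        return (arm, 0) if arm == "admin" else (arm, next(counter))

    sampler = EpsilonGreedySampler({"admin": 0.9, "app": 0.5, "batch": 0.2}, epsilon=0.0)
    print(len(run_sampling(sampler, draw, budget=10)))  # 8
```

In this toy run, the expert prior first sends the sampler to `admin`, but after two repeated transactions its estimate drops below `app` and the remaining budget shifts there: ten draws cover 8 distinct keys, whereas always following the top expert prior would cover just 1.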


Deep Learning on the JVM - DZone Big Data

#artificialintelligence

DL4J is a pretty awesome open source project that works with Spark and Hadoop, and Deeplearning4j also runs as a YARN app! It includes tools for text and NLP, the Canova vectorization library for ML, scientific computing for the JVM, distributed training across clusters, and support for CUDA GPU kernels. DL4J is used for anomaly detection (such as fraud detection), recommender systems, predictive analytics on logs, and image recognition. In a related open source project, Skymind built ND4J (n-dimensional arrays for Java), a numerical computing library that essentially ports NumPy to the JVM.
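To show what "n-dimensional arrays" means in practice, here is the NumPy side of that comparison: creating an ndarray, broadcasting, an axis reduction, and a matrix product. ND4J's own Java API is not shown here. The snippet only illustrates the NumPy array model that ND4J is described as bringing to the JVM, and it assumes NumPy is installed.

```python
import numpy as np

# A 2 x 3 ndarray, NumPy's core n-dimensional array type.
a = np.arange(6, dtype=np.float64).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]

# Broadcasting: the 1-D row is stretched across every row of `a`.
row = np.array([10.0, 20.0, 30.0])
b = a + row  # [[10, 21, 32], [13, 24, 35]]

# A reduction along an axis (sum down each column) and a matrix product.
col_sums = a.sum(axis=0)  # [3, 5, 7]
gram = a @ a.T            # [[5, 14], [14, 50]]
```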