The focus of this workshop is machine learning using the H2O R and Python packages. H2O is an open source, distributed machine learning platform designed for big data, with the added benefit that it's easy to use on a laptop (in addition to a multi-node Hadoop or Spark cluster). The core machine learning algorithms of H2O are implemented in high-performance Java; however, fully featured APIs are available in R, Python, Scala, and REST/JSON, along with a web interface. Because H2O's algorithm implementations are distributed, the software can scale to very large datasets that may not fit into RAM on a single machine. H2O currently features distributed implementations of generalized linear models, gradient boosting machines, random forest, deep neural nets, dimensionality reduction methods (PCA, GLRM), clustering algorithms (K-means), and anomaly detection methods, among others.
Erin LeDell is a Statistician and Machine Learning Scientist at H2O.ai, the company that produces the open source machine learning platform, H2O. She is the author of a handful of machine-learning-related software packages, including the h2oEnsemble R package for ensemble learning with H2O. Her research focuses on ensemble machine learning, learning from imbalanced binary-outcome data, influence-curve-based variance estimation, and statistical computing.
In most cases, cuML's Python API matches the API from scikit-learn. For large datasets, these GPU-based implementations can complete 10-50x faster than their CPU equivalents. For details on performance, see the cuML Benchmarks Notebook. For additional examples, browse our complete API documentation, or check out our introductory walkthrough notebooks. Finally, you can find complete end-to-end examples in the notebooks-contrib repo.
FATE (Federated AI Technology Enabler) is an open-source project initiated by WeBank's AI Department to provide a secure computing framework to support the federated AI ecosystem. It implements secure computation protocols based on homomorphic encryption and multi-party computation (MPC). It supports federated learning architectures and secure computation of various machine learning algorithms, including logistic regression, tree-based algorithms, deep learning and transfer learning. You can ask questions and participate in the development discussion. For frequently asked questions, see the FAQ.
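To give a feel for what "computation based on homomorphic encryption" means, here is a minimal sketch of an additively homomorphic scheme in the style of Paillier, written in plain Python (3.8 or later, for the modular inverse via `pow`). It is not taken from FATE and does not use FATE's API. The function names (`generate_keypair`, `encrypt`, `decrypt`, `add_encrypted`) and the tiny primes are illustrative assumptions, and the parameters are far too small to be secure. The point is only that two parties' values can be summed while still encrypted.

```python
import math
import secrets


def generate_keypair(p, q):
    """Build a toy Paillier key pair from two distinct primes p and q."""
    n = p * q
    # lambda = lcm(p - 1, q - 1)
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    # With generator g = n + 1, the decryption constant mu is lambda^-1 mod n.
    mu = pow(lam, -1, n)
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt an integer m (0 <= m < n) under the public key n."""
    n_sq = n * n
    # Pick a random blinding factor r in [1, n - 1] that is coprime to n.
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    # c = g^m * r^n mod n^2, with g = n + 1
    return (pow(n + 1, m, n_sq) * pow(r, n, n_sq)) % n_sq


def decrypt(n, private_key, c):
    """Recover the plaintext from ciphertext c using the private key."""
    lam, mu = private_key
    x = pow(c, lam, n * n)
    # L(x) = (x - 1) / n, then multiply by mu modulo n.
    return ((x - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts yields an encryption of the plaintext sum."""
    return (c1 * c2) % (n * n)


# Toy primes for illustration only; real deployments use very large primes.
public_key, private_key = generate_keypair(293, 433)

# Two parties encrypt their private values; a third party adds them blindly.
c_sum = add_encrypted(public_key, encrypt(public_key, 20), encrypt(public_key, 22))

total = decrypt(public_key, private_key, c_sum)
print(total)  # 42
```

Only the holder of the private key can read the total; the party performing the addition sees ciphertexts alone. In a federated setting, this kind of property is what would let intermediate values such as gradients be aggregated without any participant revealing its raw numbers.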
Kids love to play in physical sandboxes. Developers love to "play" in virtual sandboxes. BlueData, which offers a new-gen big-data-as-a-service (BDaaS) software platform, has made available a new environment for AI and machine-learning developers to try out new ideas and have fun testing them. This is a new turnkey package that enables accelerated deployment of artificial intelligence, machine learning and deep learning applications in the enterprise. Turns out you can't build these applications too quickly.