Real World Hadoop - Upgrade Cloudera and Hadoop hands on


Note: This course is built on top of the "Real World Vagrant - Automate a Cloudera Manager Build - Toyin Akin" course. Upgrading Cloudera Manager enables new features of the latest product versions while preserving existing data and settings. Some new settings are added, and some additional steps may be required, but no existing configuration is removed. The process for upgrading Cloudera Manager varies depending on the starting point. Install the databases required for the release. In Cloudera Manager 5, the Host Monitor and Service Monitor roles use an internal database that provides greater capacity and flexibility.

Real World Vagrant - Build an Apache Spark Development Env!


Note: This course is built on top of the "Real World Vagrant For Distributed Computing - Toyin Akin" course. This course enables you to package a complete Spark development environment into your own custom 2.3GB Vagrant box.

Get started with Hadoop and Spark in 10 minutes


With the big 3 Hadoop vendors (Cloudera, Hortonworks and MapR) each providing their own Hadoop sandbox virtual machines (VMs), trying out Hadoop today has become extremely easy. For a developer, it is extremely useful to download one of these VMs and start practicing data science on Hadoop right away. Alongside core Apache Hadoop, however, these vendors package their own software into their distributions, mostly for orchestration and management, tasks that can be a pain because of the many scattered open-source projects within the Hadoop ecosystem. Hortonworks includes the open-source Ambari, while Cloudera includes its own Cloudera Manager for orchestrating Hadoop installations and managing multi-node clusters. Moreover, most of these distributions today require a 64-bit machine and sometimes a large amount of memory (for a laptop).

Kali Linux for Vagrant: Hands-on


I recently saw the announcement for Kali Linux on Vagrant. I have been a huge fan of Kali Linux for a very long time, and I am interested in virtualization (and currently using VirtualBox in an educational environment), so this was a very interesting combination to me. I have now installed it on a few of my systems, and so far I am quite impressed with it.

Real World Hadoop - Hands on Enterprise Distributed Storage.


We will be manipulating the HDFS file system, but why are enterprises interested in HDFS to begin with? The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware.