Components: Hadoop’s HDFS

Description: One of the most important parts of the Hadoop technology stack: a distributed filesystem. It serves as the main storage for processed data, as a data source for various ML tasks, and as the main storage for the Hive warehouse.
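
As an illustration of how processed data might be placed into and read from HDFS, the following minimal sketch wraps the standard `hdfs dfs` shell client. It assumes the `hdfs` client is installed and on the PATH of the provisioned stand. The helper names `hdfs_command` and `run_hdfs` and the path `/data/processed` are hypothetical and are not taken from this project.

```python
import shutil
import subprocess


def hdfs_command(action, *args):
    """Build the argument list for an `hdfs dfs` call, e.g. -ls, -put, -mkdir."""
    return ["hdfs", "dfs", f"-{action}", *args]


def run_hdfs(action, *args):
    """Run the command and return its stdout, or None if no `hdfs` client is found."""
    if shutil.which("hdfs") is None:
        return None
    result = subprocess.run(
        hdfs_command(action, *args),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical target directory, used for illustration only.
    target = "/data/processed"
    run_hdfs("mkdir", "-p", target)           # create the directory if missing
    run_hdfs("put", "-f", "result.csv", target)  # upload a local file, overwriting
    print(run_hdfs("ls", target))             # list the directory contents
```

Keeping the command construction separate from execution makes it possible to inspect the exact command line before running it against the cluster.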

Provision script (Puppet manifest): hadoop.pp

Additional info: Hadoop site

Table of contents:

Home
Prerequisites
Development stand provisioning
Components
Monitoring Links
Development