Cloudera Administration for Apache Hadoop

Course Description

This five-day training course gives system administrators a comprehensive understanding of all the steps necessary to operate and manage Hadoop clusters. From installation and configuration through load balancing and tuning your cluster, Cloudera's Administration course has you covered.

Course Code: RRH1; Category: Big Data

Hardware: VMware ESXi5, IBM X3250M3

Audience:
This course is intended for information technology professionals, particularly system administrators, who will be setting up or maintaining a Hadoop cluster.

Prerequisites:

Prior to attending this course, participants should have the following knowledge and skills:
Basic UNIX (Linux) or Windows system administration experience.

Course Outline

An Introduction to Hadoop and HDFS

  • Why Hadoop?
  • HDFS
  • MapReduce
  • Hive, Pig, HBase and other ecosystem projects
  • Hands-On Exercise: Installing a pseudo-distributed cluster

Planning Your Hadoop Cluster

  • General Planning Considerations
  • Choosing the Right Hardware
  • Node Topologies
  • Choosing the Right Software

Deploying Your Cluster

  • Installing Hadoop
  • Using SCM Express for easy installation
  • Typical Configuration Parameters
  • Configuring Rack Awareness
  • Using Configuration Management Tools
  • Hands-On Exercise: Installing a Hadoop Cluster

Cluster Maintenance

  • Checking HDFS with fsck
  • Hands-On Exercise: Breaking the Cluster
  • Copying data with distcp
  • Rebalancing cluster nodes
  • Adding and removing cluster nodes
  • Hands-On Exercise: Verifying the Cluster's Self-Healing Features
  • Backup And Restore
  • Upgrading and Migrating
  • Hands-On Exercise: Backing Up and Restoring the NameNode Metadata

Managing and Scheduling Jobs

  • Starting and stopping MapReduce jobs
  • Hands-On Exercise: Managing jobs
  • The FIFO Scheduler
  • The Fair Scheduler
  • Hands-On Exercise: Using the Fair Scheduler

Installing and Managing Other Hadoop Projects

  • Hive
  • Pig
  • HBase
  • Hands-On Exercise: Configuring the Hive Shared Metastore

Populating HDFS from External Sources

  • Using Sqoop
  • Using Flume
  • Best Practices for Data Ingestion

Cluster Monitoring, Troubleshooting and Optimizing

  • Hadoop Log Files
  • Using the NameNode and JobTracker Web UIs
  • Interpreting Job Logs
  • Monitoring with Ganglia
  • Other monitoring tools
  • General Optimization Tips
  • Benchmarking Your Cluster