Cloudera Developer Training for Apache Hadoop

Course Description & Details

This training programme from Cloudera is for developers who want to learn to use Apache Hadoop to build powerful data processing applications.

Course Code: RRH2; Category: Big Data

Hardware: VMware ESXi5, IBM X3250M3

Audience:
This course is intended for developers who will be writing, maintaining, or optimizing Hadoop jobs.

Prerequisites:
Participants should have programming experience, preferably with Java. Understanding of common computer science concepts is a plus.

Course Outline

The Motivation for Hadoop

  • Problems with traditional large-scale systems
  • Requirements for a new approach

Hadoop: Basic Concepts

  • An Overview of Hadoop
  • The Hadoop Distributed File System
  • Hands-On Exercise
  • How MapReduce Works
  • Hands-On Exercise
  • Anatomy of a Hadoop Cluster
  • Other Hadoop Ecosystem Components

Writing a MapReduce Program

  • The MapReduce Flow
  • Examining a Sample MapReduce Program
  • Basic MapReduce API Concepts
  • The Driver Code
  • The Mapper
  • The Reducer
  • Hadoop's Streaming API
  • Using Eclipse for Rapid Development
  • Hands-On Exercise

Integrating Hadoop into the Workflow

  • Relational Database Management Systems
  • Storage Systems
  • Importing Data from RDBMS With Sqoop
  • Hands-On Exercise
  • Importing Real-Time Data with Flume
  • Accessing HDFS Using FuseDFS and Hoop

More Advanced MapReduce Programming

  • Custom Writables and WritableComparables
  • Saving Binary Data Using Sequence Files and Avro Files
  • Creating InputFormats and OutputFormats
  • Hands-On Exercise

Graph Manipulation in Hadoop

  • Introduction to Graph Techniques
  • Representing Graphs in Hadoop
  • Implementing a Sample Algorithm: Single Source Shortest Path

Using Hive and Pig

  • Hive Basics
  • Pig Basics
  • Hands-On Exercise

Delving Deeper Into the Hadoop API

  • Using LocalJobRunner Mode for Faster Development
  • Reducing Intermediate Data With Combiners
  • The configure and close Methods for Map/Reduce Setup and Teardown
  • Writing Partitioners for Better Load Balancing
  • Directly Accessing HDFS
  • Using the Distributed Cache
  • Hands-On Exercise

Practical Development Tips and Techniques

  • Testing with MRUnit
  • Debugging MapReduce Code
  • Using LocalJobRunner Mode
  • MapReduce Jobs
  • Implementing Multiple Mappers Using ChainMapper
  • Hands-On Exercises

Common MapReduce Algorithms

  • Sorting and Searching
  • Indexing
  • Machine Learning With Mahout
  • Term Frequency-Inverse Document Frequency
  • Word Co-Occurrence
  • Hands-On Exercise

Joining Data Sets in MapReduce Jobs

  • Map-Side Joins
  • The Secondary Sort
  • Reduce-Side Joins
  • Hands-On Exercise

Creating Workflows with Oozie

  • The Motivation for Oozie's Workflow Definition Format
  • Hands-On Exercise