Cloudera Developer Training for Apache Hadoop

Course Description & Details

This training programme from Cloudera is for developers who want to learn to use Apache Hadoop to build powerful data processing applications.

Course Code: RRH2; Category: Big Data

Hardware: VMware ESXi5, IBM X3250M3

This course is intended for developers who will be writing, maintaining, or optimizing Hadoop jobs.

Participants should have programming experience, preferably with Java. Understanding of common computer science concepts is a plus.

Course Outline

The Motivation for Hadoop

  • Problems with traditional large-scale systems
  • Requirements for a new approach

Hadoop: Basic Concepts

  • An Overview of Hadoop
  • The Hadoop Distributed File System
  • Hands-On Exercise
  • How MapReduce Works
  • Hands-On Exercise
  • Anatomy of a Hadoop Cluster
  • Other Hadoop Ecosystem Components

Writing a MapReduce Program

  • The MapReduce Flow
  • Examining a Sample MapReduce Program
  • Basic MapReduce API Concepts
  • The Driver Code
  • The Mapper
  • The Reducer
  • Hadoop's Streaming API
  • Using Eclipse for Rapid Development
  • Hands-On Exercise

Integrating Hadoop into the Workflow

  • Relational Database Management Systems
  • Storage Systems
  • Importing Data from RDBMS With Sqoop
  • Hands-On Exercise
  • Importing Real-Time Data with Flume
  • Accessing HDFS Using FuseDFS and Hoop

More Advanced MapReduce Programming

  • Custom Writables and WritableComparables
  • Saving Binary Data Using SequenceFiles and Avro Files
  • Creating InputFormats and OutputFormats
  • Hands-On Exercise

Graph Manipulation in Hadoop

  • Introduction to Graph Techniques
  • Representing Graphs in Hadoop
  • Implementing a Sample Algorithm: Single Source Shortest Path

Using Hive and Pig

  • Hive Basics
  • Pig Basics
  • Hands-On Exercise

Delving Deeper Into the Hadoop API

  • Using LocalJobRunner Mode for Faster Development
  • Reducing Intermediate Data With Combiners
  • The configure and close Methods for Map/Reduce Setup and Teardown
  • Writing Partitioners for Better Load Balancing
  • Directly Accessing HDFS
  • Using the Distributed Cache
  • Hands-On Exercise

Practical Development Tips and Techniques

  • Testing with MRUnit
  • Debugging MapReduce Code
  • Using LocalJobRunner Mode
  • MapReduce Jobs
  • Implementing Multiple Mappers Using ChainMapper
  • Hands-On Exercises

Common MapReduce Algorithms

  • Sorting and Searching
  • Indexing
  • Machine Learning With Mahout
  • Term Frequency-Inverse Document Frequency
  • Word Co-Occurrence
  • Hands-On Exercise

Joining Data Sets in MapReduce Jobs

  • Map-Side Joins
  • The Secondary Sort
  • Reduce-Side Joins
  • Hands-On Exercise

Creating Workflows with Oozie

  • The Motivation for Oozie
  • Oozie's Workflow Definition Format
  • Hands-On Exercise