Best Apache Hadoop & Big Data Training and Industrial Training in Jalandhar

Apache Hadoop & Big Data

Introduction to Big Data

  1. What data qualifies as Big Data
  2. Business use cases for Big Data
  3. Big Data requirements for traditional data warehousing and BI
  4. Big Data solutions

Introduction to Hadoop

  1. The scale of data processing today
  2. What Hadoop is and why it is important
  3. Hadoop comparison with traditional systems
  4. Hadoop history
  5. Hadoop main components and architecture

Hadoop Distributed File System (HDFS)

  1. HDFS overview and design
  2. HDFS architecture
  3. HDFS file storage
  4. Component failures and recoveries
  5. Block placement
  6. Balancing the Hadoop cluster
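
The HDFS file storage and block placement topics above come down to simple arithmetic: a file is split into fixed-size blocks, and each block is stored several times. The following is a minimal sketch of that calculation in plain Python, not Hadoop code. It assumes a 128 MB block size and a replication factor of 3, which are the usual defaults for `dfs.blocksize` and `dfs.replication` in Hadoop 2.x and later; a real cluster may be configured differently. The function name `hdfs_footprint` is hypothetical.

```python
import math

BLOCK_SIZE_MB = 128  # assumption: default dfs.blocksize in Hadoop 2.x and later
REPLICATION = 3      # assumption: default dfs.replication


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return (blocks, block replicas, raw MB stored) for a single file."""
    # A file occupies ceil(size / block size) blocks; an empty file has none.
    blocks = math.ceil(file_size_mb / block_size_mb) if file_size_mb > 0 else 0
    # Every block is stored `replication` times across the cluster.
    replicas = blocks * replication
    # The last block only takes up its actual length, so raw usage
    # is the file size times the replication factor.
    raw_mb = file_size_mb * replication
    return blocks, replicas, raw_mb


print(hdfs_footprint(1000))  # (8, 24, 3000)
```

A 1000 MB file therefore needs 8 blocks, is tracked as 24 block replicas, and consumes about 3000 MB of raw disk.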

Hadoop Deployment

  1. Different Hadoop deployment types
  2. Hadoop distribution options
  3. Hadoop competitors
  4. Hadoop installation procedure
  5. Distributed cluster architecture
  6. Lab: Hadoop Installation

Working with HDFS

  1. Ways of accessing data in HDFS
  2. Common HDFS operations and commands
  3. Different HDFS commands
  4. Internals of a file read in HDFS
  5. Data copying with ‘distcp’
  6. Lab: Working with HDFS

Hadoop Cluster Configuration

  1. Hadoop configuration overview and important configuration files
  2. Configuration parameters and values
  3. HDFS parameters
  4. MapReduce parameters
  5. Hadoop environment setup
  6. ‘Include’ and ‘Exclude’ configuration files
  7. Lab: MapReduce Performance Tuning

Hadoop Administration and Maintenance

  1. Namenode/Datanode directory structures and files
  2. Filesystem image and Edit log
  3. The Checkpoint Procedure
  4. Namenode failure and recovery procedure
  5. Safe Mode
  6. Metadata and Data backup
  7. Potential problems and solutions / What to look for
  8. Adding and removing nodes
  9. Lab: MapReduce Filesystem Recovery

Job Scheduling

  1. How to schedule Hadoop Jobs on the same cluster
  2. Default Hadoop FIFO Scheduler
  3. Fair Scheduler and its configuration

Map-Reduce Abstraction

  1. What MapReduce is and why it is popular
  2. The big picture of MapReduce
  3. MapReduce process and terminology
  4. MapReduce component failures and recoveries
  5. Working with MapReduce
  6. Lab: Working with MapReduce

Programming MapReduce Jobs

  1. Java MapReduce implementation
  2. Map() and Reduce() methods
  3. Java MapReduce calling code
  4. Lab: Programming Word Count
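
As a preview of the Word Count lab, the map, shuffle and reduce steps can be sketched in plain Python. This is only an illustration of the programming model under simplifying assumptions: it runs in a single process, uses no Hadoop API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical. The course itself covers the Java implementation.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce: sum the counts collected for a single word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    # Keys are processed in sorted order, as reducers see them in MapReduce.
    return dict(reduce_phase(word, counts) for word, counts in sorted(grouped.items()))


print(word_count(["big data big cluster", "data node"]))
# {'big': 2, 'cluster': 1, 'data': 2, 'node': 1}
```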

Input/Output Formats and Conversion Between Different Formats

  1. Default Input and Output formats
  2. Sequence File structure
  3. Sequence File Input and Output formats
  4. Sequence File access via Java API and HDFS
  5. MapFile
  6. Lab: Input Format
  7. Lab: Format Conversion

MapReduce Features

  1. Joining Data Sets in MapReduce Jobs
  2. How to write a Map-Side Join
  3. How to write a Reduce-Side Join
  4. MapReduce Counters
  5. Built-in and user-defined counters
  6. Retrieving MapReduce counters
  7. Lab: Map-Side Join
  8. Lab: Reduce-Side Join
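
The difference between the two join styles in this module can be sketched in plain Python. This is a simplified, single-process illustration, not Hadoop code: the sample data and the function names `reduce_side_join` and `map_side_join` are hypothetical, and the map-side version shows only the variant where one data set is small enough to be held in memory by every mapper.

```python
from collections import defaultdict


def reduce_side_join(users, orders):
    """Join on user id by tagging records, grouping by key, then pairing."""
    # Map: emit every record under its join key, tagged with its source.
    tagged = [(uid, ("U", name)) for uid, name in users]
    tagged += [(uid, ("O", item)) for uid, item in orders]
    # Shuffle: bring all records with the same key together.
    groups = defaultdict(list)
    for key, value in tagged:
        groups[key].append(value)
    # Reduce: pair each user record with each order record for that key.
    joined = []
    for uid in sorted(groups):
        names = [v for tag, v in groups[uid] if tag == "U"]
        items = [v for tag, v in groups[uid] if tag == "O"]
        joined.extend((uid, name, item) for name in names for item in items)
    return joined


def map_side_join(users, orders):
    """Join inside the mapper using an in-memory copy of the small table."""
    lookup = dict(users)  # assumption: the users table fits in memory
    return [(uid, lookup[uid], item) for uid, item in orders if uid in lookup]


users = [(1, "asha"), (2, "ravi")]
orders = [(1, "book"), (2, "pen"), (1, "lamp"), (3, "cup")]

print(reduce_side_join(users, orders))
# [(1, 'asha', 'book'), (1, 'asha', 'lamp'), (2, 'ravi', 'pen')]
print(map_side_join(users, orders))
# [(1, 'asha', 'book'), (2, 'ravi', 'pen'), (1, 'asha', 'lamp')]
```

Both functions produce the same joined rows; the map-side version avoids the shuffle entirely, which is why it is preferred when one input is small.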

Introduction to Hive, HBase, Flume, Sqoop, Oozie and Pig

  1. Hive as a data warehouse infrastructure
  2. HBase as the Hadoop Database
  3. Using Pig as a scripting language for Hadoop

Hadoop Case studies

  1. How different organizations use Hadoop clusters in their infrastructure