Big Data/Hadoop Course Content
This course is aimed at architects, administrators and developers.
Attend once and prepare yourself for whichever of these roles you choose!
Module 1
Getting Started with Big Data
|
What is Big Data?
What is Apache Hadoop?
History of Hadoop
Understanding distributed file systems and Hadoop
Hadoop ecosystem components
Hadoop use cases
Ubuntu Installation
JDK Installation
|
Module 2
Hadoop Distributed File System
|
Eclipse Installation
Overview of HDFS
Communication Protocols
Rack Awareness
Hadoop cluster Topology
Setting up SSH for Hadoop Cluster
Running Hadoop in pseudo-distributed mode
Linux basic commands
HDFS file commands
Reading and writing to HDFS programmatically
Hands-on Lab Exercises
|
Module 3
MapReduce Framework
|
Java Basics
Anatomy of a MapReduce Program
Writables
InputFormat
OutputFormat
Streaming API
Inherent failure handling
Reading and writing
Hands-on Lab Exercises
|
Module 4
Advanced MapReduce Programming
|
Input splits, Record Reader, Mapper, Partition & Shuffle, Reduce, OutputFormat
Writing a MapReduce program
Streaming in Hadoop
Counters
Performance Tuning
Joins
Sorting
Determining the optimal number of reducers and partitions
Hadoop cluster – Performance tuning
Hands-on Lab Exercises
|
Module 5
Apache Hadoop Administration
|
Best Practices for Hadoop setup and infrastructure
Hadoop cluster Installation preparation
- Cluster network design
- Installation of Linux operating system
- Configuring SSH
- Walkthrough on rack topology and setup
Managing Hadoop cluster
- HDFS cluster management
- Secondary Name node configuration
- Task Tracker management
- Configuring the HDFS quota
- Configuring Fair Scheduler
- Upgrading Hadoop
- Deploying and managing Hadoop clusters with Ambari
Monitoring Hadoop cluster
- Monitoring Hadoop cluster with Ganglia
- Monitoring Hadoop cluster with Ambari
- Monitoring Hadoop cluster with Nagios
Hadoop Cluster Performance Tuning
- Benchmarking and profiling
- Using compression for input and output
- Configuring optimal map and reduce slots for the TT
- Fine tuning Job Tracker config
- Fine tuning Task Tracker config
- Tuning shuffle, merge and sort parameters
Security Implementation
Kerberos security implementation
Workflow Scheduler
Capacity Scheduler
Fair Scheduler
dfsadmin & mradmin commands
Administration of HCatalog and Hive
Backup and Recovery
Scenario based exercises
- Data node failure & recovery
- Name Node failure & recovery
- JT & TT failure & recovery
- Removing data nodes
- Adding data nodes
- Commissioning and decommissioning of nodes
|
Module 6
Pig and Pig Latin
|
Installation and configuration
Running Pig Latin through Grunt
Writing programs
- Filter, Load & Store functions
Writing user defined functions
Working with Scripts
Lab Exercises
|
Module 7
HBase and ZooKeeper
|
NoSQL vs. SQL
CAP theorem
Architecture
Installation
Configuration
Java API
MR integration
Performance Tuning
Lab Exercises
|
Module 8
Hive
|
Features of Hive
Architecture
Installation and configuration
HiveQL
Lab Exercises
|
Module 9
Other Hadoop ecosystem components
|
Overview of Ambari, Oozie and Mahout
Installing & configuring Sqoop, mysql-server
Installing & configuring Flume
Lab Exercises
|
Module 10
Hadoop on Cloud
|
Hosting Hadoop on Amazon EC2
EMR Hands-on
|
http://big-data-training-in-chennai.blogspot.in/