Day 1:
Introduction to Big Data
Real-time use cases
Volume, Variety, Velocity, Value
Comparison with existing OLTP, ETL, DWH, OLAP
Day 2:
Introduction to Hadoop 1.0 and Hadoop 2.0
Architecture
HDFS Cluster – Data Storage Framework
MapReduce – Data Processing Framework
HBase – NoSQL Database
Hive Warehouse
Pig Latin data flow scripts
Sqoop – Bulk data transfer for relational databases
Flume – Streaming logs
Day 3:
Setup – VM Linux/Ubuntu/CentOS
Java
Hadoop setup and configuration – versions 1.1.2 and 2.05
Hadoop 1.0 cluster and daemons
NameNode – Metadata, fsimage, edit log, block reports
Rack awareness policy
Safe mode, rebalancing and load optimization
DataNode – Writing, reading and replication of blocks
JobTracker – Initialization, execution, I/O, failure
TaskTracker – Initialization, progress, failure
Secondary NameNode – Not a backup
Day 4:
Installation and configuration of Hadoop 2.0 – YARN
ResourceManager – Resource and job management
Application Manager
Scheduler – Fair, Capacity, Priority
NodeManager
ApplicationMaster
Container – YARN child and task execution
Uber job
Failure of Application, RM, AM, NM
Day 5:
Unix and Java basics
HDFS file operations – fs shell
Day 6:
Introduction to MapReduce
Architecture of MR v1 and v2
Key-value pairs
Mapper – setup/config, init, map, cleanup, close
Shuffle and Sort
Combiner
Partitioner
Reducer
Day 7:
MapReduce word count program
Structured and unstructured data handling
Data processing
Map-only jobs
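
As an illustration of the Day 7 word count topic, here is a minimal sketch in plain Python that imitates the map, shuffle-and-sort, and reduce phases in memory. It is not Hadoop code: the function names `map_phase`, `shuffle_and_sort`, `reduce_phase` and `word_count` are hypothetical, and a real job would use the Hadoop Mapper and Reducer classes covered in the course.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_and_sort(pairs):
    """Shuffle and sort: group all values by key, then order the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle_and_sort(pairs))


if __name__ == "__main__":
    print(word_count(["big data big cluster", "data node"]))
```

Dropping `reduce_phase` and returning the mapper output directly would correspond to a map-only job.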
Day 8 and Day 9:
MR Programs 2:
Combiner and Partitioner
Single and multiple column
Inverted index
XML – semi-structured data
Map-side joins
Reduce-side joins
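
As an illustration of the inverted index topic from Days 8 and 9, the following is a rough in-memory sketch in plain Python, not a Hadoop job. The names `map_document` and `inverted_index` and the sample document IDs are assumptions made for this example only.

```python
from collections import defaultdict


def map_document(doc_id, text):
    """Mapper: emit a (word, doc_id) pair once per distinct word in a document."""
    for word in set(text.lower().split()):
        yield word, doc_id


def inverted_index(documents):
    """Group document IDs by word, as the reducer would after the shuffle."""
    postings = defaultdict(set)
    for doc_id, text in documents.items():
        for word, source in map_document(doc_id, text):
            postings[word].add(source)
    # Sort words and document IDs so the output is deterministic.
    return {word: sorted(ids) for word, ids in sorted(postings.items())}


if __name__ == "__main__":
    docs = {"d1": "hadoop stores data", "d2": "hive queries data"}
    for word, ids in inverted_index(docs).items():
        print(word, ids)
```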
Day 10:
Introduction to Hive Data Warehouse
Architecture
Installation
Basic HQL commands
Load, external table
Join
Partitioning
Bucketing
Advanced HQL commands
Beeswax – Web console
Word count in Hive
Day 11:
Introduction to Pig
Installation
Data flow scripts
Handling structured and unstructured data
Day 12:
Introduction to NoSQL
ACID/CAP/BASE
Key-value pair – MapReduce
Column family – HBase
Document – MongoDB
Graph DB – Neo4j
Day 13:
Introduction to HBase and installation
The HBase Data Model
The HBase Shell
HBase Architecture
Schema Design
The HBase API
HBase Configuration and Tuning
Day 14:
Introduction to Sqoop and installation
Bulk loading
Hadoop Streaming
Day 15:
Flume NG
Source, Sink, Channel – Agent
Avro
ZooKeeper
Chukwa and Oozie
Day 16:
Integrate with ETL
Talend Data Studio
Day 17:
Big data analytics – Visualization
Tableau or Jaspersoft
Cloudera/Hortonworks/Greenplum
Day 18:
Introduction to Data Science
Data mining – Machine learning
Statistical analysis – Predictive modelling
Sentiment analysis or opinion mining
Day 19:
Use cases, case studies and proofs of concept
Day 20 and Day 21 (Optional):
CCD-410 – Cloudera Certification Questions Discussion