Best bigdata training In Chennai: February 2013

Data science Course curriculum

For more details on course curriculum, duration, and fees > Click here
Call 9884218531 for Demo

Data science training in Chennai

Geoinsyssoft

"We live in a data-driven world. Increasingly, the efficient operation of organizations across sectors relies on the effective use of vast amounts of data. Making sense of big data requires organizations to have the tools, the skills and, more importantly, the mindset to see data as the new "oil" fueling a company. Unfortunately, the technology has evolved faster than the workforce skills needed to make sense of it, and organizations across sectors must adapt to this new reality or perish." -Data Scientist

* Data modeling: relations, key-value, trees, graphs, images, text
* Relational algebra and parallel query processing
* NoSQL systems, key-value stores
* Tradeoffs of SQL, NoSQL, and NewSQL systems
* Algorithm design in Hadoop (and MapReduce in general)
* Basic statistical analysis at scale: sampling, regression
* Introduction to data mining: clustering, association rules, decision trees
* Case studies in analytics: social networking, bioinformatics, text processing

Statistics

Big data Hadoop

Big data analytics

Machine learning

Data mining

www.geoinsyssoft.com

Statistics Course curriculum :

Variables and Graphs

Variables

Graphs for Categorical Variables

Stemplots

Numerical Summary of Data

Mean as center of the distribution

Median as center of the distribution

Spread using quartiles

Spread using the interquartile range

Boxplots

Standard deviation as spread
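
The numerical summaries listed above can be illustrated with a short sketch. This is only an illustration using Python's standard `statistics` module; the `scores` sample is made up, and the "inclusive" quartile method is one of several conventions, so other tools may report slightly different quartiles.

```python
import statistics

# Hypothetical sample (made-up exam scores, for illustration only).
scores = [52, 61, 64, 68, 70, 73, 75, 79, 84, 95]

# Center of the distribution.
mean = statistics.mean(scores)
median = statistics.median(scores)

# Spread using quartiles: quantiles(n=4) returns the cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range

# Spread using the sample standard deviation.
stdev = statistics.stdev(scores)

# The five-number summary is what a boxplot draws.
five_number_summary = (min(scores), q1, median, q3, max(scores))

print("mean:", mean)
print("median:", median)
print("IQR:", iqr)
print("standard deviation:", round(stdev, 2))
print("five-number summary:", five_number_summary)
```

The mean is pulled toward the large value 95 more than the median is, which is why the two measures of center differ slightly here.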

Normal Distribution

Normal distributions

Standardized (z) observations

Standard normal curve

Calculations with the normal

Normal quantile or probability plots

Scatter plots

Interpreting scatter plots

Least Square Regression

fitting a line to data

prediction and residuals

Least square regression

Correlation

correlation r

r squared - coefficient of determination

examples
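
As a small worked example of the regression and correlation topics above, the sketch below fits a least-squares line by hand and computes r and r squared. The data (`x`, `y`) is hypothetical and the variable names are chosen only for this illustration.

```python
import math
import statistics

# Hypothetical data: x = hours studied, y = test score (made up).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar = statistics.mean(x)
y_bar = statistics.mean(y)

# Sums of squares and cross-products of deviations from the means.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

# Least-squares line: y_hat = intercept + slope * x.
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Prediction and residuals (observed minus predicted).
predicted = [intercept + slope * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

# Correlation r and the coefficient of determination r squared.
r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2

print("slope:", round(slope, 3), "intercept:", round(intercept, 3))
print("r:", round(r, 3), "r squared:", round(r_squared, 3))
```

For a least-squares fit with an intercept, the residuals sum to zero (up to rounding), and r squared is the fraction of the variation in y explained by the line.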

Categorical Data

describing relationships

conditional distributions

Simpson's paradox

Causation

Producing Data

Design of Experiments

Randomization

How to Randomize

Simple Random Sampling

Statistical Inference Foundations

Sampling Variability

Sampling Distribution

Variability of a statistic

Probability

Randomness

Language of Probability

Uses of Probability

Probability values

Inference

Sampling distribution for Proportions and Counts

Sample proportions

Normal Approximation

Sampling distribution for the Mean

Mean and Standard Deviation

Sampling Distribution: Central Limit Theorem

Confidence and Significance

Confidence Intervals

Confidence Intervals in General

Confidence Interval for a Population Mean

How Confidence Intervals Behave
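
A minimal sketch of a confidence interval for a population mean, and of how such intervals behave, is given below. It assumes the population standard deviation `sigma` is known (the z interval); the function name `z_confidence_interval` and the numbers are hypothetical, not part of the course material.

```python
import math
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Return (low, high) for the mean, assuming a known sigma."""
    # Critical value z* that leaves (1 - confidence) / 2 in each tail.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Hypothetical numbers: sample mean 100, sigma 15, sample size 25.
low, high = z_confidence_interval(100.0, 15.0, 25)

# A larger sample gives a narrower interval.
low_n100, high_n100 = z_confidence_interval(100.0, 15.0, 100)

# A higher confidence level gives a wider interval.
low_99, high_99 = z_confidence_interval(100.0, 15.0, 25, confidence=0.99)

print("95%, n=25: ", round(low, 2), round(high, 2))
print("95%, n=100:", round(low_n100, 2), round(high_n100, 2))
print("99%, n=25: ", round(low_99, 2), round(high_99, 2))
```

When sigma is unknown and estimated from the sample, the t distribution covered later in the curriculum replaces the normal critical value.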

Tests of Significance

Reasoning of the Test

Hypothesis

Test Statistic

P-values

Tests for a Population Mean

Fixed Level Tests

Statistical Significance vs. Practical Significance

Inference for Distributions

Inference for the Mean of a Population

t distribution

one sample t test

one sample confidence interval

Matched pairs t test

Comparing two Means

two sample z test

two sample t test

two sample confidence interval

Comparing Spread of Distributions

Inference for Spread: the Chi-Square Distribution

F Test for Equality of Spread in Two Populations

Simple Linear Regression

Model for Regression

Estimating Parameters

Confidence Intervals

Significance tests

ANOVA for Regression

The ANOVA table

The F test

Inference for the Slope

Phase 2:

Big data Hadoop

Introduction and Overview of Hadoop

Introduction to Bigdata
Architecture of Big Data
3V Concept
Framework and Applications/Tools
Samples

Introduction and Overview of Hadoop

What is Hadoop?
History of Hadoop
Building Blocks - Hadoop Eco-System
Who is behind Hadoop?
What Hadoop is good for and what it is not

Hadoop Distributed FileSystem (HDFS)

HDFS Overview and Architecture
HDFS Installation
HDFS Use Cases
Hadoop FileSystem Shell
FileSystem Java API
Hadoop Configuration

HBase - The Hadoop Database

HBase Overview and Architecture
HBase Installation
HBase Shell
Java Client API
Java Administrative API
Filters
Scan Caching and Batching
Key Design
Table Design

Map/Reduce 2.0

MapReduce 2.0
MapReduce 2.0 Architecture
Installation
YARN and MapReduce Command Line Tools
Developing MapReduce Jobs
Input and Output Formats
HDFS and HBase as Source and Sink
Job Configuration
Job Submission and Monitoring
Anatomy of Mappers, Reducers, Combiners and Partitioners
Anatomy of Job Execution on YARN
Distributed Cache
Hadoop Streaming

MapReduce Workflows

Decomposing Problems into MapReduce Workflow
Using JobControl
Oozie Introduction and Architecture
Oozie Installation
Developing, deploying, and Executing Oozie Workflows

Pig

Pig Overview
Installation
Pig Latin
Developing Pig Scripts
Processing Big Data with Pig
Joining data-sets with Pig

Hive

Hive Overview
Installation
Hive QL

Sqoop

Sqoop Overview
Installation
HDFS-Sqoop-HIVE

Big data analytics using R

Overview

History of R
Advantages and disadvantages
Downloading and installing
How to find documentation

Introduction

Using the R console
Getting help
Learning about the environment
Writing and executing scripts
Saving your work

Data Structures, Variables

Variables and assignment
Data types
Indexing, subsetting
Viewing data and summaries
Functions
Naming conventions
Objects
Models
Graphics

Control Flow

Truth testing
Branching
Looping
Vectorized calculations

Functions

Parameters
Return values
Variable scope
Exception Handling

Getting Data into the R environment

Built-in data
Reading local data
Web data

Overview of Statistics in R

Introduction to R Graphics
Model notation

Descriptive statistics

Continuous data

Scatter plot
Box plot

Categorical data

Mosaic plot

Correlation

Inferential statistics

T-test and non-parametric equivalents
Chi-squared test, logistic regression
Distribution testing
Power testing

Linear Regression

Linear models
Regression plots
ANOVA

Other Topics

Classification
Clustering
Time series
Dimensionality reduction
Machine Learning

Object Oriented R

Generic functions
S3/S4 classes

Installing Packages

Finding resources
Installing resources

More about Graphics

Labels
Exporting graphics

Sophisticated Graphics in R

Lattice
ggplot2
Interactive graphics
Animated GIF
rggobi

R for Mapping and GIS

Choropleth maps
Layers

Machine Learning using Octave/Weka

Regressions

Decision models

Sentiment analysis

Text classification

Natural language processing

Probabilistic model

Graphical model

SaaS

Data science and Big data training Chennai

Email :

info@geoinsyssoft.com / geoinsys@gmail.com

Phone :

+91 44 43542263 / 43542262

Mobile :

+91 9884218531

Address :

#2, 4th Floor, Balaji Nagar ,
1st Main Road, Ekkaduthangal,
Chennai - 600032,
Landmark : Opp Virtusa IT Park/Behind Petrol bunk

Big data Analytics with R training Chennai

Big data Analytics training Course content:

To identify which statistical methodologies are adequate in order to tackle a given problem;
To explore, represent and summarize distinct data structures;
To infer population behaviors given available sample datasets;
To estimate and test the robustness of causal relationships.

Big data Course curriculum of Geoinsyssoft

Big data Hadoop Training Chennai at Geoinsyssoft.

BIG DATA:

Roles

Data Scientist
Hadoop Developer
Architect
Administrator
Consultant
Programmer
Business Analyst

Day 1:

Introduction to Big Data.

Realtime usages

Volume, Variety, Velocity, Value

Compare with existing OLTP, ETL, DWH, OLAP

Day 2:

Introduction to Hadoop 1.0 and Hadoop 2.0

Architecture

HDFS Cluster – Data Storage Framework

Map Reduce - Data Processing Framework

HBase – NoSQL database

Hive warehouse

Pig Latin data flow scripts

Sqoop – Bulk data transfer for relational databases

Flume – Streaming logs

Day 3:

Setup – VM Linux/Ubuntu/CentOS

Java

Hadoop setup and configuration – version 1.1.2 and 2.05

Hadoop 1.0 cluster and daemons

Name node – Metadata, fsimage, edit log, block reports

Rack awareness policy

Safe mode, rebalancing and load optimization

Data node – Writing, reading and replication of blocks

Job tracker – Initialization, execution, I/O, failure

Task tracker – Initialization, progress, failure

Secondary Namenode – Not a backup

Day 4:

Installation and config of Hadoop 2.0 – YARN

Resource Manager – resource and job management

Application Manager

Scheduler – Fair, Capacity, Priority

Node Manager

Application Master

Container – YARN child and task execution

Uber job

Failure of Application, RM, AM, NM

Day 5:

Unix and Java Basics.

HDFS file operations with the fs shell

Day 6:

Introduction to MapReduce.

Architecture of MR v1 and v2

Key Value Pairs

Mapper – setup/config, init, map, cleanup, close

Shuffle and Sort

Combiner

Partitioner

Reducer

Day 7:

MapReduce word count program.

Structured and Unstructured Data handling

Data processing

Map only jobs
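
To give a feel for the word count program, here is a minimal in-memory sketch of the map, shuffle-and-sort, and reduce phases. It is plain Python for illustration only and does not use the Hadoop API; the function names (`mapper`, `shuffle`, `reducer`, `word_count`) and the sample lines are assumptions made for this example.

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in a line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle and sort: group all values by key, keys in sorted order."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reducer(key, values):
    """Reduce phase: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    mapped = (pair for line in lines for pair in mapper(line))
    return dict(reducer(key, values) for key, values in shuffle(mapped).items())


# Hypothetical input lines.
lines = ["big data hadoop", "hadoop map reduce", "big data analytics"]
counts = word_count(lines)
print(counts)
```

In a real Hadoop job the mapper and reducer run on different nodes and the framework performs the shuffle; a map-only job would simply skip the shuffle and reduce steps.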

Day 8 and Day 9:

MR Programs 2:

Combiner and Partitioner

Single and multiple column

Inverted index

XML – semi-structured data

Map side joins.

Reduce side join.
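
The inverted index exercise listed above can be sketched in the same style: the map step emits (word, document id) pairs and the reduce step collects the ids for each word. This is an in-memory Python illustration, not Hadoop code; the document ids, texts, and function names are hypothetical.

```python
from collections import defaultdict


def map_document(doc_id, text):
    """Map phase: emit (word, doc_id) once per distinct word in a document."""
    for word in set(text.lower().split()):
        yield word, doc_id


def build_inverted_index(documents):
    """Group doc ids by word (shuffle), then sort each posting list (reduce)."""
    postings = defaultdict(set)
    for doc_id, text in documents.items():
        for word, emitted_id in map_document(doc_id, text):
            postings[word].add(emitted_id)
    return {word: sorted(ids) for word, ids in sorted(postings.items())}


# Hypothetical documents.
documents = {
    "doc1": "hadoop stores big data",
    "doc2": "pig and hive run on hadoop",
    "doc3": "big data analytics",
}
index = build_inverted_index(documents)
print(index["hadoop"])
print(index["big"])
```

Looking up a word in `index` returns the documents that contain it, which is the basic structure behind text search.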

Day 10:

Introduction to Hive data warehouse

Architecture and installation

Basic HQL commands

Load, external table

Join

Partitioning

Bucket

Advanced HQL commands

Beeswax – Web console

Word count in hive

Day 11:

Introduction to PIG

Installation

Data flow Scripts

Handling structured and unstructured

Day 12:

Introduction to NoSQL

ACID/CAP/BASE

Key-value pair – MapReduce

Column family – HBase

Document – MongoDB

Graph DB – Neo4j

Day 13:

Introduction to HBase and installation.

The HBase Data Model

The HBase Shell

HBase Architecture

Schema Design

The HBase API

HBase Configuration and Tuning

Day 14:

Introduction to Sqoop and installation.

Bulk loading

Hadoop Streaming.

Day 15:

Flume NG

Source, Sink, Channel – Agent

Avro

ZooKeeper

Chukwa and Oozie

Day 16:

Integrate With ETL

Talend Data studio

Day 17 :

Big data Analytics – Visualization

Tableau or Jaspersoft

Cloudera/Hortonworks/Greenplum

Day 18:

Introduction to Data science

Data mining – Machine learning

Statistical Analysis – Predictive modelling

Sentiment Analysis or opinion mining

Day 19 :

Use cases, case studies and proofs of concept

Day 20 and Day 21 (Optional):

CCD-410 - Cloudera Certification Questions Discussion.

Tuesday, February 19, 2013
