Tuesday, February 19, 2013

Data science Course curriculum

                                         www.geoinsyssoft.com/courses
         For more details on the course curriculum, duration and fees, see the link above.
                                                  Call 9884218531 for a demo.




Data science training in Chennai 

          Geoinsyssoft





"We live in a data-driven world. Increasingly, the efficient operation of organizations across sectors relies on the effective use of vast amounts of data. Making sense of big data is a combination of organizations having the tools, the skills and, more importantly, the mindset to see data as the new 'oil' fueling a company. Unfortunately, the technology has evolved faster than the workforce skills to make sense of it, and organizations across sectors must adapt to this new reality or perish." - Data Scientist


* Data modeling: relations, key-value, trees, graphs, images, text
* Relational algebra and parallel query processing
* NoSQL systems, key-value stores
* Tradeoffs of SQL, NoSQL, and NewSQL systems
* Algorithm design in Hadoop (and MapReduce in general)
* Basic statistical analysis at scale: sampling, regression
* Introduction to data mining: clustering, association rules, decision trees
* Case studies in analytics: social networking, bioinformatics, text processing




Statistics
Big data Hadoop 
Big data analytics
Machine learning
Data mining

Statistics course curriculum:


Variables and Graphs 
 Variables 
 Graphs for Categorical Variables 
 Stemplots 
Numerical Summary of Data 
 Mean as center of the distribution 
 Median as center of the distribution 
 Spread using quartiles 
 Spread using the interquartile range 
 Boxplots 
 Standard deviation as spread 
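The numerical summaries above can be illustrated with a minimal Python sketch that uses only the standard `statistics` module. The data values are hypothetical. Quartile conventions differ between textbooks and software, and `statistics.quantiles` uses its default `'exclusive'` method here, so a hand calculation that follows another convention may give slightly different quartiles.

```python
import statistics

# Hypothetical sample data (not from the course material)
data = [2, 4, 4, 5, 7, 9, 10, 12, 13, 30]

mean = statistics.mean(data)                  # center: sensitive to the outlier 30
median = statistics.median(data)              # center: resistant to outliers
q1, q2, q3 = statistics.quantiles(data, n=4)  # quartiles (default 'exclusive' method)
iqr = q3 - q1                                 # spread using the interquartile range
sd = statistics.stdev(data)                   # sample standard deviation (n - 1 divisor)

# Five-number summary: the values a boxplot draws
five_number = (min(data), q1, median, q3, max(data))

# 1.5 * IQR rule, commonly used to flag suspected outliers in a boxplot
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]
```

Note that the mean (9.6) sits above the median (8.0) because the single large value pulls it upward. This is the usual argument for reporting the median and IQR for skewed data.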

Normal Distribution 
 Normal distributions 
 Standardized ( Z- ) observations 
 Standard normal curve 
 Calculations with the normal distribution 
 Normal quantile or probability plots 
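Standardizing an observation and doing normal-curve calculations can be sketched with `statistics.NormalDist` (available in Python 3.8+). The N(170, 10) height distribution is a made-up example used only for illustration.

```python
from statistics import NormalDist

# Hypothetical example: heights in cm assumed to follow N(mu=170, sigma=10)
mu, sigma = 170.0, 10.0
heights = NormalDist(mu, sigma)
std_normal = NormalDist()  # standard normal: mean 0, sd 1

x = 185.0
z = (x - mu) / sigma  # standardized (z) observation

# P(X < 185), computed directly and via the standard normal curve
p_below_direct = heights.cdf(x)
p_below_via_z = std_normal.cdf(z)

# 68-95-99.7 rule: probability within 1, 2 and 3 standard deviations of the mean
within = [std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)]

# Inverse calculation: the height at the 90th percentile
p90 = heights.inv_cdf(0.90)
```

Both routes to P(X < 185) give the same answer (about 0.933). This is why a single standard normal table is enough for every normal distribution.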

Scatter plots 
 Interpreting scatter plots 
Least-Squares Regression 
 fitting a line to data 
 prediction and residuals 
 Least-squares regression 
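Here is a minimal sketch of fitting a least-squares line, predicting from it and computing residuals, written from the textbook formulas b = Sxy / Sxx and a = ȳ - b·x̄. The hours/score data and the function name `least_squares` are hypothetical.

```python
def least_squares(xs, ys):
    """Return intercept a and slope b of the least-squares line y_hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return a, b

# Hypothetical data: hours studied vs. exam score
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]

a, b = least_squares(hours, score)
predicted = [a + b * x for x in hours]
residuals = [y - y_hat for y, y_hat in zip(score, predicted)]  # observed - predicted
```

For these numbers the line is y_hat = 47.7 + 4.1x, and the residuals sum to zero (up to rounding), as they always do for a least-squares fit with an intercept.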

Correlation 
 correlation r 
 r squared (r²): coefficient of determination 
 examples 
Categorical Data 
 describing relationships 
 conditional distributions 
 Simpson's paradox 
Causation 
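The correlation r can be sketched as the average product of standardized x and y values, using the n - 1 divisor. r² is then the fraction of the variation in y explained by the least-squares line. The data and the helper name `correlation` are hypothetical, chosen for illustration only.

```python
import math

def correlation(xs, ys):
    """Pearson correlation r: average product of standardized x and y values."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return sum(((x - x_bar) / sx) * ((y - y_bar) / sy)
               for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data: hours studied vs. exam score
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]

r = correlation(hours, score)
r_squared = r ** 2  # coefficient of determination
```

Here r ≈ 0.994 and r² ≈ 0.989. Because r is built from standardized values, changing units (for example, minutes instead of hours) does not change it. A strong r on its own still says nothing about causation.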

 Producing Data 
Design of Experiments 
 Randomization 
 How to Randomize 
 Simple Random Sampling 
Statistical Inference Foundations 
 Sampling Variability 
 Sampling Distribution 
 Variability of a statistic 

 Probability 
Randomness 
 Language of Probability 
 Uses of Probability 
 Probability values 

 Inference 
 Sampling distribution for Proportions and Counts 
 Sample proportions 
 Normal Approximation 
 Sampling distribution for the Mean 
 Mean and Standard Deviation 
 Sampling Distribution 
 Central Limit Theorem 
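The Central Limit Theorem can be checked by simulation. This sketch repeatedly draws samples of size n from a non-normal (uniform) population. It then compares the spread of the sample means with the theoretical σ/√n. The population, the sample size of 30, the 2000 repetitions and the seed are all arbitrary assumptions for illustration.

```python
import random
import statistics

rng = random.Random(2013)  # fixed seed so the run is reproducible

def sample_mean(n):
    """Mean of one random sample of size n from a uniform population on [0, 1)."""
    return statistics.fmean(rng.random() for _ in range(n))

n = 30        # sample size
reps = 2000   # number of repeated samples
means = [sample_mean(n) for _ in range(reps)]

pop_mean = 0.5
pop_sd = (1 / 12) ** 0.5          # standard deviation of uniform[0, 1)
predicted_sd = pop_sd / n ** 0.5  # CLT: sd of the sample mean is sigma / sqrt(n)

observed_mean = statistics.fmean(means)
observed_sd = statistics.stdev(means)

# If the sampling distribution is close to normal, about 95% of the means
# should fall within 1.96 standard errors of the population mean
share_within = sum(abs(m - pop_mean) <= 1.96 * predicted_sd for m in means) / reps
```

The individual observations are flat (uniform), not bell-shaped, yet the sample means cluster around 0.5 with a spread close to σ/√n. This behaviour is what the theorem predicts.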

Confidence and Significance 
 Confidence Intervals 
 Confidence Intervals in General 
 Confidence Interval for a population mean 
 How Confidence Intervals Behave 
 Tests of Significance 
 Reasoning of the Test 
 Hypothesis 
 Test Statistic 
 P-values 
 Tests for a Population Mean 
 Fixed Level Tests 
 Statistical Significance vs. Practical Significance 
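The z confidence interval and the two-sided z test for a population mean can be sketched as below. Like the introductory textbook versions, this assumes the population σ is known. The summary numbers (x̄ = 106, σ = 15, n = 25, μ0 = 100) and the helper names are hypothetical.

```python
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval x_bar ± z* · sigma/sqrt(n) for a population mean (sigma known)."""
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z_star * sigma / n ** 0.5
    return x_bar - margin, x_bar + margin

def z_test(x_bar, mu0, sigma, n):
    """Two-sided z test of H0: mu = mu0; returns the test statistic and the P-value."""
    z = (x_bar - mu0) / (sigma / n ** 0.5)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical summary statistics
low, high = z_interval(x_bar=106.0, sigma=15.0, n=25)
z, p = z_test(x_bar=106.0, mu0=100.0, sigma=15.0, n=25)
reject_at_5_percent = p < 0.05  # fixed-level test with alpha = 0.05
```

The 95% interval is roughly (100.12, 111.88) and the test gives z = 2.0, P ≈ 0.046. The two agree: 100 lies just outside the interval, and H0 is rejected at the 5% level. A difference of 6 points may still not matter in practice, which is the statistical vs. practical significance distinction.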

 Inference for Distributions 
 Inference for the Mean of a Population 
 t distribution 
 one sample t test 
 one sample confidence interval 
 Matched pairs t test 
 Comparing Two Means 
 two-sample z test 
 two-sample t test 
 two-sample confidence interval 
 Comparing Spread of Distributions 
 Inference for Spread: the Chi-Square Distribution 
 F Test for Equality of Spread in Two Populations 
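A matched-pairs t test is a one-sample t test on the differences. The sketch below computes the t statistic, its degrees of freedom and a 95% interval. Python's standard library has no t distribution, so the critical value t* = 2.262 (df = 9, two-sided α = 0.05) is copied from a t table and is valid only for n = 10. The difference data are hypothetical. If SciPy is installed, `scipy.stats.ttest_1samp` and `scipy.stats.ttest_rel` also return P-values.

```python
import math
import statistics

# Table value of t* for df = 9 and upper-tail area 0.025 (valid only for n = 10)
T_CRIT_DF9 = 2.262

def one_sample_t(data, mu0):
    """One-sample t statistic t = (x_bar - mu0) / (s / sqrt(n)) and its degrees of freedom."""
    n = len(data)
    x_bar = statistics.fmean(data)
    s = statistics.stdev(data)
    return (x_bar - mu0) / (s / math.sqrt(n)), n - 1

# Hypothetical matched-pairs data: (after - before) differences for 10 subjects
diffs = [1.2, 0.8, -0.3, 1.5, 0.9, 0.4, 1.1, -0.2, 0.7, 1.0]

t, df = one_sample_t(diffs, mu0=0.0)  # H0: mean difference is 0
reject = abs(t) > T_CRIT_DF9          # two-sided test at alpha = 0.05

# 95% t confidence interval for the mean difference: x_bar ± t* · s/sqrt(n)
margin = T_CRIT_DF9 * statistics.stdev(diffs) / math.sqrt(len(diffs))
ci = (statistics.fmean(diffs) - margin, statistics.fmean(diffs) + margin)
```

For these values t ≈ 3.83 with 9 degrees of freedom, and the 95% interval is roughly (0.29, 1.13). Both indicate a mean difference above zero.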

Simple Linear Regression 
 Model for Regression 
 Estimating Parameters 
 Confidence Intervals 
 Significance tests 
 ANOVA for Regression 
 The ANOVA table 
 The F test 
 Inference for the Slope 
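The ANOVA table for simple linear regression splits the total sum of squares into regression and error parts. It then forms F = MSR / MSE with (1, n - 2) degrees of freedom. For a single predictor, this F equals the square of the t statistic for H0: slope = 0. The sketch below assumes hypothetical data, and the function name `regression_anova` is illustrative.

```python
import math

def regression_anova(xs, ys):
    """ANOVA quantities for the simple linear regression y = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    fitted = [a + b * x for x in xs]

    sst = sum((y - y_bar) ** 2 for y in ys)              # total SS, df = n - 1
    ssr = sum((f - y_bar) ** 2 for f in fitted)          # regression (model) SS, df = 1
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # error (residual) SS, df = n - 2

    mse = sse / (n - 2)
    f_stat = (ssr / 1) / mse            # F statistic with (1, n - 2) df
    t_slope = b / math.sqrt(mse / sxx)  # t statistic for H0: slope = 0, n - 2 df
    return {"SSR": ssr, "SSE": sse, "SST": sst, "F": f_stat, "t": t_slope}

# Hypothetical data: hours studied vs. exam score
table = regression_anova([1, 2, 3, 4, 5], [52, 55, 61, 64, 68])
```

With this data SSR = 168.1, SSE = 1.9 and SST = 170. F ≈ 265 equals t² for the slope, so the F test and the two-sided slope t test reach the same conclusion.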

Phase 2: 

Big data Hadoop

    Introduction and Overview of Big Data
    • Introduction to Big Data
    • Big Data architecture
    • The 3V concept: volume, velocity, variety
    • Frameworks and applications/tools
    • Samples
    Introduction and Overview of Hadoop
    • What is Hadoop?
    • History of Hadoop
    • Building Blocks - Hadoop Eco-System
    • Who is behind Hadoop?
    • What Hadoop is good for and what it is not
    Hadoop Distributed FileSystem (HDFS)
    • HDFS Overview and Architecture
    • HDFS Installation
    • HDFS Use Cases
    • Hadoop FileSystem Shell
    • FileSystem Java API
    • Hadoop Configuration
    HBase - The Hadoop Database
    • HBase Overview and Architecture
    • HBase Installation
    • HBase Shell
    • Java Client API
    • Java Administrative API
    • Filters
    • Scan Caching and Batching
    • Key Design
    • Table Design
    Map/Reduce 2.0
    • MapReduce 2.0
    • MapReduce 2.0 Architecture
    • Installation
    • YARN and MapReduce Command Line Tools
    • Developing MapReduce Jobs
    • Input and Output Formats
    • HDFS and HBase as Source and Sink
    • Job Configuration
    • Job Submission and Monitoring
    • Anatomy of Mappers, Reducers, Combiners and Partitioners
    • Anatomy of Job Execution on YARN
    • Distributed Cache
    • Hadoop Streaming
    MapReduce Workflows
    • Decomposing Problems into MapReduce Workflow
    • Using JobControl
    • Oozie Introduction and Architecture
    • Oozie Installation
    • Developing, Deploying and Executing Oozie Workflows
    Pig
    • Pig Overview
    • Installation
    • Pig Latin
    • Developing Pig Scripts
    • Processing Big Data with Pig
    • Joining data-sets with Pig
    Hive
    • Hive Overview
    • Installation
    • Hive QL
    Sqoop
    • Sqoop Overview
    • Installation
    • Sqoop with HDFS and Hive
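To make the mapper/combiner/partitioner/reducer topics above concrete, here is a single-process Python simulation of the classic word-count job. It is not Hadoop code. It only mimics the map → combine → partition → shuffle/sort → reduce data flow that a real MapReduce or Hadoop Streaming job performs across a cluster. The function names, the two input splits and `NUM_REDUCERS = 2` are illustrative assumptions.

```python
import zlib
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

NUM_REDUCERS = 2  # hypothetical number of reduce tasks

def mapper(line):
    """Emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def combiner(pairs):
    """Map-side local aggregation; reduces the data sent through the shuffle."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return list(counts.items())

def partition(key, num_reducers=NUM_REDUCERS):
    """Hash partitioning: every occurrence of a key goes to the same reducer.
    crc32 is used because Python's built-in str hash is randomized per process."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def reducer(word, counts):
    return word, sum(counts)

def run_job(splits):
    buckets = defaultdict(list)  # reducer id -> (key, value) pairs it will receive
    for split in splits:         # one simulated map task per input split
        mapped = [pair for line in split for pair in mapper(line)]
        for key, value in combiner(mapped):
            buckets[partition(key)].append((key, value))
    result = {}
    for r in sorted(buckets):    # shuffle and sort: each reducer sees sorted, grouped keys
        pairs = sorted(buckets[r], key=itemgetter(0))
        for word, group in groupby(pairs, key=itemgetter(0)):
            k, total = reducer(word, (v for _, v in group))
            result[k] = total
    return result

splits = [["the quick brown fox", "the lazy dog"], ["the dog barks"]]
counts = run_job(splits)
```

The combiner is safe here because addition is associative and commutative. That property is the usual condition for reusing a reducer as a combiner.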

Big data analytics using R 

  • Overview
    • History of R
    • Advantages and disadvantages
    • Downloading and installing
    • How to find documentation
  • Introduction
    • Using the R console
    • Getting help
    • Learning about the environment
    • Writing and executing scripts
    • Saving your work
  • Data Structures, Variables
    • Variables and assignment
    • Data types
    • Indexing, subsetting
    • Viewing data and summaries
    • Functions
    • Naming conventions
    • Objects
    • Models
    • Graphics
  • Control Flow
    • Truth testing
    • Branching
    • Looping
    • Vectorized calculations
  • Functions
    • Parameters
    • Return values
    • Variable scope
    • Exception Handling
  • Getting Data into the R environment
    • Built-in data
    • Reading local data
    • Web data
  • Overview of Statistics in R
    • Introduction to R Graphics
    • Model notation
  • Descriptive statistics
    • Continuous data
      • Scatter plot
      • Box plot
    • Categorical data
      • Mosaic plot
    • Correlation
  • Inferential statistics
    • T-test and non-parametric equivalents
    • Chi-squared test, logistic regression
    • Distribution testing
    • Power testing
  • Linear Regression
    • Linear models
    • Regression plots
    • ANOVA
  • Other Topics
    • Classification
    • Clustering
    • Time series
    • Dimensionality reduction
    • Machine Learning
  • Object Oriented R
    • Generic functions
    • S3/S4 classes
  • Installing Packages
    • Finding resources
    • Installing resources
  • More about Graphics
    • Labels
    • Exporting graphics
  • Sophisticated Graphics in R
    • Lattice
    • ggplot2
    • Interactive graphics
    • Animated GIF
    • rggobi
  • R for Mapping and GIS
    • Choropleth maps
    • Layers
Machine Learning using Octave/Weka


  • Regressions
  • Decision models
  • Sentiment analysis
  • Text classification
  • Natural language processing
  • Probabilistic model
  • Graphical model
  • SaaS






Data science and Big data training Chennai



Email:

info@geoinsyssoft.com / geoinsys@gmail.com

Phone:

+91 44 43542263 / 43542262

Mobile:

+91 9884218531

Address:

#2, 4th Floor, Balaji Nagar,
1st Main Road, Ekkaduthangal,
Chennai - 600032
Landmark: Opposite Virtusa IT Park / behind the petrol bunk




Big data Analytics with R training Chennai

Big data Analytics training course content:

  • To identify which statistical methodologies are appropriate for tackling a given problem;
  • To explore, represent and summarize distinct data structures;
  • To infer population behavior from available sample datasets;
  • To estimate and test the robustness of causal relationships.


  • Overview
    • History of R
    • Advantages and disadvantages
    • Downloading and installing
    • How to find documentation
  • Introduction
    • Using the R console
    • Getting help
    • Learning about the environment
    • Writing and executing scripts
    • Saving your work
  • Data Structures, Variables
    • Variables and assignment
    • Data types
    • Indexing, subsetting
    • Viewing data and summaries
    • Functions
    • Naming conventions
    • Objects
    • Models
    • Graphics
  • Control Flow
    • Truth testing
    • Branching
    • Looping
    • Vectorized calculations
  • Functions
    • Parameters
    • Return values
    • Variable scope
    • Exception Handling
  • Getting Data into the R environment
    • Built-in data
    • Reading local data
    • Web data
  • Overview of Statistics in R
    • Introduction to R Graphics
    • Model notation
  • Descriptive statistics
    • Continuous data
      • Scatter plot
      • Box plot
    • Categorical data
      • Mosaic plot
    • Correlation
  • Inferential statistics
    • T-test and non-parametric equivalents
    • Chi-squared test, logistic regression
    • Distribution testing
    • Power testing
  • Linear Regression
    • Linear models
    • Regression plots
    • ANOVA
  • Other Topics
    • Classification
    • Clustering
    • Time series
    • Dimensionality reduction
    • Machine Learning
  • Object Oriented R
    • Generic functions
    • S3/S4 classes
  • Installing Packages
    • Finding resources
    • Installing resources
  • More about Graphics
    • Labels
    • Exporting graphics
  • Sophisticated Graphics in R
    • Lattice
    • ggplot2
    • Interactive graphics
    • Animated GIF
    • rggobi
  • R for Mapping and GIS
    • Choropleth maps
    • Layers





Big data Course curriculum of Geoinsyssoft

Big data Hadoop Training Chennai at Geoinsyssoft.



BIG DATA:

Roles
  • Data Scientist
  • Hadoop Developer
  • Architect
  • Administrator
  • Consultant
  • Programmer
  • Business Analyst

    Day 1:
    Introduction to Big Data
    Real-time use cases
    Volume, Variety, Velocity, Value
    Comparison with existing OLTP, ETL, DWH and OLAP systems

    Day 2:
    Introduction to Hadoop 1.0 and Hadoop 2.0
    Architecture
    HDFS cluster – data storage framework
    MapReduce – data processing framework
    HBase – NoSQL database
    Hive – data warehouse
    Pig Latin – data flow scripts
    Sqoop – bulk data transfer for relational databases
    Flume – streaming logs

    Day 3:
    Setup – Linux VM (Ubuntu/CentOS)
    Java
    Hadoop setup and configuration – versions 1.1.2 and 2.0.5
    Hadoop 1.0 cluster and daemons
    NameNode – metadata, fsimage, edit log, block reports
    Rack awareness policy
    Safe mode, rebalancing and load optimization
    DataNode – writing, reading and replication of blocks
    JobTracker – initialization, execution, I/O, failure
    TaskTracker – initialization, progress, failure
    Secondary NameNode – not a backup

    Day 4:
    Installation and configuration of Hadoop 2.0 (YARN)
    ResourceManager – resource and job management
    ApplicationsManager
    Scheduler – Fair, Capacity, Priority
    NodeManager
    ApplicationMaster
    Container – YarnChild and task execution
    Uber jobs
    Failure of the application, RM, AM and NM

    Day 5:
    Unix and Java basics
    HDFS file operations (fs shell)

    Day 6:
    Introduction to MapReduce
    Architecture of MR v1 and v2
    Key-value pairs
    Mapper – setup/configure, init, map, cleanup, close
    Shuffle and sort
    Combiner
    Partitioner
    Reducer

    Day 7:
    MapReduce word count program
    Structured and unstructured data handling
    Data processing
    Map-only jobs

    Day 8 and Day 9:
    MR Programs 2:
    Combiner and Partitioner
    Single and multiple column
    Inverted index
    XML – semi-structured data
    Map-side joins
    Reduce-side joins

    Day 10:
    Introduction to the Hive data warehouse
    Architecture and installation
    Basic HQL commands
    Load, external tables
    Joins
    Partitioning
    Bucketing
    Advanced HQL commands
    Beeswax – web console
    Word count in Hive

    Day 11:
    Introduction to Pig
    Installation
    Data flow scripts
    Handling structured and unstructured data

    Day 12:
    Introduction to NoSQL
    ACID/CAP/BASE
    Key-value pair – MapReduce
    Column family – HBase
    Document – MongoDB
    Graph DB – Neo4j

    Day 13:
    Introduction to HBase and installation
    The HBase data model
    The HBase shell
    HBase architecture
    Schema design
    The HBase API
    HBase configuration and tuning

    Day 14:
    Introduction to Sqoop and installation
    Bulk loading
    Hadoop Streaming

    Day 15:
    Flume NG
    Source, sink, channel – agent
    Avro
    ZooKeeper
    Chukwa and Oozie

    Day 16:
    Integration with ETL
    Talend Data Studio

    Day 17:
    Big data analytics – visualization
    Tableau or Jaspersoft
    Cloudera/Hortonworks/Greenplum

    Day 18:
    Introduction to data science
    Data mining – machine learning
    Statistical analysis – predictive modelling
    Sentiment analysis or opinion mining

    Day 19:
    Use cases, case studies and proofs of concept

    Day 20 and Day 21 (optional):
    CCD-410 – Cloudera certification questions discussion
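The inverted-index exercise from Days 8 and 9 follows the same map → shuffle → reduce pattern. In the map step, each document emits (word, doc_id). In the reduce step, each word collects a sorted, de-duplicated posting list. The sketch below simulates this in one Python process. It is not a Hadoop job, and the documents and function names are hypothetical.

```python
from collections import defaultdict

def map_phase(doc_id, text):
    """Emit (word, doc_id) once for each distinct word in the document."""
    for word in set(text.lower().split()):
        yield word, doc_id

def reduce_phase(word, doc_ids):
    """Posting list for one word: sorted, de-duplicated document ids."""
    return word, sorted(set(doc_ids))

def build_inverted_index(docs):
    grouped = defaultdict(list)  # simulated shuffle: group all values by key
    for doc_id, text in docs.items():
        for word, d in map_phase(doc_id, text):
            grouped[word].append(d)
    # Reducers see keys in sorted order, as after Hadoop's sort phase
    return dict(reduce_phase(w, ids) for w, ids in sorted(grouped.items()))

docs = {
    "doc1": "hadoop stores data in hdfs",
    "doc2": "hive queries data stored in hdfs",
    "doc3": "pig scripts process data",
}
index = build_inverted_index(docs)
```

Looking up `index["hdfs"]` returns the documents that contain that word. No stemming is applied, so "stores" and "stored" end up as separate keys.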
