Call 9884218531 for Demo
Data science training in Chennai
Geoinsyssoft
"We live in a data-driven world. Increasingly, the efficient operation of organizations across sectors relies on the effective use of vast amounts of data. Making sense of big data is a combination of organizations having the tools, the skills and, more importantly, the mindset to see data as the new "oil" fueling a company. Unfortunately, the technology has evolved faster than the workforce skills to make sense of it, and organizations across sectors must adapt to this new reality or perish." - Data Scientist
* Data modeling: relations, key-value, trees, graphs, images, text
* Relational algebra and parallel query processing
* NoSQL systems, key-value stores
* Tradeoffs of SQL, NoSQL, and NewSQL systems
* Algorithm design in Hadoop (and MapReduce in general)
* Basic statistical analysis at scale: sampling, regression
* Introduction to data mining: clustering, association rules, decision trees
* Case studies in analytics: social networking, bioinformatics, text processing
Statistics
Big data Hadoop
Big data analytics
Machine learning
Data mining
Statistics course curriculum:
Variables and Graphs
Variables
Graphs for Categorical Variables
Stemplots
Numerical Summary of Data
Mean as center of the distribution
Median as center of the distribution
Spread using quartiles
Spread using the interquartile range
Boxplots
Standard deviation as spread
Normal Distribution
Normal distributions
Standardized (z-score) observations
Standard normal curve
Calculations with the normal distribution
Normal quantile or probability plots
Scatter plots
Interpreting scatter plots
Least Squares Regression
fitting a line to data
prediction and residuals
least squares regression
Correlation
correlation r
r squared - coefficient of determination
examples
Categorical Data
describing relationships
conditional distributions
Simpson's paradox
Causation
Producing Data
Design of Experiments
Randomization
How to Randomize
Simple Random Sampling
Statistical Inference Foundations
Sampling Variability
Sampling Distribution
Variability of a statistic
Probability
Randomness
Language of Probability
Uses of Probability
Probability values
Inference
Sampling distribution for Proportions and Counts
Sample proportions
Normal Approximation
Sampling distribution for the Mean
Mean and Standard Deviation
Sampling Distribution
Central Limit Theorem
Confidence and Significance
Confidence Intervals
Confidence Intervals in General
Confidence Interval for a Population Mean
How Confidence Intervals Behave
Tests of Significance
Reasoning of the Test
Hypothesis
Test Statistic
P-values
Tests for a Population Mean
Fixed Level Tests
Statistical Significance vs. Practical Significance
Inference for Distributions
Inference for the Mean of a Population
t distribution
one sample t test
one sample confidence interval
Matched pairs t test
Comparing Two Means
two sample z test
two sample t test
two sample confidence interval
Comparing Spread of Distributions
Inference for Spread: the Chi-Square Distribution
F Test for Equality of Spread in Two Populations
Simple Linear Regression
Model for Regression
Estimating Parameters
Confidence Intervals
Significance tests
ANOVA for Regression
The ANOVA table
The F test
Inference for the Slope
Phase 2:
Big data Hadoop
- Introduction to Big Data
- Architecture of Big Data
- 3V Concept
- Framework and Applications/Tools
- Samples
- What is Hadoop?
- History of Hadoop
- Building Blocks - Hadoop Eco-System
- Who is behind Hadoop?
- What Hadoop is good for and what it is not
- HDFS Overview and Architecture
- HDFS Installation
- HDFS Use Cases
- Hadoop FileSystem Shell
- FileSystem Java API
- Hadoop Configuration
- HBase Overview and Architecture
- HBase Installation
- HBase Shell
- Java Client API
- Java Administrative API
- Filters
- Scan Caching and Batching
- Key Design
- Table Design
- MapReduce 2.0
- MapReduce 2.0 Architecture
- Installation
- YARN and MapReduce Command Line Tools
- Developing MapReduce Jobs
- Input and Output Formats
- HDFS and HBase as Source and Sink
- Job Configuration
- Job Submission and Monitoring
- Anatomy of Mappers, Reducers, Combiners and Partitioners
- Anatomy of Job Execution on YARN
- Distributed Cache
- Hadoop Streaming
- Decomposing Problems into MapReduce Workflow
- Using JobControl
- Oozie Introduction and Architecture
- Oozie Installation
- Developing, deploying, and Executing Oozie Workflows
- Pig Overview
- Installation
- Pig Latin
- Developing Pig Scripts
- Processing Big Data with Pig
- Joining data-sets with Pig
- Hive Overview
- Installation
- Hive QL
- Sqoop Overview
- Installation
- HDFS-Sqoop-HIVE
Introduction and Overview of Hadoop
Hadoop Distributed FileSystem (HDFS)
HBase - The Hadoop Database
MapReduce 2.0
MapReduce Workflows
Pig
Hive
Sqoop
Big data analytics using R
- Overview
- History of R
- Advantages and disadvantages
- Downloading and installing
- How to find documentation
- Introduction
- Using the R console
- Getting help
- Learning about the environment
- Writing and executing scripts
- Saving your work
- Data Structures, Variables
- Variables and assignment
- Data types
- Indexing, subsetting
- Viewing data and summaries
- Functions
- Naming conventions
- Objects
- Models
- Graphics
- Control Flow
- Truth testing
- Branching
- Looping
- Vectorized calculations
- Functions
- Parameters
- Return values
- Variable scope
- Exception Handling
- Getting Data into the R environment
- Builtin data
- Reading local data
- Web data
- Overview of Statistics in R
- Introduction to R Graphics
- Model notation
- Descriptive statistics
- Continuous data
- Scatter plot
- Box plot
- Categorical data
- Mosaic plot
- Correlation
- Inferential statistics
- t-test and non-parametric equivalents
- Chi-squared test, logistic regression
- Distribution testing
- Power testing
- Linear Regression
- Linear models
- Regression plots
- ANOVA
- Other Topics
- Classification
- Clustering
- Time series
- Dimensionality reduction
- Machine Learning
- Object Oriented R
- Generic functions
- S3/S4 classes
- Installing Packages
- Finding resources
- Installing resources
- More about Graphics
- Labels
- Exporting graphics
- Sophisticated Graphics in R
- Lattice
- ggplot2
- Interactive graphics
- Animated GIF
- rggobi
- R for Mapping and GIS
- Choropleth maps
- Layers
www.geoinsyssoft.com
Machine Learning using Octave/Weka
- Regressions
- Decision models
- Sentiment analysis
- Text classification
- Natural language processing
- Probabilistic model
- Graphical model
- SaaS
Data science and Big data training Chennai
Email :
info@geoinsyssoft.com / geoinsys@gmail.com
Phone :
+91 44 43542263 / 43542262
Mobile :
+91 9884218531
Address :
#2, 4th Floor, Balaji Nagar,
1st Main Road, Ekkaduthangal,
Chennai - 600032,
Landmark : Opp Virtusa IT Park/Behind Petrol bunk