Best bigdata training In Chennai: Bigdata-Hadoop-Simple_Learning -Geoinsyssoft-chennai-Training-and-Consulting

Thursday, June 20, 2013

Bigdata-Hadoop-Simple_Learning -Geoinsyssoft-chennai-Training-and-Consulting

Big data -Simple learning

“Big data” is high-volume, -velocity and -variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.

Hadoop -

Map Reduce- Parallel processing of large data

§ Hive - Hadoop data warehouse

§ Hbase - NoSQL key value pair database

· Written in: Java

· Main point: Billions of rows X millions of columns

· License: Apache

· Protocol: HTTP/REST (also Thrift)

· Modeled after Google's BigTable

· Uses Hadoop's HDFS as storage

· Map/reduce with Hadoop

· Query predicate push down via server side scan and get filters

· Optimizations for real time queries

· A high performance Thrift gateway

· HTTP supports XML, Protobuf, and binary

· Jruby-based (JIRB) shell

· Rolling restart for configuration changes and minor upgrades

· Random access performance is like MySQL

· A cluster consists of several different types of nodes

Best used: Hadoop is probably still the best way to run Map/Reduce jobs on huge datasets. Best if you use the Hadoop/HDFS stack already.

For example: Search engines. Analysing log data. Any place where scanning huge, two-dimensional join-less tables are a requirement.

§ Mahout - Machine Learning

§ Pig - Scripting language

§ Hue - Graphical user interface

§ Whirr- libraries for running cloud services

§ Oozie - Workflow engine

§ Zookeeper - Workflow manager

§ Avro - Serialization

§ Flume - Streaming

§ Sqoop - RDBMS connectivity

§ Chukwa - Data Collection

No comments:

Post a Comment

Subscribe to: Post Comments (Atom)