What do you mean by Hadoop?
Hadoop is a framework used to develop applications that can perform complete
statistical analysis of huge amounts of data. It is an open-source framework
used with large datasets ranging from gigabytes to petabytes. It can handle various
forms of structured and unstructured data, giving users more flexibility in
collecting, processing, analyzing and maintaining data.
It involves grouping multiple computers into clusters to analyze datasets in
parallel and therefore more quickly.
A Brief History of Hadoop:
Hadoop was initially developed in 2005 under the Apache Software Foundation,
a non-profit organization which develops open-source software.
Hadoop was the name of a yellow toy elephant owned by the son of one of its
creators. In 2008, Yahoo released Hadoop as an open-source project.
Importance of Hadoop:
1. It stores and processes huge amounts of data quickly
2. It is free and uses commodity hardware to store large amounts of data
3. Data and application processing are protected against hardware failure
4. It stores and processes structured and unstructured data quickly and easily
Hadoop Architecture
Hadoop has four core modules, which are included in the basic framework
by the Apache Foundation:
1. Hadoop Common:
It provides the common Java libraries that can be used across all the other modules.
2. Hadoop Distributed File System (HDFS):
It stores data across multiple machines and provides higher throughput than
traditional file systems, high fault tolerance,
and native support for large datasets.
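As a rough illustration of how this storage works, HDFS splits every file into fixed-size blocks and keeps multiple copies of each block on different machines, which is where the fault tolerance comes from. The sketch below simulates that block-and-replica planning in plain Python; the block size and replication factor match HDFS defaults, but the node names and round-robin placement are made up for illustration (real HDFS uses rack-aware placement):

```python
# Sketch of HDFS-style storage: split a file into fixed-size blocks and
# replicate each block across machines. Constants mirror HDFS defaults
# (128 MB blocks, 3 replicas); node names here are hypothetical.

BLOCK_SIZE = 128 * 1024 * 1024   # 128 MB, the HDFS default block size
REPLICATION = 3                  # HDFS default replication factor

def plan_blocks(file_size, nodes):
    """Return a list of (block_index, [nodes holding a replica])."""
    num_blocks = (file_size + BLOCK_SIZE - 1) // BLOCK_SIZE  # ceiling division
    plan = []
    for i in range(num_blocks):
        # Simple round-robin placement; real HDFS is rack-aware.
        replicas = [nodes[(i + r) % len(nodes)] for r in range(REPLICATION)]
        plan.append((i, replicas))
    return plan

# A 300 MB file on a 4-node cluster needs 3 blocks (2 full + 1 partial),
# and losing any single node still leaves 2 copies of every block.
plan = plan_blocks(300 * 1024 * 1024, ["node1", "node2", "node3", "node4"])
for block, replicas in plan:
    print(block, replicas)
```

Because each block lives on several nodes, a reader can fetch different blocks from different machines at the same time, which is why HDFS achieves higher throughput than reading one file from one disk.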
3. Yet Another Resource Negotiator (YARN):
It manages resources for the processes running on the Hadoop cluster
and schedules jobs and tasks.
4. MapReduce:
It is a framework that helps programs perform parallel computation on data.
It has basically two steps. In the first (map) step, the master node takes
the input, partitions it into smaller sub-problems and then distributes them
to worker nodes. In the second (reduce) step, the master node takes the
answers to all the sub-problems and combines them to produce the output.
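The two steps above can be sketched with the classic word-count example. This is a single-process simulation of the map, shuffle and reduce phases, not code that runs on a Hadoop cluster (there, the equivalent job would be written against Hadoop's Java MapReduce API and the framework would distribute the phases across nodes):

```python
from collections import defaultdict

# Single-process sketch of MapReduce word counting. On a real cluster,
# map tasks run on many worker nodes and the framework shuffles each
# key to a reduce task; here we simply run the phases in order.

def map_phase(lines):
    """Map step: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce step: combine the values for each key into one result."""
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data big clusters", "big data"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts)  # {'big': 3, 'data': 2, 'clusters': 1}
```

Because each map call only looks at its own line and each reduce call only looks at its own key, both phases can run on many machines at once, which is exactly what makes the model scale.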
Environment Set-Up for Hadoop:
Hadoop is primarily supported on GNU/Linux environments. Therefore we have
to install a Linux operating system to set up a Hadoop environment.
If we have a different operating system, we can install VirtualBox and then
install Linux inside the virtual machine.
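Once Linux is available, the basic setup on a Debian/Ubuntu system looks roughly like the following. Treat this as a sketch: the Hadoop version number, download URL and Java path are examples, so check the Apache Hadoop releases page and your own system for the current values:

```shell
# Hadoop needs Java; install an OpenJDK first (version 8 shown as an example).
sudo apt-get update
sudo apt-get install -y openjdk-8-jdk
java -version

# Download and unpack a Hadoop release (the version and mirror URL are
# examples; pick a current release from the Apache download page).
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
tar -xzf hadoop-3.3.6.tar.gz

# Point JAVA_HOME at the JDK and put Hadoop's binaries on the PATH
# (the java-8-openjdk-amd64 path is typical for Ubuntu, but verify yours).
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$PATH:$HOME/hadoop-3.3.6/bin

# Verify that the hadoop command works.
hadoop version
```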
In today's age, where data has been increasing at a mind-boggling rate, Hadoop
has been successfully used by people to build large and complex applications.
It has also met the scalability requirements for large and varied amounts of data.