Tuesday, September 15, 2020

Hadoop

 


    What do you mean by Hadoop?

     Hadoop is a framework used to develop applications that can perform
     complete statistical analysis on huge amounts of data. It is an open-source
     framework used for large datasets ranging in size from gigabytes to
     petabytes. It can handle various forms of structured and unstructured data,
     giving users more flexibility in collecting, processing, analyzing, and
     maintaining data.

     It works by grouping multiple computers into clusters so that datasets can
     be analyzed in parallel, and therefore more quickly.

 

   A Brief History of Hadoop:

   Hadoop was initially developed in 2005 under the Apache Software Foundation,
   a non-profit organization that develops open-source software.
   "Hadoop" was the name of a yellow toy elephant owned by the son of one of
   its creators. In 2008, Yahoo released Hadoop as an open-source project.


 Importance of Hadoop:

  1. It stores and processes huge amounts of data quickly.

  2. It is free and uses commodity hardware to store large amounts of data.

  3. Data and application processing are protected against hardware failure.

  4. It stores and processes structured and unstructured data quickly and easily.


Hadoop Architecture

  Hadoop has four core modules that are included in the basic framework
  by the Apache Software Foundation:

    1. Hadoop Common:

        It provides the common Java libraries that can be used across all modules.

    2. Hadoop Distributed File System (HDFS):

        It is used for storing data across multiple machines, and it provides
        better throughput than traditional file systems, high fault tolerance,
        and support for large datasets. (A short Java sketch of reading and
        writing HDFS files appears after this list.)

    3. Yet Another Resource Negotiator (YARN):

        It helps manage resources for the processes running on Hadoop.
        It schedules jobs and tasks.

    4. MapReduce:

        It is a framework that helps programs perform parallel computation on
        data. It basically has two steps. In the first step, the master node
        takes the input, partitions it into smaller subproblems, and
        distributes them to worker nodes. After that, the master node takes
        the answers to all the subproblems and combines them to produce the
        output. (A minimal word-count sketch is shown right below.)
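
 To make the two MapReduce steps concrete, below is a minimal sketch of the
 classic word-count example, written against the standard Hadoop MapReduce
 Java API (org.apache.hadoop.mapreduce). The input and output paths come from
 the command line; the class names (WordCount, TokenizerMapper, IntSumReducer)
 are just illustrative.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

      // Map step: each worker splits its chunk of the input into words
      // and emits a (word, 1) pair for every word it sees.
      public static class TokenizerMapper
          extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);
          }
        }
      }

      // Reduce step: all counts for the same word arrive together,
      // and the reducer sums them to produce the final answer.
      public static class IntSumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values,
            Context context) throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          result.set(sum);
          context.write(key, result);
        }
      }

      // Driver: configures the job and points it at the input and
      // output paths given on the command line.
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

 Note that the "shuffle" between the two steps, where all the counts for the
 same word are grouped together, is handled by the framework itself.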
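
 Likewise, here is a minimal sketch of how an application can write to and
 read from HDFS through the Java FileSystem API. It assumes a cluster
 configuration (fs.defaultFS) is available on the classpath; the path
 /user/demo/hello.txt and the class name HdfsHello are placeholders.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsHello {
      public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS (e.g. hdfs://namenode:9000) from the
        // configuration files on the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/user/demo/hello.txt");

        // Write: behind this single stream, HDFS splits the file into
        // blocks and replicates them across DataNodes.
        try (FSDataOutputStream out = fs.create(file, true)) {
          out.write("Hello, HDFS!\n".getBytes(StandardCharsets.UTF_8));
        }

        // Read the file back.
        try (BufferedReader in = new BufferedReader(
            new InputStreamReader(fs.open(file), StandardCharsets.UTF_8))) {
          System.out.println(in.readLine());
        }
      }
    }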


Environment Setup for Hadoop:

  Hadoop is primarily supported on GNU/Linux environments. Therefore, we have
  to install a Linux operating system to set up a Hadoop environment.

 If we have a different operating system, we can install VirtualBox and then
 install Linux inside the virtual box.

 

  In today's age, where data has been increasing at a mind-boggling rate,
  Hadoop has been successfully used to build large and complex applications.

  It has also met the scalability requirements of large and varied amounts of data.


 

 

 

 

