
HADOOP: A new way to store and analyze data




Hadoop, Why?
ØNeed to process 100 TB data sets.
ØNeed an efficient, reliable, and usable framework.

     What Is Hadoop?
ØHadoop was created by Doug Cutting to support the Lucene and Nutch search engine projects. He named it after his child’s stuffed elephant.
ØHadoop is a software framework for distributed processing of large datasets across large clusters of computers.
ØCore Hadoop has two main systems:
    – Hadoop Distributed File System: self-healing, high-bandwidth clustered storage.
    – MapReduce: distributed, fault-tolerant resource management and scheduling coupled with a scalable data programming abstraction.
Hadoop Architecture:-
Core Hadoop has two main systems:-
Hadoop Distributed File System (HDFS)
§A distributed file system that provides high-throughput access to application data.
MapReduce
§A software framework for distributed processing of large data sets on compute clusters.

HDFS (Hadoop Distributed File System)
ØHadoop Distributed File System (HDFS) is the primary storage system used by Hadoop applications.
ØHDFS creates multiple replicas of data blocks and distributes them on compute nodes throughout a cluster to enable reliable, extremely rapid computations.
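
The replication idea can be illustrated with a small toy model. This is only a sketch under stated assumptions, not how HDFS itself is implemented: the function names, the node names, the tiny block size, and the round-robin placement are all hypothetical simplifications (real HDFS uses much larger blocks and a rack-aware placement policy).

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut a file's bytes into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list[str], replication: int = 3) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes.

    Assumption: simple round-robin placement, chosen for illustration only.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def readable_blocks(placement: dict[int, list[str]], failed_nodes: set[str]) -> list[int]:
    """Return the blocks that still have at least one replica on a live node."""
    return [
        block
        for block, holders in placement.items()
        if any(node not in failed_nodes for node in holders)
    ]


# Hypothetical cluster and file, just to exercise the functions.
nodes = ["node1", "node2", "node3", "node4", "node5"]
blocks = split_into_blocks(b"x" * 1000, block_size=256)
placement = place_replicas(len(blocks), nodes)
survivors = readable_blocks(placement, failed_nodes={"node2"})
```

With three replicas per block, losing a single node leaves every block readable, which is the property the bullet above describes.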
     [Figure: HDFS architecture diagram]


     Hadoop MapReduce
ØIn MapReduce, records are processed in isolation by tasks called Mappers.
ØThe output from the Mappers is then brought together into a second set of tasks called Reducers.
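
A minimal single-process sketch of the Mapper and Reducer roles, using word counting as the example job. It does not use the Hadoop API; `mapper`, `reducer`, and `run` are hypothetical names, and the grouping step that Hadoop performs across machines is replaced here by an in-memory dictionary.

```python
from collections import defaultdict


def mapper(record: str):
    """Process one record in isolation and emit (word, 1) pairs."""
    for word in record.lower().split():
        yield word, 1


def reducer(key: str, values: list[int]):
    """Combine all values emitted for one key."""
    return key, sum(values)


def run(records: list[str]) -> dict[str, int]:
    """Group mapper output by key, then hand each group to the reducer."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            grouped[key].append(value)
    return dict(reducer(key, values) for key, values in sorted(grouped.items()))


counts = run(["big data big clusters", "data on clusters"])
```

Note that `mapper` never looks at more than one record, which is what allows many Mappers to run in parallel on different parts of the input.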
MapReduce Implementation
1. Split the input files into M splits
2. Assign the master and the workers
3. Run the map tasks
4. Write intermediate data to local disk, partitioned into regions
5. Read and sort the intermediate data
6. Run the reduce tasks
7. Return the result to the user program
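
The seven steps above can be sketched in one process as follows. This is an illustration under assumptions rather than Hadoop's implementation: `run_job`, `split_input`, and `partition` are hypothetical names, a plain loop stands in for the master and workers, in-memory lists stand in for the on-disk regions, and the city/temperature records are made-up sample data.

```python
import zlib
from itertools import groupby


def split_input(records, m):
    """Step 1: divide the input into M splits."""
    return [records[i::m] for i in range(m)]


def partition(key, r):
    """Pick one of R regions for a key (crc32 keeps this deterministic)."""
    return zlib.crc32(key.encode("utf-8")) % r


def run_job(records, map_fn, reduce_fn, m=3, r=2):
    splits = split_input(records, m)
    # Step 2: a real master assigns splits to workers; this loop stands in for that.
    regions = [[] for _ in range(r)]
    for split in splits:
        for record in split:
            # Step 3: run the map function on each record.
            for key, value in map_fn(record):
                # Step 4: store intermediate pairs by region (in memory here, not on disk).
                regions[partition(key, r)].append((key, value))
    output = {}
    for region in regions:
        # Step 5: read one region and sort it by key so equal keys are adjacent.
        region.sort(key=lambda kv: kv[0])
        # Step 6: run the reduce function once per key.
        for key, group in groupby(region, key=lambda kv: kv[0]):
            output[key] = reduce_fn(key, [value for _, value in group])
    # Step 7: return the result to the caller.
    return output


def map_fn(line):
    """Hypothetical job: emit (city, temperature) from a 'city,temp' line."""
    city, temp = line.split(",")
    yield city, int(temp)


def reduce_fn(city, temps):
    """Keep the highest temperature seen for a city."""
    return max(temps)


readings = ["pune,31", "delhi,40", "pune,35", "delhi,38", "goa,29"]
result = run_job(readings, map_fn, reduce_fn)
```

Because every pair with the same key lands in the same region, each reduce call sees all the values for its key no matter which split they came from.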


 Benefits of MapReduce
ØCapable of processing vast amounts of data
ØScales linearly
ØIn the ideal case, the same data problem is processed 10x faster on a 10x larger cluster
ØIndividual failures have minimal impact
ØFailures during a job cause only a small portion of the job to be re-executed
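
The last point, that a failure only forces a small part of the job to run again, can be shown with a toy scheduler. This is a sketch under assumptions, not Hadoop's scheduler: `run_with_retries` and `flaky_worker` are hypothetical names, and the failure is simulated.

```python
def run_with_retries(tasks, worker, max_attempts=3):
    """Run each task; on failure, re-queue only that task."""
    results = {}
    attempts = {}
    pending = list(tasks)
    while pending:
        task_id = pending.pop(0)
        attempts[task_id] = attempts.get(task_id, 0) + 1
        try:
            results[task_id] = worker(task_id, attempts[task_id])
        except RuntimeError:
            if attempts[task_id] >= max_attempts:
                raise
            # Completed tasks keep their results; only the failed one runs again.
            pending.append(task_id)
    return results, attempts


def flaky_worker(task_id, attempt):
    """Hypothetical worker: task 2 fails on its first attempt only."""
    if task_id == 2 and attempt == 1:
        raise RuntimeError("simulated node failure")
    return task_id * task_id


results, attempts = run_with_retries(range(5), flaky_worker)
```

Here five tasks need six executions in total: the four healthy tasks run once each, and only the failed task is repeated.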

      Drawbacks of MapReduce
ØJob setup takes time (e.g., several seconds)
ØMapReduce is not suited to real-time interaction
ØRequires a deep understanding of the MapReduce paradigm
ØNot all problems are easily expressed in MapReduce
     Advantages:-
ØHadoop is designed to run on cheap commodity hardware
ØIt automatically handles data replication and node failure
ØIt does the hard work – you can focus on processing data
ØIt offers cost savings along with efficient, reliable data processing

     Conclusion:-
ØHadoop is a data storage and analysis platform for large volumes of data.
ØHadoop will sit alongside your existing RDBMS, not replace it.
ØHadoop has many tools to ease data analysis.

