Feb 2, 2010

What is Hadoop?

What is Hadoop? (plain English)
Hadoop is a Java framework that brings computation to where the data is stored, instead of moving the data to the computation.
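
To make "bringing compute to the storage" concrete, here is a toy sketch in plain Java (not Hadoop's actual scheduler, and not any Hadoop API). It assumes we already know which hypothetical nodes hold a copy of each data block, and it sends the task for each block to one of those nodes, preferring the least busy one, so the block never has to cross the network.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of data locality: run each task on a node that already stores its block.
class LocalityScheduler {

    // blockLocations: block id -> nodes holding a copy of that block (hypothetical input)
    // returns: block id -> node chosen to run the task for that block
    static Map<String, String> schedule(Map<String, List<String>> blockLocations) {
        Map<String, String> assignment = new LinkedHashMap<>();
        Map<String, Integer> load = new HashMap<>(); // tasks assigned to each node so far
        for (Map.Entry<String, List<String>> e : blockLocations.entrySet()) {
            String best = null;
            // Among the nodes that hold this block, pick the one with the fewest tasks.
            for (String node : e.getValue()) {
                int l = load.getOrDefault(node, 0);
                if (best == null || l < load.getOrDefault(best, 0)) {
                    best = node;
                }
            }
            if (best == null) {
                throw new IllegalArgumentException("no node holds " + e.getKey());
            }
            assignment.put(e.getKey(), best);
            load.merge(best, 1, Integer::sum);
        }
        return assignment;
    }

    public static void main(String[] args) {
        Map<String, List<String>> locations = new LinkedHashMap<>();
        locations.put("block-0", List.of("nodeA", "nodeB"));
        locations.put("block-1", List.of("nodeA", "nodeC"));
        locations.put("block-2", List.of("nodeB", "nodeC"));
        System.out.println(schedule(locations));
        // prints {block-0=nodeA, block-1=nodeC, block-2=nodeB}
    }
}
```

Each task lands on a machine that already has its input, which is the whole point of the design: shipping a small program is far cheaper than shipping a large block of data.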

What is Hadoop? (technical definition)
Hadoop = HDFS + MapReduce.

What is HDFS?
HDFS = Hadoop Distributed File System.

You can think of it this way:
it takes a single file, cuts it into multiple chunks (blocks) and distributes those chunks across many computers. It is a layer on top of each computer's local disk file system.
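
The chunking idea can be sketched in a few lines of plain Java. This is only a toy model under stated assumptions, not the HDFS API: the node names are hypothetical, the block size is tiny for readability, and placement is simple round-robin. Real HDFS uses much larger blocks (64 MB by default in Hadoop releases of this era), keeps several copies of each block, and uses a rack-aware placement policy.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of how a distributed file system stores one file:
// cut it into fixed-size blocks and place copies of each block on different machines.
class BlockSplitter {

    // One block of the file, plus the (hypothetical) nodes holding a copy of it.
    static final class Block {
        final int index;
        final byte[] data;
        final List<String> nodes;

        Block(int index, byte[] data, List<String> nodes) {
            this.index = index;
            this.data = data;
            this.nodes = nodes;
        }
    }

    static List<Block> split(byte[] file, int blockSize, List<String> nodes, int replication) {
        if (blockSize <= 0 || replication <= 0 || replication > nodes.size()) {
            throw new IllegalArgumentException("invalid blockSize or replication factor");
        }
        List<Block> blocks = new ArrayList<>();
        int index = 0;
        for (int offset = 0; offset < file.length; offset += blockSize) {
            int end = Math.min(offset + blockSize, file.length); // the last block may be shorter
            byte[] chunk = Arrays.copyOfRange(file, offset, end);
            // Round-robin placement: copies of block i go to nodes i, i+1, ... (mod N),
            // so no node holds two copies of the same block.
            List<String> placed = new ArrayList<>();
            for (int r = 0; r < replication; r++) {
                placed.add(nodes.get((index + r) % nodes.size()));
            }
            blocks.add(new Block(index, chunk, placed));
            index++;
        }
        return blocks;
    }

    public static void main(String[] args) {
        byte[] file = "hello hadoop distributed file system".getBytes(StandardCharsets.UTF_8);
        List<String> nodes = List.of("nodeA", "nodeB", "nodeC");
        for (Block b : split(file, 10, nodes, 2)) {
            System.out.println("block " + b.index + " \""
                    + new String(b.data, StandardCharsets.UTF_8) + "\" -> " + b.nodes);
        }
    }
}
```

Keeping more than one copy of each block means a single failed machine does not lose any part of the file.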

What is MapReduce?
A distributed processing framework that works on top of HDFS.

You can think of it this way:
after HDFS distributes the data to different computers, MapReduce sends programs to those computers so each one works on the data it already stores.
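
The classic way to illustrate MapReduce is word count. The sketch below is a single-process simulation in plain Java, not the Hadoop API: each string stands in for a chunk of data stored on a different machine, the map step turns a chunk into (word, 1) pairs, and the reduce step groups the pairs by word and adds up the counts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-process simulation of a MapReduce word count.
class WordCountSketch {

    // Map: turn one chunk of text into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String chunk) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : chunk.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle + reduce: group the pairs by word and sum the counts for each word.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by word for readable output
        for (Map.Entry<String, Integer> p : pairs) {
            counts.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return counts;
    }

    static Map<String, Integer> run(List<String> chunks) {
        List<Map.Entry<String, Integer>> all = new ArrayList<>();
        for (String chunk : chunks) {
            // In a real cluster, each map call would run on the machine storing that chunk.
            all.addAll(map(chunk));
        }
        return reduce(all);
    }

    public static void main(String[] args) {
        List<String> chunks = List.of("the cat sat", "the dog sat", "the end");
        System.out.println(run(chunks));
        // prints {cat=1, dog=1, end=1, sat=2, the=3}
    }
}
```

In Hadoop itself the same two steps are written as Mapper and Reducer classes, and the framework handles moving the intermediate pairs between machines; this sketch only mirrors the idea.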

==================================
Here is the reason for this blog post:

I recently attended an "Introduction to Hadoop" talk, and the event drew close to 200 people. Everyone, including the organizer, was surprised by the turnout. I thought that either the Hadoop user base is expanding, or people are still trying to understand what Hadoop is and how they can use it. It looks to me like Hadoop is still at an early stage and the community is still learning it.

The presenter did a great job explaining the basics of Hadoop. Thank you. I cut out the verbose parts of the presentation and am posting its core here. I hope it helps you understand the basics of Hadoop.

===================================