Wednesday 21 August 2013

Hadoop - Managing Big Data and Bringing Power to Enterprise Solutions

What Is Hadoop?

Apache Hadoop is a free and open-source framework for reliable, scalable, distributed computing and data storage. It enables applications to work with thousands of nodes and petabytes of data, which makes it a great tool for both research and business operations. Hadoop was inspired by Google's MapReduce and Google File System (GFS) papers.
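To make the MapReduce idea concrete, here is a minimal sketch of the programming model in plain Python. This is not Hadoop's actual API, just an illustration of the three phases: map emits (key, value) pairs, the framework shuffles them by key, and reduce aggregates each group. All function names here are invented for the example.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in the document."""
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle: group intermediate pairs by key, as the framework
    would do between the map and reduce phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: sum the values for each key to get final word counts."""
    return {word: sum(counts) for word, counts in grouped.items()}

# The classic word-count example, run over two tiny "documents".
docs = ["big data is big", "hadoop handles big data"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
```

In a real Hadoop job the map and reduce functions run in parallel across many nodes, and the shuffle moves data over the network; the logic per record, however, is exactly this simple.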

It is increasingly common to have data sets that are too large to be handled by a traditional database, or by any technique that runs on a single computer or even a small cluster of computers. In the age of Big Data, Hadoop has emerged as the library of choice for handling them.

Why Is Hadoop Needed?

Companies continue to generate large amounts of data. Here are some statistics from 2011:
     -- Facebook : 6 billion messages per day.
     -- eBay : 9 petabytes of storage per day.
Existing tools were not designed to handle such large amounts of data.

How Does Hadoop Store Data?

In Hadoop, data can be stored easily and processed information occupies little space, so any legacy or large system can retain data for a long time at minimal cost. The way Hadoop is built is very interesting: each part was designed for scale, from file storage to processing. One of its most important and interesting components is the storage layer, the Hadoop Distributed File System (HDFS). Generally, when we talk about a storage system with high capacity, we think of custom hardware, which is extremely costly to buy and maintain. HDFS doesn't require special hardware: it runs smoothly on commodity machines, and can even be used together with ordinary home and office computers.
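The core idea behind HDFS is to split every file into fixed-size blocks and store several replicas of each block on different commodity nodes, so that the loss of any one machine loses no data. The sketch below illustrates that idea in miniature; the block size and replication factor mirror the HDFS defaults of the time (64 MB blocks, 3 replicas), scaled down for illustration, and the node names and round-robin placement are invented for the example, not HDFS's real placement policy.

```python
BLOCK_SIZE = 4          # bytes per block here; HDFS's default was 64 MB
REPLICATION = 3         # copies of each block; the HDFS default
NODES = ["node1", "node2", "node3", "node4"]

def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Split raw bytes into fixed-size blocks, the unit HDFS stores."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(blocks, nodes=NODES, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin.
    (Real HDFS placement is rack-aware; this is only a toy policy.)"""
    placement = {}
    for index in range(len(blocks)):
        placement[index] = [nodes[(index + r) % len(nodes)]
                            for r in range(replication)]
    return placement

# A 16-byte "file" becomes four 4-byte blocks, each stored on 3 nodes.
blocks = split_into_blocks(b"hello hdfs world")
placement = place_replicas(blocks)
```

Because every block lives on multiple ordinary machines, HDFS gets both fault tolerance and capacity without any expensive specialized hardware.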



Who Uses Hadoop?

Hadoop is used in production by many of the largest web companies, including Yahoo!, Facebook, eBay, LinkedIn and Twitter.

