Hadoop! It is very popular nowadays for handling Big Data and implementing analytics on it.
But before starting with Hadoop, we should understand why we need it.
- Facebook generates petabytes of data per day.
- An aircraft generates around 10 terabytes of sensor data in just 30 minutes of flight.
- Electricity transmission grids also generate big data.
- Trading and stock exchange systems also produce large amounts of data.
But the drawback is that, previously, only a particular set of data was analyzed: data available in the form of rows and columns.
The Large Hadron Collider, for example, generates petabytes of data from its experiments.
Most data today, such as Facebook's log files, does not come in rows and columns. That is why we call it unstructured data, and this is where we need another kind of tool, like Hadoop.
If we talk about the form of this data, it is usually described along four dimensions:
- Volume
- Velocity
- Variety
- Value
Now, for example, if we want to read 1 TB of data with 1 machine that has 4 I/O channels, and each channel's speed is 100 MB/second, then it will take about 45 minutes to read all the data.
Meanwhile, 10 machines working in parallel take about 4.5 minutes to read the same data, as the sketch below shows.
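To see where those numbers come from, here is a small back-of-the-envelope calculation. It assumes 1 TB = 1024 x 1024 MB, 4 I/O channels per machine at 100 MB/s each, and that the data splits evenly across machines; these are the same assumptions as in the example above.

```java
// Rough read-time estimate for 1 TB of data on 1 vs. 10 machines.
public class ReadTimeEstimate {
    public static void main(String[] args) {
        double totalMb = 1024.0 * 1024.0;        // 1 TB expressed in MB
        double perMachineMbPerSec = 4 * 100.0;   // 4 I/O channels x 100 MB/s each

        for (int machines : new int[] {1, 10}) {
            double seconds = totalMb / (machines * perMachineMbPerSec);
            System.out.printf("%2d machine(s): %.1f minutes%n", machines, seconds / 60.0);
        }
        // Prints roughly 43.7 minutes for 1 machine and 4.4 minutes for 10 machines,
        // i.e. about 45 min vs. 4.5 min as quoted above.
    }
}
```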
Definition of Hadoop:
Apache Hadoop is a framework that allows for the distributed processing of large data sets across clusters (groups) of commodity (normal PC) computers using a simple programming model. Currently, the Aadhaar Card implementation is a burning example of Hadoop use in India. A sketch of that simple programming model follows below.
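The "simple programming model" the definition refers to is MapReduce. Below is a minimal sketch of the classic word-count job using Hadoop's standard MapReduce API; the class name, tokenization logic, and input/output paths are illustrative, not from the post.

```java
// Minimal word-count MapReduce job: the map phase runs in parallel on blocks of
// the input and emits (word, 1); the reduce phase sums the counts for each word.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: split each line into words and emit (word, 1).
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    // Reduce phase: receive all counts for one word and sum them.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. an HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // e.g. an HDFS output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The point of the model is that the framework, not the programmer, handles splitting the input across the cluster, scheduling the map and reduce tasks on the commodity machines, and re-running tasks when a machine fails.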