What is Big Data?
We are living in an era in which a huge volume of data is being generated. The points below indicate the rate at which data is being generated around the world.
- Every day, we create 2.5 quintillion bytes of data — so much that 90% of the data in the world today has been created in the last two years alone.
- According to IBM, 80% of data captured today is unstructured, from sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals, to name a few. All of this unstructured data is Big Data.
If your company is dealing with a set of data that is difficult to process and extract information from, then you are dealing with big data.
There are traditional data storage systems, but they are quickly becoming obsolete. Today, Hadoop is an integral part of many companies that deal with big data, storing and processing huge volumes of structured and unstructured data.
Role of Hadoop
What is Hadoop?
Hadoop is an open-source project from the Apache Software Foundation that provides a software framework for distributed applications to store and process data on clusters of servers. It is modeled after Google's MapReduce programming model and the Google File System.
Yahoo launched the Yahoo! Search Webmap in 2008, which was among the first and largest production implementations of Hadoop. The Hadoop cluster behind the Yahoo! Search Webmap consisted of more than 10,000 Linux servers.
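The MapReduce model mentioned above is easiest to see in the canonical word-count job. The sketch below uses the standard Hadoop MapReduce Java API; the class name and the input/output paths passed on the command line are placeholders for illustration, not anything specified in this article.

```java
// Minimal word-count sketch on the Hadoop MapReduce API (illustrative only).
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: emit (word, 1) for every word in each input line.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: sum the counts emitted for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // args[0] and args[1] are placeholder HDFS input/output paths.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The map phase emits a (word, 1) pair for every token and the reduce phase sums the pairs per word; the same split-then-aggregate pattern generalizes to most Hadoop analytics jobs.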
What does Hadoop solve?
- Organizations are discovering that important predictions can be made by sorting through and analyzing Big Data.
- However, since 80% of this data is "unstructured", it must be formatted (or structured) in a way that makes it suitable for data mining and subsequent analysis.
- Hadoop is the core platform for structuring Big Data and solves the problem of making it useful for analytics purposes.
- Hadoop is also well suited to processing unstructured data; a minimal sketch of turning raw log lines into structured records follows this list.
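As a rough illustration of what "structuring" unstructured data can look like in practice, the map-only sketch below parses raw web-server access-log lines into tab-separated (IP, URL, status) records that downstream analytics can consume. The log layout and field positions are assumptions made for this example, not part of the original text.

```java
// Hedged sketch of structuring unstructured data with a map-only Hadoop step.
import java.io.IOException;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class LogStructuringMapper
        extends Mapper<Object, Text, Text, NullWritable> {

    private final Text record = new Text();

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Assumed layout: <ip> - - [timestamp] "GET /url HTTP/1.1" <status> <bytes>
        String[] parts = value.toString().split(" ");
        if (parts.length < 9) {
            return; // skip lines that do not match the assumed layout
        }
        String ip = parts[0];
        String url = parts[6];
        String status = parts[8];
        // Emit a structured, tab-separated record as the output key.
        record.set(ip + "\t" + url + "\t" + status);
        context.write(record, NullWritable.get());
    }
}
```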
Features of Hadoop's Computing Solution
- Scalable– New nodes can be added as needed, and added without needing to change data formats, how data is loaded, how jobs are written, or the applications on top.
- Cost effective– Hadoop brings massively parallel computing to commodity servers. The result is a sizeable decrease in the cost per terabyte of storage, which in turn makes it affordable to model all your data.
- Flexible– Hadoop is schema-less, and can absorb any type of data, structured or not, from any number of sources. Data from multiple sources can be joined and aggregated in arbitrary ways enabling deeper analyses than any one system can provide.
- Fault tolerant– When you lose a node, the system redirects work to another copy of the data and continues processing without missing a beat. A sketch of adjusting the block replication that underpins this behavior follows this list.
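Fault tolerance in Hadoop rests largely on HDFS block replication: each block is stored on several nodes, so losing one node leaves other copies to work from. The snippet below is a minimal, hypothetical sketch of adjusting replication from a Java client; the file path is invented for illustration.

```java
// Hypothetical sketch: setting HDFS replication from a Java client.
// Assumes the Hadoop client libraries are on the classpath; the path is illustrative.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Default block replication for files created through this client.
        conf.set("dfs.replication", "3");

        FileSystem fs = FileSystem.get(conf);
        // Raise replication on an existing (hypothetical) file so its blocks
        // survive the loss of more nodes.
        fs.setReplication(new Path("/data/clickstream/part-00000"), (short) 4);
        fs.close();
    }
}
```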