Friday, October 25, 2013

What is Hadoop?

Apache Hadoop is an open-source software project that enables the distributed processing of large data sets across clusters of commodity servers. It is designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance. Rather than relying on high-end hardware, the resiliency of these clusters comes from the software's ability to detect and handle failures at the application layer.

Apache Hadoop has two main subprojects:

MapReduce - The framework that understands and assigns work to the nodes in a cluster (see the word-count sketch after this list).
HDFS - A file system that spans all the nodes in a Hadoop cluster for data storage. It links together the file systems on many local nodes to make them into one big file system. HDFS assumes nodes will fail, so it achieves reliability by replicating data across multiple nodes (a short file-system example also follows below).
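To make the MapReduce side concrete, here is the classic word-count job written against the Hadoop MapReduce Java API. It is a minimal sketch: the mapper emits (word, 1) pairs, the reducer sums them per word, and the input and output paths are taken from the command line.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every word in a line of input.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer: sums the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

A job like this is packaged into a jar and submitted with something like "hadoop jar wordcount.jar WordCount /input /output" (the paths are only examples). Hadoop then splits the input, schedules map and reduce tasks on the nodes that hold the data, and re-runs any tasks that fail.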

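On the HDFS side, applications do not manage block placement or replication themselves; they simply read and write paths through Hadoop's FileSystem API while HDFS distributes and replicates the blocks behind the scenes. The sketch below (the class name and file path are made up for illustration, and it assumes the Hadoop configuration on the classpath points at an HDFS cluster) writes a small file, reads it back, and prints the replication factor HDFS is using for it.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExample {
  public static void main(String[] args) throws Exception {
    // Picks up HDFS settings (core-site.xml, hdfs-site.xml) from the classpath.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Write a small file; HDFS replicates its blocks across data nodes
    // according to the configured replication factor.
    Path file = new Path("/tmp/hello.txt"); // hypothetical path for illustration
    try (FSDataOutputStream out = fs.create(file, true)) {
      out.writeUTF("Hello, HDFS");
    }

    // Read it back as if it were one local file, even though its blocks
    // may live on several different nodes in the cluster.
    try (FSDataInputStream in = fs.open(file)) {
      System.out.println(in.readUTF());
    }

    // Report how many copies of each block HDFS keeps for this file.
    short replication = fs.getFileStatus(file).getReplication();
    System.out.println("Replication factor: " + replication);
  }
}
```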
Hadoop is supplemented by an ecosystem of Apache projects, such as Pig, Hive and ZooKeeper, that extend the value of Hadoop and improve its usability.

So what’s the big deal?
Hadoop changes the economics and the dynamics of large-scale computing. Its impact can be boiled down to four salient characteristics.

Hadoop enables a computing solution that is:

  1. Fault tolerant – When you lose a node, the system redirects work to another node that holds a copy of the data and continues processing without missing a beat.
  2. Scalable – New nodes can be added as needed, and added without needing to change data formats, how data is loaded, how jobs are written, or the applications on top.
  3. Cost effective – Hadoop brings massively parallel computing to commodity servers. The result is a sizeable decrease in the cost per terabyte of storage, which in turn makes it affordable to model all your data.
  4. Flexible – Hadoop is schema-less and can absorb any type of data, structured or not, from any number of sources. Data from multiple sources can be joined and aggregated in arbitrary ways, enabling deeper analyses than any one system can provide.

Think Hadoop is right for you?

Eighty percent of the world's data is unstructured, and most businesses don't even attempt to use this data to their advantage. Imagine if you could afford to keep all the data generated by your business. Imagine if you had a way to analyze it.

IBM InfoSphere BigInsights brings the power of Hadoop to the enterprise. With built-in analytics, extensive integration capabilities and the reliability, security and support that you require, IBM can help put your big data to work for you.

InfoSphere BigInsights Quick Start Edition, the latest addition to the InfoSphere BigInsights family, is a free, downloadable, non-production version.

With InfoSphere BigInsights Quick Start, you get access to hands-on learning through a set of tutorials designed to guide you through your Hadoop experience. Plus, there is no data capacity or time limitation, so you can experiment with large data sets and explore different use cases, on your own timeframe.


If you like this post, please help us by sharing it :D