Friday, October 25, 2013

What is Hadoop?

Apache Hadoop is an open source software project that enables the distributed processing of large data sets across clusters of commodity servers. It is designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance. Rather than relying on high-end hardware, the resiliency of these clusters comes from the software's ability to detect and handle failures at the application layer.

Apache Hadoop has two main subprojects:

MapReduce - The framework that understands and assigns work to the nodes in a cluster.
HDFS - A file system that spans all the nodes in a Hadoop cluster for data storage. It links together the file systems on many local nodes to make them into one big file system. HDFS assumes nodes will fail, so it achieves reliability by replicating data across multiple nodes.
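The MapReduce model that the framework implements can be sketched in a few lines of plain Python. This is only an illustrative simulation of the map, shuffle and reduce phases (the classic word-count example), not the actual Hadoop API:

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit (word, 1) pairs from each input record."""
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as Hadoop does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine each key's values into a final result."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data", "big cluster"]
result = reduce_phase(shuffle(map_phase(docs)))
# result == {"big": 2, "data": 1, "cluster": 1}
```

In a real cluster, the map and reduce steps run in parallel on many nodes, and the shuffle moves intermediate data between them over the network.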

Hadoop is supplemented by an ecosystem of Apache projects, such as Pig, Hive and ZooKeeper, that extend the value of Hadoop and improve its usability.

So what’s the big deal?
Hadoop changes the economics and the dynamics of large-scale computing. Its impact can be boiled down to four salient characteristics.

Hadoop enables a computing solution that is:

  1. Fault tolerant – When you lose a node, the system redirects work to another node holding a copy of the data and continues processing without missing a beat.
  2. Scalable – New nodes can be added as needed, and added without needing to change data formats, how data is loaded, how jobs are written, or the applications on top.
  3. Cost effective – Hadoop brings massively parallel computing to commodity servers. The result is a sizeable decrease in the cost per terabyte of storage, which in turn makes it affordable to model all your data.
  4. Flexible – Hadoop is schema-less, and can absorb any type of data, structured or not, from any number of sources. Data from multiple sources can be joined and aggregated in arbitrary ways enabling deeper analyses than any one system can provide.
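The fault-tolerance point above comes from HDFS-style block replication: each block lives on several nodes, and a reader simply falls back to a surviving replica. A toy sketch, with made-up names and a simplistic round-robin placement (real HDFS placement is rack-aware and far more involved):

```python
def place_replicas(block_id, nodes, replication=3):
    """Assign a block to `replication` distinct nodes (toy round-robin placement)."""
    start = hash(block_id) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replication)]

def read_block(block_id, replicas, dead_nodes):
    """Serve a read from the first live replica, skipping failed nodes."""
    for node in replicas:
        if node not in dead_nodes:
            return node  # work is redirected here; no data is lost
    raise IOError("all replicas lost for block %s" % block_id)

nodes = ["node1", "node2", "node3", "node4"]
replicas = place_replicas("blk_0001", nodes)
# Simulate the failure of the first replica's node: the read still succeeds.
survivor = read_block("blk_0001", replicas, dead_nodes={replicas[0]})
```

Losing one node out of three replicas leaves two live copies, which is why the system can keep processing without missing a beat.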
Think Hadoop is right for you?

Eighty percent of the world's data is unstructured, and most organizations don't even attempt to use this data to their advantage. Imagine if you could afford to keep all the data generated by your business. Imagine if you had a way to analyze it.

IBM InfoSphere BigInsights brings the power of Hadoop to the enterprise. With built-in analytics, extensive integration capabilities and the reliability, security and support that you require, IBM can help put your big data to work for you.

InfoSphere BigInsights Quick Start Edition, the latest addition to the InfoSphere BigInsights family, is a free, downloadable, non-production version.

With InfoSphere BigInsights Quick Start, you get access to hands-on learning through a set of tutorials designed to guide you through your Hadoop experience. Plus, there is no data capacity or time limitation, so you can experiment with large data sets and explore different use cases, on your own timeframe.

Data warehouse vs Hadoop

I found an excellent article comparing Hadoop with the enterprise data warehouse (EDW). Is Hadoop the solution to your problems? Would you recommend Hadoop as a replacement for your EDW? Can both systems coexist in the same company?

What is a better solution for Big Data Analytics?

Read the next article!

http://www.bitpipe.com/data/demandEngage.action?resId=1373640362_622

Tuesday, October 15, 2013

Thursday, October 3, 2013

Use Hadoop or not?

In the past few years, Hadoop has earned a lofty reputation as the go-to big data analytics engine. To many, it's synonymous with big data technology. But the open source distributed processing framework isn't the right answer to every big data problem, and companies looking to deploy it need to evaluate carefully when to use Hadoop - and when to turn to something else.

There's so much hype around [Hadoop] now that people think it does basically anything.

Kelly Stirman, director of product marketing, 10gen Inc.

For example, Hadoop has ample power for processing large amounts of unstructured or semi-structured data. But it isn't known for its speed in dealing with smaller data sets. That has limited its application at Metamarkets Group Inc., a San Francisco-based provider of real-time marketing analytics services for online advertisers.

Metamarkets CEO Michael Driscoll said the company uses Hadoop for large, distributed data processing tasks where time isn't a constraint. That includes running end-of-day reports to review daily transactions or scanning historical data going back several months.

But when it comes to running the real-time analysis processes that are at the heart of what Metamarkets offers its customers, Hadoop isn't involved. Driscoll said that's because it is optimized to run batch jobs that look at every file in a database. It comes down to a tradeoff: in order to make deep connections between data points, the technology sacrifices speed. "Using Hadoop is like having a pen pal," he said. "You write a letter and send it and get a response back. But it's very different from [instant messaging] or texting."

Because of that time factor, Hadoop has limited value in online environments where fast performance is crucial, said Kelly Stirman, director of product marketing at 10gen Inc., developer of the MongoDB NoSQL database. For example, analytics-fueled online applications, such as product recommendation engines, depend on processing small amounts of data quickly. But Hadoop can't do that efficiently, according to Stirman.

Not a database replacement

Some companies may be tempted to try scrapping their traditional data warehouses in favor of Hadoop clusters, since technology costs are so much lower with the open source architecture. But Carl Olofson, an analyst at market research company IDC, said that's an apples-and-oranges comparison.


Olofson said the relational databases that power most data warehouses are accustomed to accommodating trickles of data that come in at a steady rate over a period of time, such as transaction records from everyday business processes. Hadoop, on the other hand, he added, is best suited to processing vast stores of accumulated data.

And because Hadoop is typically used in large-scale projects that require clusters of servers and employees with specialized programming and data management skills, implementations can become expensive, even though the cost per unit of data may be lower than with relational databases. "When you start adding up all the costs involved, it's not as cheap as it seems," Olofson said.

Specialized development skills are needed because Hadoop uses the MapReduce programming framework, which only a limited number of developers are familiar with. That can make it difficult to access data in Hadoop from SQL databases, according to Todd Goldman, vice president of enterprise data integration at software vendor Informatica Corp.

Various vendors have developed connector software that can help move data between Hadoop systems and relational databases. Yet Goldman feels that for many organizations, too much work is required to accommodate the open source technology. "It doesn't make sense to redo your entire corporate data infrastructure just for Hadoop," he said.

Helpful, not hyped

One example of when to use Hadoop that Goldman cited is as a staging area and data integration platform for running extract, transform and load (ETL) functions. That may not be as exciting an application as all the hype over Hadoop seems to warrant, but Goldman said it makes particular sense when an IT department needs to merge huge files. In such cases, the processing power of Hadoop can come in handy.

Driscoll said Hadoop is exceptional at handling ETL processes because it can split up the integration tasks among multiple servers in a cluster. He added that using Hadoop to integrate data and prepare it for loading into a data warehouse or other database could help justify investments in the technology, getting its foot in the door for larger projects that exploit Hadoop's scalability.
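Driscoll's point about splitting integration tasks across servers can be sketched as hash-partitioning ETL records by key, so each node transforms its own shard independently before the results are merged. A toy illustration in plain Python (the sharding scheme and record format are invented for this sketch):

```python
def partition(records, num_workers):
    """Hash-partition (key, value) records so each worker gets an independent shard."""
    shards = [[] for _ in range(num_workers)]
    for key, value in records:
        shards[sum(ord(c) for c in key) % num_workers].append((key, value))
    return shards

def transform(shard):
    """A trivial per-shard ETL step: aggregate values by key."""
    totals = {}
    for key, value in shard:
        totals[key] = totals.get(key, 0) + value
    return totals

records = [("clicks", 3), ("views", 10), ("clicks", 2)]
shards = partition(records, num_workers=2)

# Each shard could be transformed on a different node; merging is then cheap
# because identical keys always land in the same shard.
merged = {}
for shard in shards:
    for key, total in transform(shard).items():
        merged[key] = merged.get(key, 0) + total
# merged == {"clicks": 5, "views": 10}
```

Because the per-shard work is independent, adding nodes shortens the transform step, which is exactly the scalability property that makes Hadoop attractive for ETL.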

Of course, leading-edge Internet companies such as Google, Yahoo, Facebook and Amazon.com have been big Hadoop users for quite some time. And new technologies aimed at removing some of Hadoop's limitations are becoming available. For example, a number of vendors have released tools designed to enable real-time analysis of Hadoop data. And a Hadoop 2.0 release that is in the works will make MapReduce an optional component and enable Hadoop systems to run other kinds of applications.

Ultimately, it's important for IT and business executives to cut through all the hype and understand for themselves where Hadoop could fit in their operations. Stirman said there's no doubt it's a powerful tool that can support many practical analytical functions. At the same time, it's still maturing as a technology, he added.

"There's such a great amount of buildup around it now that individuals suppose it does basically anything," Stirman said. "The actuality is that its an extremely perplexing bit of engineering that is still crude and needs a ton of consideration and taking care of to make it do something advantageous and significant." ording to Stirma