Wednesday, September 18, 2013

Why is transferring lots of data an issue for Big Data?


This may just be a rhetorical question, but a few companies are developing technologies that move massive amounts of data into a data warehouse (DW) environment where, using the latest Big Data tools, that data can be analyzed and folded into larger queries.


One such company is Attunity (www.attunity.com), whose latest product blog post clearly states that their solution allows data to boldly go where and when you need it.

What I don't understand is this: wasn't that the original point of MapReduce, that data did not have to reside in a single DW for me to query it? As I understand it, this was Google's motivation for building the technology in the first place, since they clearly understood that all the data on the web would never sit in one spot.
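To put that idea in code form, here is a toy sketch in plain Python (nothing Hadoop-specific, and the well readings are invented for illustration): the map step runs against each chunk of data where that chunk lives, and only the small keyed results are shuffled over to a reducer.

    from collections import defaultdict

    # Three "partitions" of sensor records, standing in for data blocks
    # that live on three different nodes. Values are made up.
    partitions = [
        [("well-7", 102.4), ("well-9", 98.1)],
        [("well-7", 101.7), ("well-3", 87.5)],
        [("well-9", 99.0), ("well-3", 88.2)],
    ]

    def map_partition(records):
        # Runs locally on the node holding the partition; emits (key, value) pairs.
        for well_id, pressure in records:
            yield well_id, pressure

    def reduce_key(key, values):
        # Runs after the shuffle; sees all values for one key.
        return key, sum(values) / len(values)

    # Shuffle: group the intermediate pairs by key.
    # This small keyed data is the only thing that moves between nodes.
    grouped = defaultdict(list)
    for part in partitions:
        for key, value in map_partition(part):
            grouped[key].append(value)

    # Reduce: one compact result per key (average pressure per well).
    for key in sorted(grouped):
        print(reduce_key(key, grouped[key]))

The point of the pattern is exactly the one in the question: the bulky raw records never have to be consolidated anywhere; only the small per-key results do.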

According to a paper from Teradata, there is a growing problem with the amount of data produced by oil wells, and getting that data transferred across a country, or several countries, to a data center is expensive. My simplistic design instinct is to ask: why not put a couple of servers at the drilling site, build a Hadoop cluster onsite, and run the queries against it remotely, right where the data lives? Isn't it harder to store, transport, and restore that massive amount of data? A rough sketch of what I mean follows below.
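As a sketch of that setup (the sensor CSV format, file paths, and script name are my own invention, not anything from the Teradata paper or from Attunity), a Hadoop Streaming job running on the onsite cluster could boil the raw readings down to one summary row per well, so only that tiny result ever has to leave the site:

    #!/usr/bin/env python
    # summarize_wells.py -- usable as both mapper and reducer for Hadoop Streaming.
    # Assumed input format (my invention): CSV lines of "well_id,timestamp,pressure".
    import sys

    def mapper():
        # Emit tab-separated key/value pairs, as Hadoop Streaming expects.
        for line in sys.stdin:
            well_id, _timestamp, pressure = line.strip().split(",")
            print("%s\t%s" % (well_id, pressure))

    def reducer():
        # Streaming delivers mapper output sorted by key, so aggregate per well.
        current, total, count = None, 0.0, 0
        for line in sys.stdin:
            well_id, pressure = line.strip().split("\t")
            if well_id != current:
                if current is not None:
                    print("%s\t%.2f" % (current, total / count))
                current, total, count = well_id, 0.0, 0
            total += float(pressure)
            count += 1
        if current is not None:
            print("%s\t%.2f" % (current, total / count))

    if __name__ == "__main__":
        mapper() if sys.argv[1] == "map" else reducer()

    # Submitted remotely to the onsite cluster with something like this
    # (the streaming jar path and HDFS paths vary by installation):
    #
    #   hadoop jar hadoop-streaming.jar \
    #     -input /data/raw_sensor_logs -output /data/well_summaries \
    #     -mapper "summarize_wells.py map" -reducer "summarize_wells.py reduce" \
    #     -file summarize_wells.py

The raw logs stay on the cluster at the well site; only the job submission goes out and only the per-well summary comes back.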

Just a question. Please comment.
