Web Technology and Consulting From a Business Point Of View

In his book “The Power of Habit”, Charles Duhigg relates a most interesting tale.  A man enters a Minneapolis-area Target store and demands to talk to the manager.  The manager listens as the gentleman angrily asks why Target mailed offers and coupons for baby clothes, furniture and other maternity items to his high school-age daughter.  Embarrassed, the manager apologizes, explains that it must have been some kind of mistake, and promises to contact Target headquarters about the matter.  A few days later, the manager calls the gentleman to apologize again.  Sheepishly, the father admits that when he returned home, he spoke to his daughter.  She was indeed pregnant.

Target’s data analysis programs predicted that she was pregnant based on the girl’s purchases.  Items such as unscented shampoo and lotions, vitamins and washcloths, as well as the coupons she used and other demographic data, all added to her customer profile.  And those predictions are right better than 70% of the time.

As we walk, bike or drive with a smartphone, GPS, or tablet device, systems track our location.  When we get email, go to a website, text, tweet or use Facebook, our actions are stored.  Credit card data, online purchases, coupons and gift cards used, and much more are all stored and analyzed for trends, patterns and anomalies.  Why?  All the better to market to you, my dear!

Companies and organizations need to know a lot about you, without you specifically telling them, to accurately predict which book or movie you’re likely to buy, whether a cruise or bike trip is probably in your future, or which candidate you’ll vote for.  Many organizations also use data to predict whether a hurricane will hit New Orleans or Miami, to study the human genome, or to guess which players to trade for the best shot at the World Series.  All of this in turn requires a lot of data storage and management.  A lot.

Welcome to the world of big data.

How much data, you ask?  To put it in perspective, let’s use something we’re all familiar with as a basis of data storage: a music CD.  A commercial CD holds about 3/4 of a gigabyte, or 750 megabytes (MB), of data for about an hour of music.  Your computer’s hard drive is probably measured in gigabytes, or GB.  If you have a 500 GB drive, that’s about equal to 667 or so CDs.

1,000 gigabytes equals a terabyte (TB), or about 1,333 CDs.  Wikipedia, the online encyclopedia, claims it uses 5.87 terabytes to store all of its entries.  The Library of Congress adds about 5 terabytes per month and, as of April 2011, estimated the data size of all its media at 235 terabytes.

A petabyte (PB) equals 1,000 terabytes, enough to fill about 1.33 million CDs.  Now we’re into big data.  Walmart processes about a million customer transactions per hour, worldwide.  The download of this data comes out to 2.5 petabytes per month, or more than 10 times the 235 terabytes of media in the Library of Congress.
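
For readers who want to check the CD arithmetic themselves, here is a minimal Python sketch.  It assumes the rough 750 MB-per-CD figure used above and decimal units (1 GB = 1,000 MB); the function name `cds_needed` and the `cd_capacity_mb` parameter are made up for this illustration.  A 700 MB disc, another common capacity, would give somewhat higher counts.

```python
# Decimal (SI) storage units expressed in megabytes.
MB_PER_UNIT = {"MB": 1, "GB": 1_000, "TB": 1_000_000, "PB": 1_000_000_000}


def cds_needed(amount, unit, cd_capacity_mb=750):
    """Return how many CDs of the given capacity would hold `amount` of `unit`.

    The 750 MB default is the article's rough figure, not an exact spec.
    """
    return amount * MB_PER_UNIT[unit] / cd_capacity_mb


# The three sizes discussed in the text.
for amount, unit in [(500, "GB"), (1, "TB"), (1, "PB")]:
    print(f"{amount} {unit} is about {cds_needed(amount, unit):,.0f} CDs")
```

Passing a different capacity, for example `cds_needed(1, "TB", cd_capacity_mb=700)`, shows how sensitive the CD counts are to the size you assume for a single disc.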

And what about Google?  How much data do they have?  Mountain View is very protective of this information, so no one really knows.  But analysts guess it’s well over 200 petabytes, spread across approximately 900,000 (that’s not a typo!) servers around the globe.

Orwell’s vision in “1984” of an ever-watching Big Brother doesn’t seem so far-fetched in today’s wired and wireless world.  But instead of the government making you toe the line, Target wants to sell you a stroller.
