Paul Lewis

You don’t have a Big Data problem

Blog Post created by Paul Lewis on Oct 8, 2013

Don’t get me wrong, you have problems, BIG problems. And many of them have to do with your data. Failed backups, silos of heterogeneous storage, difficulty keeping up with compliance requests, and even simply understanding the types and purpose of the data you store…just to name a few (several). Fortunately, what you almost certainly don’t have is a Big Data problem…but I know you might think you do.

 

(read below with a slight southern accent)

 

“If you have a single application that creates 1TB of data annually….you don’t have a Big Data problem”

 

“If you only handle hundreds of transactions a day….it’s unlikely that you have a Big Data problem”

 

“If you only need a few queries to access your data from an existing warehouse…you probably don’t have a Big Data problem”

 

“If the applications you built or bought are the only source of the data you store....there is no lean toward Big Data”

 

“If you believe your Intellectual Property is your most important asset, and sits exclusively in your application code....walk away…and stop talking about your Big Data problem”

 

“If your home has more miles than your car…”  Oops, wrong list.

 

I’ll even go one step further.  If you answer “Yes” to any of the questions below, you do not have a Big Data problem:

 

  • Can your existing MIS team implement all of your Information Management projects?
  • Will all of your existing BI tools handle creating the types of information projects you need?
  • Do you have a handle on all the questions you need answers to?  All the reports you need to create?
  • Is “overnight” a perfectly valid amount of time to wait for a report?
  • Is all of your data created by you, well defined and understood?

 

If you checked “Yes” to any of the above, end of post. You can now relax. Have a good day.

 

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

 

Congratulations to everyone remaining (several I assume), you have passed the test and have not been filtered out of this problem set.

 

Let’s be sure though, you never know…it’s possible that some readers continued to read, ignoring my explicit direction.

 

Big Data, like Cloud, is neither an implementation of a technology nor an explicit set of solutions. Big Data defines a class of information management problems that are difficult or impossible to solve efficiently using conventional tools and techniques, and that are often characterized by the 5 V’s: Volume, Velocity, Variety, Veracity, and Value. Each of the V’s qualifies your current set of problems, so collectively or individually these statements are true within your company:

 

  • The VOLUME of data you are collecting overwhelms the people, processes and technical capabilities in your Enterprise Information Management group. You will likely need to find a distinctually (not a word but should be) different set of tools and techniques to solve this problem.
  • The data arrives so fast (VELOCITY) that you can’t just write it all to a database. You may also need to make decisions based on the incoming data, and currently have no automation to support those rules.
  • You wish you just had a few RDBMS deployments, but you currently collect end user documents, real-time sensor content from millions of machines, surveillance information from retail stores, and voice recordings from thousands of customer requests for new products. Not only do you have a significant VARIETY of data, you are expected to correlate that information to understand the sentiment of a specific class of customers.
  • Some parts of your organization have expressed doubts concerning the VERACITY of the facial recognition in the video surveillance data. Storing the video files is easy; properly assigning the facial recognition metadata is much more difficult.
  • The VALUE of your Intellectual Property, your vast VOLUME and VARIETY of data, is presumed but not known. More time will be spent defining what questions you should be asking than assigning resources to answer them.
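
To make the VELOCITY point a little more concrete, here is a minimal Python sketch of deciding on each record as it arrives instead of writing everything to a database first. The `Reading` type, the `too_hot` rule and its 90-degree threshold are all made up for illustration; a real deployment would sit on a proper stream-processing platform rather than a Python list.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class Reading:
    """Hypothetical sensor record; the field names are assumptions."""
    machine_id: str
    temperature_c: float


def alerts(readings: Iterable[Reading],
           rule: Callable[[Reading], bool]) -> Iterator[Reading]:
    """Yield only the readings that trip the rule, one at a time.

    Nothing is stored: each reading is inspected as it arrives and
    then either passed on or dropped.
    """
    for reading in readings:
        if rule(reading):
            yield reading


def too_hot(reading: Reading) -> bool:
    # Hypothetical business rule with an arbitrary threshold.
    return reading.temperature_c > 90.0


# A short list stands in for the incoming stream.
stream = [
    Reading("m1", 71.5),
    Reading("m2", 95.2),
    Reading("m3", 88.0),
    Reading("m4", 101.3),
]

flagged = [r.machine_id for r in alerts(stream, too_hot)]
print(flagged)
```

The generator is the whole trick here: the rule is applied while the data is in flight, so only the handful of readings that matter ever need to be kept.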

 

Now that we have agreed that it’s “possible” that you may have a set of problems that fall within the definition of Big Data, how can you solve these problems? I have three answers:

 

  1. It’s complex, but Hitachi can help, call us.  Or call me personally. I’m in the book.
  2. Deployment reference architectures will come in handy, and are readily available, likely for your specific industry. For Big Data, the technical components within a reference architecture would include: infrastructure elements deployed with Cloud characteristics, structured and unstructured sources, ETL, real-time streams, real-time processing, real-time structured databases, interactive analytics, batch processing, and data visualization. There are many individual and specific software and appliance products that fulfill one or more of these components.
  3. Evaluate fully converged, Big Data workload-specific solutions. Within the Hitachi product lines, they would include Unified Compute (UCP) combining Unified Storage (HUS, HUSVM, VSP, NAS, HCP) with real-time processing and analytics toolsets pre-installed and performance tuned for your specific needs (e.g., SAP, Cloudera, Microsoft).

 

But where do you start? See option 1. You, too, can now relax.
