Paul Lewis

Understanding Data: Gold Nuggets and Puzzle Pieces

Blog Post created by Paul Lewis on Mar 7, 2017


I regularly use the colloquial phrase “nuggets of gold in a huge pot” when describing the value obtained from understanding and analysing data.

 

It seems like an easy win. The phrase is well known and highly digestible. Most people in the audience appreciate that gold has immense value, and whole industries exist to mine this precious metal from a variety of mountains and streams. It's also predictable that as you collect these precious nuggets, their collective weight will keep you from carrying them around, and a pot is as good a place as any to store them. Plus, the whole leprechaun-esque image it likely creates might lodge the phrase in long-term memory for easy recall the next day with colleagues, as in: “I went to a seminar yesterday and this dude talked about value derived from analytics as being like nuggets of gold in a huge pot”. That's helpful.

 

Occasionally, as with this post, I even blog about it. I find repetition to be tremendously valuable in retaining content. Additionally, I also find repetition to be tremendously valuable in retaining content. (Note: embedding subliminal messages in repetitive statements is also tremendously valuable, but I will get to that content later; trust me, you won't object.)

 

Unfortunately, as metaphors go, it's extremely weak (especially considering that pots are much more likely to hold coins than nuggets). Let me break it down so you see what I mean:

 

  • Data has value the instant it’s created, for as long as you hold it, until its demise.  Inarguable.
  • The final form of data could be deletion or decade-old archiving; the effect is the same.  True.
  • The value of data changes over time.  Sure.
  • By adding new data to existing data, you create more opportunity to discover a potentially endless series of value.  Potentially.
  • This potential value could be expressed as an undetermined number of “nuggets of gold”.  I guess, if you must.
  • The more data you have, the more nuggets of gold you could discover, and the more necessary a pot to hold them.  That's a stretch.
  • The more data you have, the more precise your statistical and mathematical models, and the more opportunity you will have to find more nuggets.  Don't buy it, sounds complex.

 

Getting the picture?

 

The fundamental problem with the metaphor is that I'm treating value-obtained as a direct representation of data-collected. That is, because you store various elements of data about a client, a single purposeful and valuable answer must be hidden somewhere among those fields, rows and columns:

 

  • Data, in the sense of a database, being a single field, in a single row, in a single column, is irrelevant.  It carries no weight or value beyond the knowledge of collection.  It lacks context and awareness.  Whether static or variable, it tells no story and solves no problem.
  • Data, in the sense of unstructured data (bytes of binary information), carries even less value. In fact, knowing that a single bit is only a small part of a greater whole predetermines that it is unlikely to change the entire picture.
  • Data, as a single point in time from a stream of information, is outdated the very nanosecond it's used, as more current data takes its place and creates a new current reality.

 

By extension, then, the concept of “nuggets of gold” presumes a specific and direct answer to a question, or a direct and obvious correlation to an action:

  • How many toothpicks are in the container?  173
  • What colour shirt matches best with my red pants? None, don’t wear red pants
  • What’s the name of that dude with the crazy beard in that class last year?  Henry. For the last time HENRY!
  • If you were to spend $5 less, you would have an extra $5 in the bank
  • If we mix these two primary colours, you would have this one secondary colour
  • If I build more of this product, I will sell more of this product

 

Lesson learned: Individual elements of data possess little to no value.

 

There is a reason why every company (including yours) has an EIM program and a CDO (Chief Data Officer) responsible for stewardship of its most precious technological asset: data. As a reminder, Enterprise Information Management (EIM) is an integrative discipline for structuring, describing and governing information assets across organizational and technological boundaries to improve efficiency, promote transparency and enable business insight. The program includes capabilities to store, protect, architect, classify and organize data, and to manage risk, compliance and quality. A great EIM program focuses on how organizations derive insight and value from information, whether through internal effectiveness goals, growth-oriented goals and activities, or both.

 

A CDO, VP of Business Intelligence or Manager of MIS understands that data, in its elemental form, does NOT equal value. They understand that value is derived from discovering patterns and appreciating the impact of change and time, and that data requires enrichment, not just discovery. The activity required to derive value is implemented in four capabilities:

 

  • Descriptive: MIS (Management Information Systems) or Reporting, focusing on hindsight (what has happened)
  • Diagnostic: Business Intelligence or Incident Management, focusing on current state insight or understanding “why” it happened
  • Predictive: Analytics combining models of previous data and application to new data, focusing on foresight (what will happen)
  • Prescriptive: Analytics and Action, foresight algorithms to implement a business function

 


 

The EIM program also appreciates that the effort to create value focuses FAR LESS on finding a long-lost, specific piece of data, and far more on studying patterns in static, changing and moving information, researching correlations and causations, and theoretically applying mathematics and logic to CREATE complex business value from data-centric components. Yes, it's a science. It's far less about searching for a nugget of gold, and far more about realizing that you could make money from gold jewelry, all from the same mine.

 

So here is my NEW metaphor, and for the sake of inconsistency, I’m not even going to use precious metals:

 


Imagine a pile of random puzzle pieces.  Each piece represents a single data point, collected from a variety of sources.

 

Before value can be obtained, preparatory activity is needed to curate and enrich data:

  • Extraction: Identify all the puzzle pieces in the house: under beds, in vacuum cleaners, in the dog bowl etc.  For data, discover all the sources of information: internally and externally, structured and unstructured, and classify.
  • Integration: Send out all the kids and parents to grab the pieces and bring them back to the pile.  For data, connect to hundreds of sources for batch or real-time integration/ETL.
  • Enhancement and Cleansing: Dust off each piece, glue back down the picture side, sharpen the edges, number the backs.  For data, match and qualify, and add appropriate metadata.
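
The same three preparatory steps can be sketched in a few lines of Python. Everything here is hypothetical: the two sources, the use of a normalized email address as the match key and the `loaded_at` metadata field are assumptions made for illustration, and a real integration/ETL tool would handle hundreds of connectors, real-time feeds and fuzzier matching.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical sources standing in for "every room in the house".
SOURCES = {
    "crm": {"kind": "structured", "records": [
        {"id": "C-1", "email": " Ana@Example.com ", "name": "Ana"},
        {"id": "C-2", "email": "bo@example.com", "name": "Bo"},
    ]},
    "support_tickets": {"kind": "unstructured", "records": [
        {"id": "T-9", "email": "ana@example.com", "text": "Order arrived late"},
    ]},
}

@dataclass
class Piece:
    source: str
    kind: str
    payload: dict
    metadata: dict = field(default_factory=dict)

def extract(sources):
    """Extraction: discover every source and classify it."""
    return {name: src["kind"] for name, src in sources.items()}

def integrate(sources, catalog):
    """Integration: bring every record back to one pile (a simple batch load)."""
    return [Piece(name, catalog[name], dict(rec))  # copy, leave the source intact
            for name, src in sources.items() for rec in src["records"]]

def enhance(pieces):
    """Enhancement and cleansing: normalize, match on a key, add metadata."""
    loaded_at = datetime.now(timezone.utc).isoformat()
    by_email = {}
    for piece in pieces:
        email = piece.payload.get("email", "").strip().lower()
        piece.payload["email"] = email
        piece.metadata.update({"loaded_at": loaded_at, "match_key": email})
        by_email.setdefault(email, []).append(piece)  # matched pieces group together
    return by_email

catalog = extract(SOURCES)
groups = enhance(integrate(SOURCES, catalog))
print(catalog)         # {'crm': 'structured', 'support_tickets': 'unstructured'}
print(sorted(groups))  # ['ana@example.com', 'bo@example.com']
```

The CRM record and the support ticket for Ana only become one "piece" after cleansing; before that, the stray spaces and capital letters hide the match.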

 

This effort to convert raw data into CONTENT, and indescribable fields into describable OBJECTS, requires more than just a pile; it requires a box of sorts.

 


 

A CONTENT platform (the box) allows organizations to bring together object storage (a place to put ALL data), data mobility (a means to abstract data from its sources), cloud gateways (the ability to use multiple deployment models), and metadata tagging and sophisticated search to create a tightly integrated, simple and smart data intelligence solution. You may have heard this referred to as a Data Lake. I HIGHLY recommend this solution set, if you happen to be in the market: https://www.hds.com/en-us/products-solutions/storage/content-platform.html
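
To show the idea of object storage plus metadata tagging and search in miniature, here is a toy in-memory Python sketch. It is hypothetical and is NOT the Hitachi Content Platform API; the class name, the SHA-256 content-addressed keys and the tag-equality search are assumptions chosen to keep the example small.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class StoredObject:
    key: str
    data: bytes
    tags: dict

class ToyContentStore:
    """Hypothetical in-memory stand-in for object storage with metadata search."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes, **tags) -> str:
        # Content-addressed key: identical bytes map to the same object,
        # and the most recent tags replace any earlier ones.
        key = hashlib.sha256(data).hexdigest()
        self._objects[key] = StoredObject(key, data, dict(tags))
        return key

    def search(self, **criteria) -> list:
        # Return every object whose tags match all of the given criteria.
        return [obj for obj in self._objects.values()
                if all(obj.tags.get(k) == v for k, v in criteria.items())]

store = ToyContentStore()
store.put(b"invoice 42", source="erp", kind="invoice")
store.put(b"call transcript", source="contact_centre", kind="transcript")
print([obj.tags["kind"] for obj in store.search(source="erp")])  # ['invoice']
```

Content addressing is just one design choice: storing the same bytes twice yields one object, which keeps the box from filling up with duplicate pieces, while the tags are what make the pieces findable.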

 

For this new enhanced data set (the puzzle pieces), contained in a CONTENT platform (the puzzle box), the EIM value-creation activities can be described as follows (the goal is still to find the Picasso):

 

  • Descriptive: create a list of puzzle pieces, organized by shape/colour/origin; determine which pieces closely resemble the palette of a master work of art
  • Diagnostic: visualize the current state of completing the puzzle (how far along the process is) *and/or* discover missing pieces
  • Predictive: given where we are in the process and the remaining pieces still in the box, determine what picture we might be making *and/or* predict what the picture might be even with missing pieces
  • Prescriptive: after having made dozens of pictures from these same puzzle pieces, guide the creation of existing and new completed puzzles

 

Both predictive and prescriptive analytics would use linear and non-linear algorithms (ways of thinking through the problem), would focus as much on the puzzle pieces that are missing as on those that exist, and would combine pieces from hundreds of potential sources to create hundreds of different works of art.

 


 

In a nutshell: the VALUE obtained from understanding and analysing data is not that you will find “nuggets of gold” of DATA, or an individual puzzle piece that solves the problem. The VALUE obtained from understanding and analysing data is the millions of dollars in your bank account from building several masterpieces out of all your individual puzzle pieces.

 

And I have over a BILLION dollars' worth of reasons (Press Release) why this makes sense. All of your peers are focused on creating this capability, including Spinmaster (www.spinmaster.com), which uses the Hitachi Content Platform portfolio to deliver business insights and governance from any cloud, device or location, and Thinkon (http://www.thinkon.com/data-archiving/hitachi-content-platform/), which launched several pay-as-you-go services that help companies protect their critical data assets and optimize their application infrastructure costs. Hitachi Content Platform delivers unique value in that it offers a tightly integrated ecosystem of solutions that help customers realize greater business value and benefits. Data has become the lifeblood of every organization in every industry. Whether it is employee data, customer information, internal communications, intellectual property, machine data or research, the ability to ensure control, visibility, governance, collaboration, accessibility and analysis is paramount.

 

Indeed.
