Hu Yoshida

Top IT Trends for 2018 Part 2: IT Must Do More Than Store and Protect Data

Blog Post created by Hu Yoshida on Nov 24, 2017

This is part 2 of my Top IT trends for 2018. In my first post, I covered Preparing IT for IoT. This post will look at some new requirements for managing storage that will make IT more effective.




4.     Data Governance 2.0

2018 will see new challenges in data governance which will require a new data governance framework. Previous data governance was based on the processing of data and metadata. The new data governance must also consider data context, and it must be flexible enough to adapt quickly as regulators unleash new rules on new processes and data types like cryptocurrencies. Surely one of 2018’s biggest challenges will arrive on May 25, 2018, when the EU’s General Data Protection Regulation (GDPR) goes live and affects all countries worldwide where the processing of personal data for EU citizens occurs. GDPR gives EU residents more control of their personal data. Individual rights include the ability to prohibit data processing beyond its specified purpose for collection, the right to access, the right to rectification, the right to be forgotten, the right to data portability, the ability to withdraw consent to the collection and use of personal data, and many more. Consider this: if an EU citizen invokes their right to be forgotten, a company must be able to find the individual’s data throughout its technology and application stacks (many of which are logically, if not physically, separated); evaluate the intent of each data element (as some obligations, such as financial reporting, will likely supersede GDPR); eradicate the data; and provide the citizen with proof that the data has been eradicated, along with an audit log to demonstrate compliance to regulators. Responding to individual actions and enforcing individual rights can drive up the costs and risks of collecting and storing personal data. Those costs are not limited to the working hours required to complete the requests – there are also penalties to consider. GDPR violations can cost up to €20m ($21.75m) in fines, or up to 4% of the total annual worldwide turnover of the preceding financial year.
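The right-to-be-forgotten sequence above (find, evaluate retention duties, eradicate, prove) can be sketched in Python. This is a minimal illustration, not any vendor's API; the store and retention-rule shapes are hypothetical, chosen only to show how an overriding obligation (e.g. financial reporting) blocks erasure while everything is logged for the audit trail.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ErasureAudit:
    """Audit trail proving how an erasure request was handled."""
    subject_id: str
    entries: list = field(default_factory=list)

    def log(self, store_name: str, action: str) -> None:
        self.entries.append(
            (datetime.now(timezone.utc).isoformat(), store_name, action)
        )

def handle_erasure_request(subject_id, data_stores, retention_rules, audit):
    """Locate a subject's records in every store, erase those not covered
    by an overriding retention duty, and log each decision."""
    for name, store in data_stores.items():
        records = [r for r in store if r["subject_id"] == subject_id]
        if not records:
            audit.log(name, "no records found")
            continue
        if name in retention_rules:
            # Some obligations (e.g. financial reporting) supersede erasure.
            audit.log(name, f"retained under {retention_rules[name]}")
            continue
        # Erase in place and record proof of eradication.
        store[:] = [r for r in store if r["subject_id"] != subject_id]
        audit.log(name, f"erased {len(records)} record(s)")
    return audit
```

Running this against a hypothetical CRM list and a finance ledger would empty the CRM entry, keep the ledger entry under its retention rule, and leave two audit entries documenting both decisions.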


GDPR also requires mandatory breach notifications within 72 hours to your customers. What is interesting here is the ambiguity of the term “breach”. In IT, this word often conjures up images of clandestine or rogue groups executing various forms of network intrusion attacks to unlawfully gain access to organizational data. In the eyes of GDPR, however, a data breach is defined as a breach of security leading to the accidental or unlawful destruction, loss, alteration, unauthorized disclosure of, or access to personal data transmitted, stored, or otherwise processed. Consider how broad that definition is: those “hackers” certainly fit it, but so does your database administrator who accidentally executes a “DROP TABLE table_name” command against your CRM system. A 72-hour notification window is difficult to meet without comprehensive data processing models and the appropriate checks and balances regarding data use. In high-profile cases like the Yahoo breach, discovery and notification took months. For most organizations, meeting the deadline is impossible simply because of a lack of data awareness: data is scattered across different application and technology silos throughout the organization, especially since more data creation is done today on the edge, on mobile devices, and in the cloud.
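The 72-hour clock itself is trivial to compute; the hard part is the data awareness described above. As a minimal Python illustration of the deadline arithmetic (function names are my own, not from any regulation or product):

```python
from datetime import datetime, timedelta, timezone

# GDPR's breach notification window
BREACH_NOTIFICATION_WINDOW = timedelta(hours=72)

def notification_deadline(detected_at: datetime) -> datetime:
    """Latest time by which notification must be made."""
    return detected_at + BREACH_NOTIFICATION_WINDOW

def hours_remaining(detected_at: datetime, now: datetime) -> float:
    """Hours left on the clock; negative means the window was missed."""
    return (notification_deadline(detected_at) - now).total_seconds() / 3600
```

A breach discovered months after the fact, as in the Yahoo case, would yield a large negative `hours_remaining` from the moment of discovery of the original intrusion.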


To be sure, GDPR is seen as complicated by some, confusing by others, and normal by those already in highly regulated markets. Regardless, as citizens across the globe become just as ‘digitized’ as their data, it makes sense that general requirements for how data is accessed, managed, used, and governed would be formalized. The impacts on this audience are just as unique as the organizations you represent, but regardless, they underscore the simple need for a progressive and more intelligent data governance framework: one that allows you to oversee data no matter where it resides, that can be extended with content intelligence tools to detect and notify when breaches occur, and that is based on a “smarter” technology stack that can quickly adapt and respond to regulatory and market requirements.


5.     Object Storage Gets Smart

By now most IT shops have started on their digital transformation journey, and the first problem most run into is getting access to usable data. Application and technology decisions often lock data into isolated islands where it is costly to extract and put to other uses. These islands were built for a specific purpose, or use case, and not necessarily in the spirit of sharing the data. Many of these islands contain data that is duplicated, obsolete, or has gone dark: still valid, but no longer used because of changes in business process or ownership. Data scientists tell us that 80% of the effort involved in gaining analytical insight from data is the tedious work of acquiring and preparing the data. The concept of a data lake is alluring, but you can’t just pour your data into one system. Unless that data is properly cleansed, formatted, and indexed or tagged with metadata so that the data lake is content aware, you end up with a data swamp.
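The cleanse-format-tag step that separates a data lake from a data swamp can be sketched in a few lines of Python. This is a simplified illustration under assumed record shapes, not a real ingest pipeline: records missing required fields are rejected, text is normalized, and search-ready metadata is attached before anything enters the lake.

```python
import hashlib
from typing import Optional

def prepare_for_lake(record: dict, source: str) -> Optional[dict]:
    """Cleanse, format, and tag a record with metadata before ingest,
    so the lake stays content-aware instead of becoming a swamp."""
    # Cleanse: reject records missing required fields.
    if not record.get("id") or not record.get("body"):
        return None
    # Format: collapse stray whitespace in the text payload.
    body = " ".join(str(record["body"]).split())
    # Tag: attach metadata that downstream search and audit can use.
    return {
        "id": record["id"],
        "body": body,
        "meta": {
            "source": source,
            "checksum": hashlib.sha256(body.encode()).hexdigest(),
            "length": len(body),
        },
    }
```

The key design point is that the metadata (source, checksum, size) is computed once at ingest, so the lake can answer "where did this come from and has it changed" without re-reading every silo.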


While Object Storage can hold massive amounts of unstructured data and provide customized and extensible metadata management and search capabilities, what’s been missing is the ability for it to be contextually aware. Object Storage now has the ability to be “smart” with software that can search for and read content in multiple structured and unstructured data silos and analyze it for cleansing, formatting, and indexing. Hitachi Content Intelligence software can extract data from the silos and pump it into workflows to process it in various ways. Users of Content Intelligence can be authorized so that sensitive content is only viewed by relevant people and document security controls are not breached. It can create a standard and consistent enterprise search process across the entire IT environment by connecting to and aggregating multi-structured data across heterogeneous data silos and different locations.  Additionally, it provides automated extraction, classification, enrichment and categorization of an organization's data.
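The workflow model described above, where documents are pumped through a sequence of processing stages, can be sketched generically in Python. This is not the Hitachi Content Intelligence API; the stage functions below are hypothetical, intended only to show the extract → classify → enrich pattern, including a stage that redacts sensitive content so it is only exposed to authorized viewers.

```python
from typing import Callable, Dict, List

Stage = Callable[[Dict], Dict]

def run_workflow(documents: List[Dict], stages: List[Stage]) -> List[Dict]:
    """Push each document through a sequence of processing stages."""
    results = []
    for doc in documents:
        for stage in stages:
            doc = stage(doc)
        results.append(doc)
    return results

# Hypothetical stages, for illustration only:
def extract_text(doc: Dict) -> Dict:
    doc["text"] = doc.get("raw", "").strip()
    return doc

def classify(doc: Dict) -> Dict:
    # Naive sensitivity check; a real system would use richer rules.
    doc["sensitive"] = "ssn" in doc["text"].lower()
    return doc

def redact_sensitive(doc: Dict) -> Dict:
    if doc["sensitive"]:
        doc["text"] = "[REDACTED]"
    return doc
```

A document mentioning an SSN would come out redacted, while an innocuous one passes through unchanged, which mirrors how classification stages can feed access-control decisions downstream.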


Combine my comments in section 4 with this topic, and you can quickly see the foundational components you can deploy to address regulatory and compliance obligations with a highly scalable, performant, adaptable, and intelligent Object Storage foundation from Hitachi Vantara. Of course, this platform is not limited to how data is managed and governed.


In a recent evaluation of a complex stroke CT case study, a custom MATLAB DICOM parsing script was written to perform the filtering and extraction of DICOM tag data, a process that took 50 hours. Using a Hitachi Content Intelligence DICOM processing stage on the same medical image data, the query time was reduced to 5 minutes – a 99.8% reduction in the time needed to analyze the CT cases.
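The 99.8% figure follows directly from the two durations reported in the case study, as a quick Python sanity check shows:

```python
# Reported figures from the case study: 50 hours of custom MATLAB
# scripting vs 5 minutes with the DICOM processing stage.
baseline_minutes = 50 * 60   # 3000 minutes
improved_minutes = 5
reduction = 1 - improved_minutes / baseline_minutes
print(f"{reduction:.1%} reduction in query time")  # 99.8% reduction in query time
```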


My next post on IT Trends will talk about new data types that IT will start addressing in 2018.