
Hu's Place

194 posts


A recent analyst report makes the following claim:


“Solid-state arrays have reached a level of maturity, reliability and cost-effectiveness that they now exceed hybrid disk arrays in all characteristics except raw capacity, providing I&O leaders with agility and service improvements.”


Most solid-state arrays are architected as midrange systems that lack enterprise capabilities such as virtualization of external storage, remote replication, and active/active controllers. An industry analyst chooses to define solid-state arrays as “All Flash Arrays” (AFA), which means that they cannot attach hard disks either directly or through virtualization and must have a separate product number. This definition fails to recognize that a hybrid disk array like the enterprise Hitachi VSP G series can do everything that an “AFA” can do, and more. The VSP G series can contain just flash modules if the customer decides to use it as an all-flash array, but it does not qualify as an AFA according to this definition. Customers who follow this definition often do not consider the VSP G series as an AFA. For this reason, we have created the VSP F series, which can only attach Hitachi Flash Modules (FMD). The F series cannot virtualize external storage arrays.


The logic behind the AFA definition has been questioned by other industry analysts like Howard Marks, but it is what it is.


As far as the analyst statement goes, our VSP F series AFA has the same level of maturity and reliability as our hybrid VSP G series, since they have the same basic architecture. However, since the F series is restricted from virtualizing external storage, it cannot match the G series in cost effectiveness and raw capacity. Other AFA vendors cannot match either the VSP G or F series in enterprise scalability, availability, and functionality, and many cannot match the raw performance. Both the G and F series offer the industry's only 100% availability guarantee.


Why is virtualization of external storage arrays so important? In that same report the analyst says:

“By 2020, the percentage of data centers that will use only SSAs for primary data, instead of hybrid arrays, will increase from less than 1% in 2015 to 25%.” That means there will still be a lot of disk storage systems around that could gain the performance and operational benefits of flash storage if they were virtualized behind a front-end hybrid flash system like the VSP G series.


Most of our customers still choose the hybrid VSP G series over the F series due to the flexibility that the hybrid approach provides in implementing flash. An example today is the current supply constraint on SSDs brought about by NAND technology transitions, factory yields, and competing memory demands from smartphones and PCs. Hybrid storage array customers can meet their storage demand with available disk storage as a tier behind their existing flash.


The future is certainly all flash, but during the transition a hybrid storage system can help by providing the benefits of flash through virtualization before the disk systems are fully displaced. In the future, when new technology replaces NAND flash, a hybrid storage system will again help in the transition as it does today.

This past week I visited several customers in different business verticals. One was an energy and power company, another was an insurance company, and the others were in media and entertainment. A common theme in all these meetings was digital transformation and the disruption it is causing in all these sectors. The disruption was not so much about the technology, but more about the disruption of their business models by new startups in their industry. Each meeting also provided insight into the value of IoT.


Energy and Power

Energy DX.png

In the energy and power vertical, the old centralized model of a huge, capital-intensive energy generation plant with a huge transmission grid located far from its consumers is giving way to a distributed model where energy is generated from renewable sources close to the consumer in microgrids. Many of these large energy generation plants are being used at a fraction of their capacity, making it difficult for established power companies to generate revenue. While renewable sources of energy like solar and wind are clean, they are subject to the vagaries of weather and need new technologies around IoT, big data, and predictive analytics to provide reliable power. With deregulation, new competitors are entering the utility business. Some municipalities are setting up their own mini-grids, sharing power generated from solar panels on residences and businesses. While power companies may have installed smart meters to reduce the cost of meter readers, they have not yet tapped the real potential of that connection into the home to manage home devices. See how Hitachi is working with the island of Maui in the state of Hawaii in “Solve Energy Problems in Hawaii: Hitachi's Smart Grid Demonstration Project,” which connects residential solar panels with wind farms and electric vehicles.



In the insurance business there are many new opportunities, which have given rise to a new class of Fintechs that are classified as Insurtechs.


Like their banking counterparts, many insurance companies are partnering with these Insurtech companies to drive innovation. The particular company that I visited had just created a unit to drive digital innovation across its businesses, and its CIO was promoted to head up this unit as the Chief Technology Innovation Officer. The group is tasked with identifying emerging customer segments; leveraging data and analytics; and helping the company’s distribution partners innovate and strategically grow their businesses. They are already partnering with several Insurtech companies. They are also subject to many regulations, including GDPR, which I covered in a previous post. Insurance companies are also connecting with IoT. Real-time data from embedded sensors in a wide range of internet-connected devices like automobiles and Fitbits, along with advanced analytics, will allow insurers to offer many new and enhanced products and services. Read this link to see how Hitachi is creating new IoT-driven insurance services that can shift insurance from being reactive, responding to accidents and other incidents after the fact, to being proactive, providing services in advance.


Media and Entertainment

The media and entertainment business is also undergoing a massive change. Twenty years ago the definition of broadcast and entertainment was very simple. Content came from the establishment and was sent in one direction. Today’s social and video networks are changing the entire business model of content, publishers, broadcasting, entertainment, news, advertising, and digital rights. The online and mobile ecosystem also changes how content reaches viewers, and on-demand viewing has made the fixed, mediated schedule of linear programming seem obsolete. For example, to deliver contextual advertising you need to truly understand your audience. This starts with combining a lot of different data: information about your viewers, such as demographics; their viewing patterns; and metadata about the content they are viewing. This data comes from many sources, in different formats, and through different delivery mechanisms. A data integration tool like Pentaho will help to provide market insights, operational insights, and risk insights.

Hitachi TV.png

The infrastructure is also undergoing major changes. Streaming video completely bypasses the traditional video-aggregation and distribution models around broadcast networks, cable, and satellite, disrupting long-standing value chains and dedicated infrastructure (for example, broadcast towers, cable lines, and satellites) that have historically been critical to the television industry. Hitachi also has vertical expertise which can help with solutions on the operational side. MediaCorp Pte Ltd, Singapore's leading media company, overcomes transmission challenges using Digital Microwave Link from Hitachi.


An IoT Approach to Digital Transformation

In each of these businesses there was a very heavy IT focus on data integration and analytics. But there was also a need for an OT focus when it comes to transforming the business model. This is where Hitachi has an advantage, with its expertise in IT and OT and the capability to integrate them into IoT. While I met with just three different verticals, I expect that the same would apply in other verticals.

In the past I have blogged about Fintech companies that are disrupting the financial markets, by offering banking services with more consumer efficiency, lower costs, and greater speed than traditional financial companies. Fintechs have been a catalyst for traditional banks to search for solutions to automate their services in order to compete. In fact, financial companies are embracing the fintechs and the real competition is with technology companies!


Fintech city.png

In an Interview with Digital News Asia, Sopnendu Mohanty, chief fintech officer at the Monetary Authority of Singapore (MAS) is quoted:


“When people talk about fintech, the natural understanding is a technology company doing banking. That’s the classic definition of fintech, but the reality is something different – 80% of fintech [startups] … are actually helping banks to digitize, challenging current processes and technology. They are actually disrupting the large tech players in a way that the banks’ technology expenses are getting smaller, while they are getting better customer services,”


“What is visible to consumers is the disruption to financial services, but the real disruption is happening to the IBMs of the world,” said MAS’ Sopnendu…. “Fintech companies have not exactly created new financial products, they are still moving money from A to B, they are still lending money, a classic banking product,” he said.


“What has changed is the distribution process and customer experience through technology and architecture.”


A CIO of a bank in Asia showed me a mobile app that they developed to apply for loans. It was very easy to use. However, processing the loan still took over a week because the back end systems were still the same. There are a number of fintech companies that could process that loan application in a few days at lower cost, and make it available in smaller amounts to a farmer or a pedi-cab driver whom the larger banks could not afford to service.


Banks have one major problem to overcome. How do they disengage from the legacy technology upon which they have built all their core processes, in order to adopt the agile fintech technologies and architectures? Many remember the painful, drawn out process of converting their core financial systems from monolithic mainframes to open systems. There are many banks that still use mainframes for legacy apps.


The way that the banks transitioned from mainframes to open systems required a bi-modal approach, modernizing their mode 1 core systems while transitioning to the new mode 2 architectures. Some outsourced that transition. The same approaches are needed today, and fintechs can help the banks retain their customers during this transition.


Another element in this transition is the regulators, and there is a class of fintechs called regtechs that are applying some of the same technologies to automate compliance with regulations. Machine learning, biometrics, and the interpretation of social media and other unstructured data are some of the technologies being applied by these startups.


Technology vendors must become proficient in these technologies and move beyond that to AI, blockchain, and IoT. In addition, they must provide technology that can bridge and integrate the mode 1 with mode 2 infrastructure, data, and information. This includes tools like converged and hyper-converged platforms, object stores with content intelligence, and ETL (Extract, Transform, and Load) automation.


Technology companies should not be looking at each other for competition. They need to be looking at the Fintechs and understanding how they have innovated in bringing business and technology together.


Hitachi established a Fintech lab in Santa Clara last year to work with customers and partners and was a founding member of the open source Hyperledger project started in December 2015 to support open source blockchain distributed ledgers and related tools. Our work on the Lumada IoT platform will also help us to compete in new technology areas like AI.

The Hitachi Data Systems’ object based storage (OBS) solution HCP (Hitachi Content Platform) has earned another analyst assessment as a leader in the Object Based Storage (OBS) Market. In addition to being named a leader in IDC’s MarketScape: Worldwide Object-Based Storage 2016 Vendor Assessment and Gartner’s 2016 Critical Capabilities For Object Storage Report, HCP was ranked number 1 in the March 2017 GigaOm Sector Roadmap: Object storage for enterprise capacity-driven workloads. This latest report has HDS widening the gap with the other vendors in the OBS market with a score of 4.5 while the other 8 closest vendors are clustered together in a range from 3.3 to 3.8.

GigaOm HCP.png


Interpreting the Chart

The GigaOm chart IS:

  • An indication of a company’s relative strength across all vectors (based on the score/number in the chart)
  • An indication of a company’s relative strength along an individual vector (based on the size of ball in the chart)
  • A visualization of the relative importance of each of the key Disruption Vectors that GigaOm has identified for the enterprise object storage marketplace. They have weighted the Disruption Vectors in terms of their relative importance to one another.


GigaOm is a technology research and analysis firm. They describe themselves as being “forward-leaning, with a futures-oriented take on the trends and tools that are shaping the economy of the 21st century: Cloud, Data, Mobile, Work Futures, and the Internet of Things.”  GigaOm reaches over 6.5 million monthly unique readers, with a mobile reach of over 2 million monthly visitors. This report was authored by Enrico Signoretti, a thought leader in the storage industry, a trusted advisor and founder of Juku Consulting.


This report is a stand-alone assessment of object storage and does not review other products included in the HCP portfolio, such as HDI, HCP Anywhere, and Hitachi Content Intelligence. The report is a vendor-level analysis that examines an expanding segment of the market, object storage for secondary and capacity-driven workloads in the enterprise, by reviewing major vendors, forward-looking solutions, and outsiders along with the primary use cases. Vendors covered in the report include: Scality, SwiftStack, EMC ECS, Red Hat Ceph, HDS HCP, NetApp StorageGRID Webscale, Cloudian, Caringo, DDN, and HGST.


The most heavily weighted disruption vector, at 30%, was the Core. This refers to the Core Architecture. According to GigaOm: “Most of the basic features are common to all object storage systems, but the back-end architecture is fundamental when it comes to overall performance and scalability. Some products available in the market have a better scalability record and show more flexibility than others when it comes to configuration topology, flexibility of data protection schemes, tiering capabilities, multi-tenancy, metadata handling, and resource management. Some of these characteristics are not very relevant to enterprise use cases, especially when the size of the infrastructure is less than 1PB in capacity or built out of a few nodes. However, in the long term, every infrastructure is expected to grow and serve more applications and workload types.”


Core Architecture is what distinguishes HCP from all the rest. The core architecture has enabled Hitachi Data Systems to expand the HCP portfolio with Hitachi Data Ingestor (HDI), an elastic-scale, backup-free cloud file server with advanced storage and data management capabilities; HCP Anywhere, a simple, secure, and smart file-sync-and-share solution; and Hitachi Content Intelligence (HCI), software that automates the extraction, classification, enrichment, and categorization of data residing on both HDS and third-party repositories.


These are valuable capabilities that Hitachi Data Systems has built on their HCP object storage based system which are not included in the OBS evaluations by the different analysts. Last week I blogged about the value of a centralized data hub and the importance of cleansing and correcting the data as you ingest it; moving data quality upstream and embedding it into the business process, rather than trying to catch flawed data downstream and then attempting to resolve the flaw in all the different applications that are used by other people. The Hitachi Content Intelligence software can help you cleanse and correct data that you are ingesting into the HCP object based storage. This is a capability that is not available in other OBS.


If you consider the breadth and depth of the HCP portfolio rather than just the object storage traits that are common to all OBS products, you will realize the full power of this portfolio. While GigaOm does not consider HDI, HCP Anywhere, or Hitachi Content Intelligence directly in its disruption vectors, they do point out that:


“Core architecture choices are not only important for scalability or performance. With better overall design, it is easier for the vendor to implement additional features aimed at improving the platform and thus the user experience as well.”


This is what Hitachi Data Systems has been able to do with HCP. Stay tuned for additional features and functions in the HCP portfolio as we continue to evolve to meet new market challenges.

In my trends for 2017, I called out the movement to a centralized data hub for better management, protection and governance of an organization’s data.



“2017 The year of the Rooster teaches the lessons of order, scrutiny and strategic planning.”


Data is exploding and coming in from different sources as we integrate IT and OT, and data is becoming more valuable as we find ways to correlate data from different sources to gain more insight, or repurpose old data for new revenue opportunities. Data can also be a liability if it is flawed, accessed by the wrong people, exposed, or lost, especially if we are holding that data in trust for our customers or partners. Data is our crown jewels, but how can we be good stewards of our data if we don’t know where it is: on someone’s mobile device, in an application silo, in an orphan copy, or somewhere in the cloud? How can we provide governance for that data without a way to prove immutability and show the auditors who accessed it and when, and how can we show that the data was destroyed?


For these reasons, we see more organizations creating a centralized data hub for better management, protection, and governance of their data. This centralized data hub will need to be an object store that can scale beyond the limitations of file systems, ingest data from different sources, cleanse that data, and provide secure multi-tenancy, with extensible metadata that can provide search and governance across public and private clouds and mobile devices. Scalability, security, data protection, and long-term retention will be major considerations. Backups will be impractical and will need to be eliminated through replication and versioning of updates. An additional layer of content intelligence can connect and aggregate data, transforming and enriching data as it’s processed, and centralize the results for authorized users to access. Hitachi Content Platform (HCP) with Hitachi Content Intelligence (HCI) can provide a centralized object data hub with a seamlessly integrated cloud-file gateway, enterprise file synchronization and sharing, and big data exploration and analytics.


Creating a centralized data hub starts with the ingestion of data which includes the elimination of digital debris and the cleansing of flawed data. Studies have shown that 69% of information being retained by companies was, in effect, “data debris,” information having no current business or legal value. Other studies have shown that 76% of flaws in organizational data are due to poor data entry by employees. It is much better to move data quality upstream and embed it into the business process, rather than trying to catch flawed data downstream and then attempting to resolve the flaw in all the different applications that are used by other people. The Hitachi Content Intelligence software can help you cleanse and correct data that you are ingesting and apply it to the aggregate index (leaving the source data in its original state), or apply the cleansing permanently, when the intent of ingest and processing is to centralize the data on an HCP via write operations.
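To make the two cleansing modes concrete, here is a minimal Python sketch. It is not Hitachi Content Intelligence code: the `cleanse` rules, the record fields, and the `ingest` function are all hypothetical names chosen for illustration. It only shows the difference between cleansing into an index while leaving the source untouched, and writing the cleansed values back permanently.

```python
def cleanse(record):
    """Return a cleaned copy of a record; the rules here are illustrative only."""
    cleaned = dict(record)
    cleaned["name"] = " ".join(record["name"].split()).title()
    cleaned["country"] = record["country"].strip().upper()
    return cleaned


def ingest(source, permanent=False):
    """Build an index of cleansed records from `source` (a dict of id -> record).

    permanent=False: only the index holds cleansed values; source is untouched.
    permanent=True:  cleansed values are also written back to the source.
    """
    index = {}
    for record_id, record in source.items():
        cleaned = cleanse(record)
        index[record_id] = cleaned
        if permanent:
            source[record_id] = dict(cleaned)
    return index


source = {"r1": {"name": "  alice   SMITH ", "country": " us "}}
index = ingest(source)  # index is clean, source still holds the raw entry
```

Calling `ingest(source, permanent=True)` would instead overwrite the raw entry, which corresponds to the case where the intent is to centralize the corrected data itself.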


When data is written to the Hitachi Content Platform, it is encrypted, stored as a single instance with safe multitenancy, tagged with system and custom metadata, and replicated for availability. The data is now centralized for ease of management and governance. RESTful interfaces enable connection to private and public clouds. HCP Anywhere and Hitachi Data Ingestor provide control of mobility and portability to mobile and edge devices. Hitachi Content Intelligence can explore, detect, and respond to data queries.
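For readers unfamiliar with single-instance storage and object metadata, the idea can be sketched in a few lines of Python. This is a toy model under my own assumptions, not the HCP API: the class name `SingleInstanceStore` and its methods are hypothetical, and encryption, multitenancy, and replication are deliberately left out.

```python
import hashlib


class SingleInstanceStore:
    """Toy content-addressed store: identical payloads are kept once."""

    def __init__(self):
        self.blobs = {}      # content hash -> payload bytes
        self.objects = {}    # object name -> (content hash, metadata dict)

    def put(self, name, payload, **custom_metadata):
        digest = hashlib.sha256(payload).hexdigest()
        self.blobs.setdefault(digest, payload)   # store the bytes only once
        system_metadata = {"size": len(payload), "hash": digest}
        self.objects[name] = (digest, {**system_metadata, **custom_metadata})

    def get(self, name):
        digest, _ = self.objects[name]
        return self.blobs[digest]

    def find(self, **criteria):
        """Return names whose metadata matches every key/value in criteria."""
        return sorted(
            name for name, (_, meta) in self.objects.items()
            if all(meta.get(k) == v for k, v in criteria.items())
        )


store = SingleInstanceStore()
store.put("a.txt", b"quarterly report", owner="alice")
store.put("b.txt", b"quarterly report", owner="bob")   # same bytes, new name
```

After the two `put` calls, `store.blobs` holds one payload while `store.objects` holds two named objects, and `store.find(owner="bob")` locates the second one through its custom metadata.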


HCO suite.png


Scott Baker, our Senior Director for Emerging Business Portfolio, recently did a webcast about the use of this HCP suite of products in support of GDPR (General Data Protection Regulation), which is due to be implemented by May 25, 2018 and will have a major impact on organizations that do business with EU countries. The transparency and privacy requirements of GDPR cannot be managed when data is spread across silos of technology and workflows. (You can see this webcast at this link on BrightTalk.)


In this webcast he gave a use case of how Rabo Bank used this approach to consolidate multiple data sources to monitor communications for regulatory compliance.

Rabo architecture.png

Rabo Bank is subject to a wide range of strict government regulations, with penalties for non-compliance, across various jurisdictions. It had too many independently managed data silos, including emails, voice, and instant messaging, with some data stored on tape. The compliance team was reliant on IT for investigations, which limited its ability to respond and make iterative queries. Regulatory costs were soaring due to the resources required to carry out data investigations across silos. The results of implementing the HCP suite of products are shown in the slide below.

Rabo Results.png

For more information on this use case for a centralized data hub, you can link to this PDF.

fake news.jpg


Competitors who lack the engineering capability to design their own flash devices use standard SSDs that were designed for the high-volume commodity PC and server markets. These competitors are trying to create FUD around our purpose-built enterprise flash module, FMD, claiming that the offload process causes significant backplane issues and that the loss of an FMD would impact performance and cause other issues due to the design of our offering. This is utter nonsense and comes under the category of “fake news.”


Hitachi has taken a very different approach to flash storage devices. Unlike other flash array vendors that use standard SSDs, Hitachi has built its own flash module (FMD) from scratch to integrate the best available flash controller technology into the storage portfolio. While Hitachi does support SSDs, we recognized the opportunity to deliver higher capacity flash drives with advanced performance, resiliency, and offload capabilities beyond what SSDs provide by developing our own flash module with a custom flash controller.


Industry analyst George Crump stated in one of his blogs: "With flash memory becoming the standard in enterprise SSD systems, users need to look more closely at the flash controller architectures of these products as a way to evaluate them. The flash controller manages a number of functions specific to this technology that are central to data integrity and overall read/write operations. Aside from system reliability, poor controller design can impact throughput, latency and IOPS more than any other system component. Given the importance of performance to SSD systems, flash controller functionality should be a primary focus when comparing different manufacturers’ systems."


Gartner’s August 2016 report (ID G00299673) on Critical Capabilities for Solid-State Arrays recognized this as a trend: “To gain increased density and performance, an increasing number of vendors (e.g., Hitachi Data Systems, IBM, Pure Storage and Violin Memory) have created their own flash NAND boards, instead of using industry-standard solid-state drives (SSDs). Dedicated hardware engineering has reappeared as a differentiator to industry-standard components, moving away from the previous decade's trend of compressed differentiation.” Hitachi Data Systems started this trend in 2012 with the announcement of our first FMD.


Competitors who lack the engineering capability to design their own flash devices use standard SSDs that were designed for the high-volume commodity PC and server markets. These competitors are trying to fight this trend by creating FUD around our FMD, claiming that the offload process causes significant backplane issues and that the loss of an FMD would impact performance and cause other issues due to the design of our offering. This is utter nonsense and comes under the category of “fake news.”


In the first place, the only function that is offloaded from the storage controller to the FMD is data compression, which is handled by two coprocessors in the FMD and has no impact on performance. Compare doing this in hardware in the FMD with the software overhead of doing the compression/decompression for selected SSDs in the storage array controller. Because of the performance impact of doing compression in the storage array controllers, storage administrators have the added task of managing the use of compression. With the FMD you can turn on the compression and forget it. Aside from the reporting of audit log and normal usage information, there is no significant side-band communication between the Hitachi VSP storage controller and the FMDs, so the claim that the offload process causes significant backplane issues is completely false.


The management of a flash device is very compute and bandwidth intensive. SSDs rely on a controller to manage the performance, resilience, and durability of the flash device. As the I/O activity increases, these functions cause controller bottlenecks, sporadic response times, and a poor customer experience, giving IT organizations more work in managing the tradeoffs between workloads and data placement on different SSDs. We have published a white paper that explains the technology and the pitfalls of trying to do this with the limited architecture and compute power of standard SSDs. You can download this white paper via this link.


FMD Controller.png


The brains of the FMD is a custom-designed ASIC featuring a quad-core processor, 32 parallel paths to the flash memory, and 8 lanes of PCIe v2.0 connection to the external SAS target mode controllers. This ASIC is composed of more than 60 million gates and includes two coprocessors for compression and decompression as well as direct memory access (DMA) assist. The ASIC compute power drives up to eight times more channels and 16 times more NAND packages than typical 2.5 inch SSDs. This powerful ASIC enables the FMD to avoid the limited capabilities of standard SSD controllers, which restrict the amount of flash that an SSD can manage and force the flash array controller and storage administrator to do more management in the placement of data on SSDs. The FMD ASIC enables Hitachi Data Systems to deliver capacities of up to 14 TB in one FMD today.
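As a rough back-of-the-envelope check on what 8 lanes of PCIe v2.0 means, the short Python sketch below computes the theoretical peak bandwidth of such a link. The per-lane signalling rate (5 GT/s) and 8b/10b encoding come from the PCIe 2.0 standard in general, not from any Hitachi measurement, and the result is an upper bound before protocol overhead rather than an FMD performance figure.

```python
GT_PER_S_PER_LANE = 5.0e9       # PCIe 2.0 raw signalling rate per lane
ENCODING_EFFICIENCY = 8 / 10    # 8b/10b line coding used by PCIe 2.0
LANES = 8                       # lane count quoted for the FMD ASIC


def pcie_bandwidth_bytes_per_s(lanes, transfers_per_s, efficiency):
    """Theoretical peak payload bandwidth of a PCIe link, one direction."""
    usable_bits = lanes * transfers_per_s * efficiency
    return usable_bits / 8


bandwidth = pcie_bandwidth_bytes_per_s(LANES, GT_PER_S_PER_LANE, ENCODING_EFFICIENCY)
print(f"{bandwidth / 1e9:.1f} GB/s per direction")
```

This works out to about 4 GB/s in each direction for the link as a whole.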


Unlike standard SSDs, the FMD was purpose built for enterprise storage workloads. It was specifically designed to address intensive large-block random write I/O and streams of sequential write requests from applications such as software as a service, large-scale transaction processing, online transaction processing (OLTP) databases, and online analytic processing (OLAP). It also understands that storage array controllers format drives by writing zeros, so the FMD avoids writing the zeros to improve performance and durability. It can also erase all the cells, even the spare cells that the array controller cannot see, and report back the status of the erased cells through a SCSI read long command for auditing purposes. SSD storage arrays have no way to securely erase all the flash cells in an SSD, since they cannot see the spare cells and overwrites are always done to a new cell.
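The points about zero writes and secure erase are easier to see with a toy model. The Python sketch below is not the FMD firmware; `ToyFlashDevice` and everything in it is a hypothetical simplification with no garbage collection or wear leveling. It illustrates two general properties of flash translation layers: an all-zero write can be handled by unmapping instead of consuming flash, and a host overwrite lands on a new page, leaving the old data in a page the host can no longer address.

```python
class ToyFlashDevice:
    """Toy flash translation layer with out-of-place writes (illustrative only)."""

    def __init__(self, physical_pages, page_size=4):
        self.page_size = page_size
        self.pages = [None] * physical_pages   # None means erased
        self.mapping = {}                      # logical block -> physical page
        self.next_free = 0

    def write(self, lba, data):
        if data == bytes(self.page_size):
            # All-zero write (e.g. a format pass): just unmap, consume no flash.
            self.mapping.pop(lba, None)
            return
        # Flash cannot be overwritten in place, so use a fresh page.
        # (No garbage collection here; the toy simply runs out of pages.)
        self.pages[self.next_free] = data
        self.mapping[lba] = self.next_free
        self.next_free += 1

    def read(self, lba):
        if lba not in self.mapping:
            return bytes(self.page_size)
        return self.pages[self.mapping[lba]]

    def stale_pages(self):
        """Physical pages that still hold data no logical block points to."""
        live = set(self.mapping.values())
        return [i for i, p in enumerate(self.pages) if p is not None and i not in live]

    def erase_all(self):
        """Device-side erase: clears every physical page, mapped or not."""
        self.pages = [None] * len(self.pages)
        self.mapping.clear()
        self.next_free = 0


dev = ToyFlashDevice(physical_pages=8)
dev.write(0, b"SECR")
dev.write(0, b"XXXX")      # host "overwrite" goes to a new physical page
dev.write(1, bytes(4))     # zero write consumes no page
```

After these calls the host reads `b"XXXX"` from block 0, yet physical page 0 still holds `b"SECR"`. Only an erase performed inside the device, modeled here by `erase_all()`, reaches that page.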


As far as the loss of an FMD having an impact on performance, there are two cases: a planned threshold copy-replacement and an unplanned RAID reconstruction. With the copy-replacement, the simple copy places a minor load on the FMD, but there is no impact to host I/O. There is an impact for the standard RAID reconstruction, but that is the same for any storage array, except that the higher performance of the FMD could shorten the reconstruction time, depending on what else is happening in the array.
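To show why a RAID reconstruction costs more than a simple copy, here is a minimal Python sketch of a single-parity rebuild. The RAID level is my assumption (the text above does not say which level is in use), and the function names are hypothetical. The point is that rebuilding one lost device requires reading every surviving device in the group, whereas a copy-replacement reads only the device being replaced.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks):
    """With single XOR parity, the missing block is the XOR of all survivors."""
    return xor_blocks(surviving_blocks)


data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(data)
# Lose data[1]; rebuild it from the remaining data blocks plus parity.
rebuilt = rebuild_missing([data[0], data[2], parity])
```

Here `rebuilt` equals the lost block `data[1]`, recovered only by reading all of the other members.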


To get the true news on Hitachi Data Systems’ FMD, please read our white paper, which explains it all.

It has been about a year since I blogged about the distributed ledger technology called blockchain and the establishment of the Hitachi Financial Innovation Lab in Santa Clara.


This year Hitachi has partnered with Tech Bureau to use the NEM-based Mijin blockchain platform for Hitachi’s point management solution “PointInfinity,” which serves 150 million members and users. PointInfinity is a rewards system that allows you to earn points at one place and use them in another, similar to Plenti in the United States. PointInfinity makes use of data-driven results and application know-how, which allows merchants to deploy point-based and electronic money management systems. Members can design their own membership programs and Point of Sale (PoS) software for loyalty programs and special offers. It is a low-cost solution and can be implemented in a short period of time, while offering a high level of security. Hitachi’s PointInfinity system has gained immense popularity over the past few years among service providers and dealers that use loyalty-based programs for frequent customers. It is the most popular points reward system in Japan, with the largest number of users.


The test began on 9th February and will progress with the goal of determining whether a private blockchain can meet the demands of a high-volume transaction system. In statements, Tech Bureau CEO Takao Asayama said that the trial could help boost the perception of what he believes could be a powerful use case in Japan, one that appeals to customers and corporates. Besides speeding the exchange of points between its members, blockchain technology is projected to reduce running costs by over 90%, according to Tech Bureau executives.


Ramen shop.png

Mijin is software that can build a private blockchain for use within a business or between businesses, either in the cloud or within a company’s own data center. A product of Tech Bureau, Mijin’s functionality has been proven as a bank ledger, a microfinance ledger, and an electronic money core banking system. Its NEM cryptocurrency has proved itself in Japan as counter coinage, and more than 300 companies already deal with it. Now this will be used with Hitachi’s PointInfinity points program.
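For readers who want a feel for what a ledger of point transfers on a blockchain looks like, here is a minimal Python sketch of a hash-chained ledger. It is not the Mijin or NEM API, and `PointsLedger`, `transfer`, and the account names are hypothetical. It also leaves out consensus, signatures, and balance checks, and shows only the core idea that each block commits to the one before it, so a past transfer cannot be changed quietly.

```python
import hashlib
import json


def block_hash(block):
    """SHA-256 over a canonical JSON encoding of the block."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


class PointsLedger:
    """Toy append-only ledger: each block stores the hash of the previous one."""

    def __init__(self):
        self.chain = [{"index": 0, "prev": "0" * 64, "tx": None}]

    def transfer(self, sender, receiver, points):
        block = {
            "index": len(self.chain),
            "prev": block_hash(self.chain[-1]),
            "tx": {"from": sender, "to": receiver, "points": points},
        }
        self.chain.append(block)

    def is_valid(self):
        """True if every block still matches the hash recorded by its successor."""
        return all(
            self.chain[i]["prev"] == block_hash(self.chain[i - 1])
            for i in range(1, len(self.chain))
        )


ledger = PointsLedger()
ledger.transfer("ramen_shop", "alice", 50)
ledger.transfer("alice", "bookstore", 20)
```

If someone later edits the first transfer in place, `ledger.is_valid()` returns `False` because the second block no longer matches the hash of the altered one.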


The successful completion of this test will see one of the largest operational deployments of blockchain technology with the 150 million members of PointInfinity.


While Hitachi is well known for bullet trains and Enterprise IT systems, Hitachi Innovation is also leading the way in other business areas. This project with PointInfinity and Tech Bureau helps open new ventures that deal with points-based membership and provides merchants with more services and rewards through the engagement of a larger audience. The goal of this experiment is to enable the expansion of business from buyers, because the points get used as currency, and several new services are made possible by this development.


While this project uses blockchain technology from Tech Bureau, Hitachi has been investing in its own blockchain development and implementation. Hitachi is a founding member of the Hyperledger Consortium, an open source collaborative effort created to advance cross-industry blockchain technologies. It is a global collaboration, hosted by The Linux Foundation, including leaders in finance, banking, IoT, supply chain, manufacturing and technology. Hitachi is also working with several companies on blockchain proofs of concept. One of these is a POC with the largest bank in Japan, Bank of Tokyo-Mitsubishi, testing a blockchain-based infrastructure that they developed to issue, transfer, collect, and clear electronic checks. The POC is being done in the fintech-friendly regulatory sandbox that was set up by the Monetary Authority of Singapore.


As the year progresses I will bring you updates on Hitachi's progress with blockchain. Some of the links provided in this post are in Japanese and will require Google translate.

IEP stands for the Intercity Express Programme, which will convert the current 60+ year old diesel UK intercity trains to state of the art, high speed electric trains. The new trains are from Hitachi and are packed with the latest technologies based on Hitachi AI Technology/H to create a cleaner, safer, smarter transport system for passengers, workers, and residents along the rail corridors.


Hitachi IEP.jpg


This train is being delivered as a service, and Hitachi is taking on the billions of dollars of up front risk in rolling stock and the train control system, and in building a factory and maintenance depots. This is a 27-year project and Hitachi is confident that it can return a profit to its investors through the use of big data, IoT, and AI.


The challenge of converting a running system from old diesel trains to electric trains without disruption was answered in the same way that we do in IT conversions: through the use of virtualization. Hitachi built a virtual train, an electric train that has a diesel engine in the undercarriage so that it can run on diesel on the older tracks and on electric power on the newer tracks. When all the tracks are converted the diesel engines can be removed so that the lighter electric train would cause less wear on the tracks.


The trains will be built in County Durham at a new factory which marks the return of train manufacturing to the north-east UK, supporting thousands of jobs and developing a strong engineering skills base in the region. The plant will employ 900 by this spring, with more than double that as new maintenance facilities are opened for this fleet. AI will play an important part in the efficiency of maintenance workers and rolling stock utilization. KPIs and sensors will be incorporated in measures to improve and enhance work efficiency based on the maintenance workers’ daily activities and level of well-being (happiness). While many believe that AI will eliminate jobs, Hitachi AI is being applied to create jobs and enhance the work experience. By analyzing the relationship between time-related deterioration, the operating conditions of rolling stock, and worker well-being, Hitachi AI can be applied to detect the warning signs of system failure and improve the rolling stock utilization rate.


Hitachi AI is also used in the analysis of energy saving performance in traction power consumption, the energy consumed by the traction power supply system which is influenced by parameters such as carriage mass and speed and track infrastructure data such as track gradient and curve information. By managing acceleration and deceleration, AI showed that it can reduce power consumption and carbon emissions by 14% while maintaining the same carriage speed.


In addition to passenger amenities like Wi-Fi and power outlets, AI provides improved passenger comfort through reduced noise and vibration, and increased happiness through a workplace environment that manages air conditioning and senses airflow when doors are opened and closed. Residents along the rail corridors enjoy less noise and cleaner air. Commuters to London can enjoy a reduced commute time, greater productivity during this time and the ability to locate their families outside the congestion of the city.


On June 30, 2016, the Great Western Railway unveiled its first Intercity Express Programme (IEP) Class 800 train, carrying invited passengers from Reading to London Paddington Station. This commemorated 175 years since the opening of the Great Western Main Line. This service is scheduled to go on line in the summer of 2017 with a fleet of 57 trains, and will run between London and Reading, Oxford, Swindon, Bath, Bristol and South Wales, as well as on the north and south Cotswold lines.



For more information about the Hitachi AI Technology/H that is being applied in this train service, please link to this Hitachi Research paper. Hitachi AI Technology/H is a core part of our Lumada IoT platform to deliver social innovation.


Hitachi's strategy is focused on a double bottom line. One bottom line for our business and investors and another bottom line for social innovation.

“Stephen Hawking – will AI kill or save humankind” was the title of a blog post by Rory Cellan-Jones on BBC News. This was a report on a speech by Stephen Hawking at an event in Cambridge in October of last year at the opening of the Centre for the Future of Intelligence. Stephen Hawking summarized his speech by saying, “In short, the rise of powerful AI will be either the best, or the worst thing, ever to happen to humanity. We do not yet know which.”


BBC News reported in January of this year that a Japanese insurance firm will be laying off 34 staff and replacing them with an AI system that can calculate payouts. The firm expects to save 140m yen ($1.2m) a year in salaries after the 200m yen AI system is installed. The annual maintenance is expected to be 15m yen. The AI system is IBM Watson, which is described as cognitive technology that will gather information needed for the policy holder’s payout by reading medical certificates and data on surgeries or hospital stays. By replacing people with AI, the efficiency of calculating payouts should increase by 30%. Saving salaries and being more efficient may be good for the insurance company, but I’m sure it does not feel good to the 34 staff who were replaced.
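The figures quoted above are enough for a rough payback calculation. The sketch below is my own back-of-the-envelope arithmetic, not something from the BBC report. The function name is hypothetical, and it ignores taxes, financing, severance and anything else not mentioned in the numbers above.

```python
def payback_years(install_cost, annual_saving, annual_maintenance):
    """Years needed for net annual savings to cover the installation cost."""
    net_saving = annual_saving - annual_maintenance
    if net_saving <= 0:
        raise ValueError("the system never pays for itself")
    return install_cost / net_saving


# All amounts in millions of yen, taken from the figures quoted above.
years = payback_years(install_cost=200, annual_saving=140, annual_maintenance=15)
print(years)  # 1.6
```

On these numbers the system would pay for itself in well under two years, which helps explain why this kind of replacement is attractive to the firm.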


Hitachi AI Technology/H takes a different approach to the use of AI and staff. Dr. Dinesh Chandrasekar, Director, Global Solutions and Innovation (GSI) Group at Hitachi Consulting, published an article on LinkedIn which showed how Hitachi AI Technology/H was used in an existing warehouse logistics solution. It provided appropriate work orders based on an understanding of demand fluctuation and on-site kaizen activity, derived from big data accumulated daily in corporate business systems, and verification in logistics tasks showed an efficiency improvement of 8%.


The key to the difference in Hitachi’s AI approach is the incorporation of Kaizen, which humanizes the workplace and solicits the participation of the workers in increasing productivity.


Dr. Chandrasekar explains that conventional systems operate on preprogrammed instructions and do not reflect on-site Kaizen activities or employee ingenuity. In order to reflect this input, a systems engineer may need to redesign the system, which could be disruptive and expensive. It may even require a rewrite of the work process and design, which would make it difficult to respond to demand fluctuation and corresponding on-site changes. By taking a deep learning approach to AI and incorporating it into business systems, it will be possible to incorporate Kaizen activities or employee ideas while flexibly responding to changes in work conditions or demand fluctuations to realize efficient operations.


AI and Kaizen.jpg

Other examples of the use of Hitachi AI Technology/H are shown in the following 5-minute video. The above warehouse example about improving worker efficiency by 8% is shown. Another example shows the use of Hitachi AI Technology/H to enhance the buyer experience by locating staff in the optimum location in a retail store to increase sales by 15%. The operation costs of a desalination plant were decreased by 3.6%, and the power consumption costs of a high speed train were also decreased.



Instead of replacing staff, Hitachi AI Technology/H can improve the efficiency and productivity of human staff by incorporating Kaizen, the humanizing process of improvement. In this way AI can lead to Social Innovation and save mankind.

Infrmation governance.png


Data is the foundation for digital transformation and the strategy for digital transformation requires data management, data governance, data mobility, and data analytics. While IT has a pretty good understanding of what is required by management, mobility and analytics, governance is not as well understood. IT people shudder when compliance managers or (heaven forbid!) lawyers join the planning meeting for a transformation project, because they believe that they will be bogged down with requirements which will delay their implementation.


This should not be the case. Data governance must be an equal stakeholder with data management, data mobility, and data analytics. They must work together to improve information exchange. Together they need to ask the tough questions first: who owns the data, where is the data, what data is it related to, who can use the data, and how do you monitor its usage, retention, protection, purging, etc. They must determine the roles and responsibilities regarding the creation and management of data within the organization. They need to have a clear understanding of their requirements at a regulatory, business, and corporate asset level in order to apply best practices.


Eliminating silos of data and copies of data not only reduces governance risk, but also reduces management costs. The Hitachi Content Platform (HCP) provides extensive metadata capabilities that enable the elimination of data silos and of the need to copy data for distribution to mobile platforms. HCP has the unique ability for annotations to be added after the initial saving of the file. This provides the ability to run analytics to extract content from the files and then add this into the metadata. The metadata can include flags for confidentiality or for personally identifiable information, triggering data protection regulations. Audio files can be tagged with transcripts of conversations, making them searchable by keywords. What is good for governance is also good for management, mobility, and analytics. See my blog post on how our HDS Chief Compliance officer uses HCP.
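The idea of adding annotations after an object has been stored, and then searching on them, can be illustrated with a small in-memory sketch. This is not the HCP API. The class names, method names and the sample file are all hypothetical and only show the general pattern of late-bound metadata.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """An object plus the annotations attached to it over time."""
    name: str
    data: bytes
    annotations: dict = field(default_factory=dict)


class ObjectStore:
    """Toy in-memory store; a stand-in for any metadata-aware object store."""

    def __init__(self):
        self._objects = {}

    def put(self, name, data):
        self._objects[name] = StoredObject(name, data)

    def annotate(self, name, key, value):
        """Attach or update metadata without rewriting the stored data."""
        self._objects[name].annotations[key] = value

    def search(self, key, keyword):
        """Return names of objects whose annotation `key` contains `keyword`."""
        keyword = keyword.lower()
        return sorted(
            obj.name
            for obj in self._objects.values()
            if keyword in str(obj.annotations.get(key, "")).lower()
        )


store = ObjectStore()
store.put("call-001.wav", b"...audio bytes...")
# Later, an analytics job adds a transcript and a PII flag.
store.annotate("call-001.wav", "transcript", "Customer asked to cancel the contract")
store.annotate("call-001.wav", "contains_pii", True)
print(store.search("transcript", "cancel"))  # ['call-001.wav']
```

The point of the pattern is that the audio file itself is never copied or modified; only its metadata grows as governance and analytics tools learn more about it.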


HCP makes it possible to integrate data governance with data management, mobility, and analytics and stay ahead of the compliance and governance curve with digital transformation.


Recently, Hitachi Data Systems won Best Information Governance Company 2016 at The Information Governance Conference 2016, reaffirming Hitachi Content Platform’s ability to solve modern data challenges. HCP has deployed 500 petabytes of capacity, generated $1 billion in revenue and is used by four out of five of the largest banks in the world. Additionally, there has been an average of 1000% year-on-year growth of HCP software since 2010.

AI stands for Artificial Intelligence, computer systems that can achieve intelligent activities like those of human brains such as learning, reasoning, and judgment. The learning method employed by early AI systems was to have computers learn the rules and logically find solutions based on the rules. As such, the systems were limited in that they only found solutions within the scope of what they had learned.

hitachi AI.jpg

More recent types of AI being used today are designed based on hypotheses of data that people would enter, implemented as programs, and analyzed and developed for a specific application. One disadvantage of being based on hypotheses envisioned by people is that they usually do not produce results that surpass human ideas and are not general purpose.


One well known AI system is IBM’s Watson, which is described by Wikipedia as a question-answering computer system capable of answering questions posed in natural language. Watson was named after the first CEO of IBM, Thomas J. Watson. In 2011, Watson competed on the quiz program “Jeopardy” and defeated two former Jeopardy winners. The IBM team provided Watson with millions of documents, including dictionaries, encyclopedias, and other reference material that it could use to build its knowledge. All the content was stored in RAM and Watson could process 500 gigabytes, the equivalent of a million books, per second. Watson's main innovation was its ability to quickly execute hundreds of proven language analysis algorithms simultaneously to find the correct answer.


Hitachi has developed similar technology that can analyze large volumes of text data that are subject to debate and present reasons and grounds for either affirmative or negative opinions on these issues in English. By using multiple viewpoints, it is able to present reasons toward a single perspective. One use case was in medical diagnosis. Recently this was expanded to the Japanese language.


Hitachi uses natural language processing in the Hitachi Visualization Predictive Crime Analytics (PCA), which ingests streams of sensor and Internet data from a wide variety of sources like weather, social media, proximity to schools, subway stations, gunshot sensors, 911 calls, etc., and crunches all this information in order to find patterns that humans would miss. Police investigators build crime prediction models based on their experience with certain variables, like slang words that may come up on Twitter, and assign a weight to each variable. By contrast, Hitachi’s PCA doesn’t require humans to figure out what variables to use and how to weight them. You just feed it those data sets, and PCA uses machine learning that decides over a couple of weeks if there is a correlation that could predict a crime.


A new learning method is called "deep learning," which incorporates the mechanisms of neural circuits in the brain. Like neural circuits, the computers themselves learn the characteristics of the data entered into them for learning, so they can make judgments even about patterns that they have not yet learned. Using this method, it is possible to automatically produce explanations for images and motion pictures, conduct highly-accurate automatic translation, and make forecasts in a variety of fields including financial markets, weather, and professional sports.


Hitachi AI Technology/H (hereafter “H”) is the name for Hitachi’s Artificial Intelligence engine that uses deep learning and is one of the key technologies in the Hitachi Lumada IoT platform. H was announced by Hitachi in 2015 to respond to a wide range of applications. H can learn from voluminous amounts of data and make judgments on its own, eliminating the need for people to set up hypotheses in advance and finding solutions that humans had not conceived.


In the attached video a robot outfitted with H is placed on a swing made of toy blocks. The purpose of the experiment is to maximize the swing amplitude without providing prior knowledge on how to do so.  The robot can bend and extend its knees but has no knowledge of how to swing. At first the movements are random and the swing barely moves. The robot starts to move the swing in less than 1 minute and in 5 minutes has come up with a swinging motion that exceeds human conception. This is truly an example of deep learning where human input is not required.



H is designed for general purpose AI. H connects to existing systems to learn from different kinds of data and grow according to the situation. Just as in the swing example, H searches for factors and conditions relevant to business objectives from among the various types of data that are already there, then searches for a method to optimize these objectives.


Hitachi has been involved in R&D and Proof of Concept (PoC) activities targeting "Hitachi AI Technology/H" for about 10 years, and has established an extensive track record of business reforms in a variety of fields, including finance, transportation, distribution, logistics, plants, manufacturing, and healthcare. It has been applied in 57 projects in 14 areas.


Hitachi AI H use cases.png


The key to H is its general purpose nature. According to Hitachi Corporate Chief Scientist, Kazuo Yano, “H does not require customization nor tuning for the warehouse or the store. Under changing business situations, the system learns from the situation, and the changes.”


H will be a powerful component of our core IoT Lumada platform.


If you are going to commit your unstructured data to long term storage and want to ensure that it will be available as your data scales to petabytes across generations of technology, the choice of storage will need to be an object storage system that can scale beyond the limitations of hierarchical file systems, does not require backup, and provides rich metadata features that enable search, analytics, and transparency for compliance. Since your object storage is a long term commitment the vendor you choose and the architecture that you implement must be there for the long haul. Availability must be a key requirement.


A Safe Approach to Object Storage - HCP

Many object storage vendors support the S3 API and store objects into the Amazon cloud. Unfortunately, last week, the Amazon S3 service was down on Feb. 28 from about 9:37 am PST until 1:54 pm PST. This disrupted a major part of the internet, including IoT devices like connected light bulbs and thermostats. If your object storage depended on the Amazon S3 cloud, you were affected. Cloud services from major vendors like Amazon and Azure generally provide better availability than what most organizations can provide with their own resources. However, the risk of a major outage is still there. Therefore, organizations need to provide for recovery options like another copy of the data on a private cloud. Don’t trust everything to third party vendors. Bad things can happen whether you use public or private clouds. You still own the responsibility no matter where your data resides. HCP enables copies of data to be stored in geographically separate locations or in private and public clouds.
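To put the outage window quoted above in perspective, here is a small calculation of its length and of what a single outage of that size does to a yearly availability figure. This is my own arithmetic, not a number published by Amazon. The function name is hypothetical, the year 2017 is assumed for the dates, and the yearly figure assumes a 365-day year with no other downtime.

```python
from datetime import datetime

MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a non-leap year


def outage_impact(start, end):
    """Return (outage length in minutes, resulting yearly availability)."""
    outage_minutes = (end - start).total_seconds() / 60
    availability = 1 - outage_minutes / MINUTES_PER_YEAR
    return outage_minutes, availability


# Times as quoted above (both PST); the year is an assumption.
minutes, availability = outage_impact(
    datetime(2017, 2, 28, 9, 37), datetime(2017, 2, 28, 13, 54)
)
print(minutes)                # 257.0
print(f"{availability:.4%}")  # 99.9511%
```

A little over four hours of downtime is enough on its own to drop a service to roughly "three nines" for the year, which is why a second copy of the data somewhere else is worth planning for.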


A SAIN Approach to Object Storage - HCP

I was reading a blog post from another object storage vendor who was making a point about how changes in traditional object storage topology trigger a rebalance mechanism to reflect the new address scheme and satisfy the protection policy. With traditional object storage, storage is included in the server, so when you need to increase storage or increase compute, you need to add both whether or not you need the other. Once a node is added to the system, data may need to be redistributed across the nodes for redundancy and performance. The problem is how to reduce the impact of this rebalancing and make it transparent to the SLA of the platform. The choices seem to be either no rebalance, or rebalance immediately and suffer the hit to the SLA.


With HCP we have several options. The first option is SAIN, which stands for SAN Attached Independent Nodes, where HCP G or HCP VM Access Nodes are attached via a SAN to an external storage system like the Hitachi VSP. SAIN allows the storage and compute resources to be scaled separately, and data protection is provided by the VSP RAID controller.


Scale Transparently from Small to Large

In addition to adding SAIN storage, adding additional compute and storage can be done in multiple layers with little or no impact on SLA. HCP Access Nodes can be configured over an IP network as a Redundant Array of Independent Nodes (RAIN) with no SAN attached storage. This is similar to traditional object storage where storage is directly attached to the servers and data is shared across an internal IP network. Data protection is provided by copying data across the nodes. HCP Access Nodes also support S3 access to HCP S storage nodes and cloud storage over a storage IP network. Our current HCP Access nodes come as an HCP G appliance or as an HCP VM (which is HCP Access Node software that runs on a VM).


An initial configuration can start with 3 Access Nodes using RAIN.

Small HCP.png

HCP can scale to a very large configuration including RAIN, SAIN, HCP S, and public cloud.

Large HCP.png

The S10 and S30 are HCP S storage nodes with low cost erasure coded JBOD disks. Their cost is comparable to that of public cloud storage, but they sit in a private cloud and are accessed by the HCP Access nodes over a storage IP network. (The S10 supports up to 192 TB in 4RU and the S30 supports up to 9.4 PB in 68RU, each with 10TB JBOD disks.) Public cloud is also accessed over this storage network.
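The capacity and rack-space figures above can be turned into a simple density comparison. This is just my own arithmetic on the quoted maximums. The function name is hypothetical and I am assuming decimal units (1 PB = 1000 TB).

```python
def tb_per_rack_unit(capacity_tb, rack_units):
    """Storage density in terabytes per rack unit (RU)."""
    return capacity_tb / rack_units


s10_density = tb_per_rack_unit(capacity_tb=192, rack_units=4)
s30_density = tb_per_rack_unit(capacity_tb=9400, rack_units=68)  # 9.4 PB, decimal

print(s10_density)            # 48.0
print(round(s30_density, 1))  # 138.2
```

On these maximum configurations the S30 packs close to three times as much capacity into each rack unit as the S10.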



(1) – Adding HCP G or HCP VM Access Nodes. The newly added servers are immediately available and usable to improve performance, and the user has complete control over the rebalance. A user can configure the intensity and schedule of the rebalancing service. For example, they can schedule the rebalance to run automatically at high-intensity only during off-hours (to avoid impacting on-hours SLAs). This control gives you the best of both worlds – immediate performance increase as well as a tunable rebalance.


(2) – Adding storage capacity to an HCP G or HCP VM Access node is seamless. Since an HCP system provides a unified namespace, capacity in the background is added in the form of LUNs or Volumes, but this is invisible to the end user or application. This is the same whether it is internal or SAN attached capacity. The balancing of data between old and new storage volumes is also tunable, the same as (1) above.


(3) – HCP S can be transparently added to an HCP system, effectively increasing the total storage capacity of the system without any disruption. S nodes added to an HCP system are immediately available for use and there is no rebalance of data between S nodes necessary. Therefore, adding S nodes to an HCP storage pool simply increases the possible bandwidth/performance for incoming data.


(4) – HCP S nodes can have their capacity expanded. An S node distributes data across all available JBOD disks using erasure-coding. When disks are added to or removed from an S node, internal services automatically optimize data placement – transparently – to allow for the greatest availability, protection and performance.


(5) – HCP capacity can be expanded over S3 to public cloud. Cloud capacity can be added transparently through S3 connection to public cloud.
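Item (4) above mentions erasure coding. As a rough illustration of the principle only, the sketch below uses single XOR parity, which is the simplest erasure code and can rebuild any one lost block. It is not the coding scheme HCP S nodes actually use, which this post does not describe, and the function names and sample data are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks):
    """Return the stripe: the data blocks followed by one parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stripe, lost_index):
    """Rebuild the block at lost_index by XOR-ing all surviving blocks."""
    survivors = [block for i, block in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


# Three data blocks spread over three disks, plus parity on a fourth.
data = [b"objA", b"objB", b"objC"]
stripe = encode(data)

# Simulate losing the disk that held the second block.
print(recover(stripe, lost_index=1))  # b'objB'
```

Production erasure codes tolerate more than one failure and spread parity across many disks, but the trade is the same: a small amount of extra capacity buys protection without keeping full copies.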


HCP can support your unstructured data needs in public or private clouds with the assurance of safety, availability, and scalability. Using HCP’s adaptive cloud tiering (ACT) functionality, you can manage a single storage pool using any combination of server disks, Ethernet attached HCP S nodes, SAN disks, NFS or a choice of one or more public cloud services, including Amazon S3, Google Cloud Storage, Microsoft Azure®, Verizon Cloud, Hitachi Cloud Service for Content Archiving, or any other S3 enabled cloud service.


For more detailed information, please see the HCP architecture white paper at this link

I included this phrase in my blog post  Email Investigations Made Easier with HCP/HCI to explain that while the data scientist will interpret and represent data mathematically and the storage specialist will focus on storing, protecting and retrieving data; the lawyer is only interested in finding evidence to prove or disprove that something has occurred. HCP and HCI will do the work of the data scientist and the storage specialist so that the lawyer can concentrate on what he does best.


I received the following comment from Tony Cerqueira in response to my post:


“I heard the same joke, but in the final line, the lawyer turns his job over to an AI algorithm, and is then consumed by wild dogs.”


Tony makes a great point about AI.


AI and robotics are going to put a lot of people out of work, which is a great concern to economists and politicians.




Last October, Otto’s self-driving 18-wheeler truck cruised down Colorado’s I-25 for 120 miles to deliver 50,000 Budweiser beers from AB InBev NV’s distribution center in Fort Collins to Colorado Springs. This was the first commercial use of self-driving trucks. AB InBev said it could save $50 million a year in the U.S. if the beverage giant could deploy autonomous trucks across its distribution network. Otto is an American self-driving technology company established in January 2016 and acquired in August 2016 by Uber. It retrofits semi-trucks with radars, cameras and laser sensors to make them capable of driving themselves.


Self-driving trucks will supersede the need for self-driving cars since the trucking industry desperately needs them. The trucking industry hauls 70 percent of the nation’s freight and simply doesn’t have enough drivers. The biggest problem trucking companies have is the hiring, training, and retention of truck drivers. The American Trucking Associations (ATA) pegs the shortfall at 48,000 drivers, and says it could hit 175,000 by 2024. According to the ATA there are 3.5 million professional truck drivers in the United States. What would happen if they were all replaced by self-driving trucks?


Trucking will become safer and more efficient and distributors will be able to save tons of money. Distribution costs are roughly 10% of product costs according to PricewaterhouseCoopers. On the downside there will be millions of people out of work.


Last month when I talked to a Finnish customer I learned that Finland has become the first country in Europe to pay its unemployed citizens an unconditional monthly sum. Under the two-year, nationwide pilot scheme, which began on 1 January, 2,000 unemployed Finns aged 25 to 58 will receive a guaranteed sum of €560 (£475). The income will replace their existing social benefits and will be paid even if they find work. The details were published in the Guardian. This is the concept of a universal basic income which is gaining traction as automation threatens jobs and traditional welfare systems become complex and unwieldy. In Europe similar experiments are being proposed in the Netherlands, France, Italy, and Scotland. In Canada, Ontario is set to launch a similar project this Spring.


“Credible estimates suggest it will be technically possible to automate between a quarter and a third of all current jobs in the western world within 20 years,” Robert Skidelsky, professor of political economy at the University of Warwick, said in a paper last year. He said a universal basic income that grew in line with productivity “would ensure the benefits of automation were shared by the many, not just the few.”


Digital transformation and IoT promise to increase productivity and increase efficiency. Will this require a universal basic income for the workers that will be displaced?


A universal basic income will be preferable to being consumed by wild dogs. At any rate I don’t think lawyers will be displaced since automation like self-driving trucks will create a lot of litigation.


I began my career in IT during the mainframe days. Now 40 years later, the mainframe and I are still around. Next week SHARE, the volunteer-run user group for IBM mainframe computers, will be holding its annual conference at the San Jose Convention Center, March 5-10. SHARE was founded in 1955 and is a forum for exchanging technical information about IBM mainframe systems. A major resource of SHARE from the beginning was the SHARE library. Originally, IBM distributed its operating systems in source form and systems programmers commonly made small local additions or modifications and exchanged them with other users. The SHARE library and the process of distributed development it fostered was one of the origins of open source software. If you ask the SHARE organization what the capital letters stand for they will tell you: “SHARE is not an acronym; it’s what we do.”


Hitachi Data Systems will be a Gold sponsor for this year’s SHARE conference and will be exhibiting in booth #308. Hitachi is featuring enterprise class, high-speed flash storage available in a variety of configurations to support mainframe performance and availability requirements. Mainframes still drive many high performance, core applications for Financial, telco, and other critical infrastructure verticals. For instance, at the 50th anniversary celebration of the IBM mainframe in 2014, Citibank CEO Walter Winston estimated that his company’s IBM z Systems could process about 150,000 transactions per second or nearly nine million transactions per minute. That type of performance needs a high performance, highly reliable and resilient storage system.


In order to support mainframes, the storage system must be able to support FICON channels, and all the storage management and business continuity features of modern mainframe systems. This eliminates all the new flash array vendors, except for our VSP F1500. Our new VSP G1500 hybrid array and all-flash VSP F1500 deliver increased mainframe performance with the industry’s only 100% data availability guarantee and powerful replication technologies to eliminate downtime. The following link, Mainframe Storage Compatibility and Innovation With Hitachi Virtual Storage Platform G1x00 and F1500, shows how we are not only compatible with IBM mainframe storage features, but also how we innovate through the development and testing of unique Hitachi storage and storage management features to provide mainframe customers with additional capability and value.


For instance, while Hitachi offers SSDs in these products, Hitachi also offers its own flash modules (FMDs) that include onboard ASICs and data management intelligence to provide superior performance over SSDs. FMDs share tighter management and wear information with the storage subsystem, enabling us to extend flash life and provide predictable failure management. With scalability of up to 8PB of flash storage, 14TB FMDs, and 2TB of cache, HDS gives you superior scaling of performance to meet changing data center needs. Our fault tolerant system architecture combined with powerful synchronous and asynchronous replication technologies enables multi-site business continuity to meet zero recovery point and zero recovery time objectives. Capacity is optimized with dynamic tiering, compression, and deduplication, and security is provided with encryption of data at rest and Key Management Interoperability Protocol (KMIP). Just as important, the Hitachi FMD enables shredding of flash cells, even those that are not seen by the controller, and provides a report of the status of shredded cells for auditing purposes.


For more information, visit our booth and meet our mainframe experts who are well known in the mainframe community: Ros Schulman - HDS Director, Mainframe, Product Manager; Bill Smith - HDS Americas Product Manager; and Ron Hawkins - Senior Manager, SETO Complex Test Lab – Mainframe. Also take the opportunity to attend our speaking engagements:


Integrated Solutions from Hitachi, Telus, & Brocade (Vendor-Sponsored Presentation)

     Tuesday, March 7, 2017: 1:45 PM

     William Smith (HDS), Ros Schulman (HDS), Dick Masyga (Brocade), Fawad Shaikh (Telus)


The New HDS G1500 and F1500 Storage Platforms

     Wednesday, March 8, 2017: 10:00 AM to 11:00 AM

     William Smith (HDS) and Ros Schulman (HDS)


Batch – Backups, Dependencies, and Restore Processes - The Missing Links

     Wednesday, March 8, 2017: 4:30 PM to 5:30 PM

     Ros Schulman (HDS) and Rebecca Levesque (21st Century Software)


Hitachi Sponsored Lunch & Learn

     Thursday, March 9, 2017: 12:15 PM

     Ron Hawkins (HDS)



Email is the primary communications tool for enterprise business. We exchange documents and make agreements through email. Emails contain private information about people and companies. Almost everything we do is documented in email and leaves an electronic trail. Therefore, email has become a dominant area of focus for all corporate litigation. Email investigations for compliance provide the evidence to prove or disprove a violation of regulations, whether it is an accusation of fraud or misuse of personal or corporate information. Not only can they prove what occurred, they can also identify who was involved. Ongoing audits and monitoring of emails can ensure compliance and protection of information.


Due to the high volume of email, investigations can be cost-prohibitive and time consuming. Businesses must bear the cost of the discovery process and must make all reasonable efforts to retrieve relevant documents. One firm had to recover messages over a 3½-year time frame from tapes across 40 email servers at a substantial cost of time and effort. The cost of eDiscovery can run into the tens of thousands of dollars, especially if you have to load and scan generations of backup tapes or years of archive storage. The high-profile investigation of Hillary Clinton’s private email server over a 7-year period is estimated to have cost $20 million, according to the American Thinker.


A corporate policy that email is deleted after “x” number of days does not protect the company from having to run through a costly discovery process. The company can claim that it does not have the email because it was deleted from the server, but that does not mean that an end user does not have a copy of it in a local PST file on a laptop. The current view for meeting regulatory compliance is to retain all email in an indexed format for ease of search and retrieval. Fines are also increasing for non-compliance with defined regulations. A recent example of increasing fines is the General Data Protection Regulation, which will allow fines of as much as €20 million or 4% of an organization’s worldwide turnover, whichever is higher.
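To make the "whichever is higher" rule concrete, here is a minimal sketch of the arithmetic. The function name and the sample turnover figures are illustrative assumptions, not taken from the regulation text, and the result is only the upper bound described above, not a prediction of any actual fine.

```python
def gdpr_max_fine_eur(worldwide_turnover_eur):
    """Upper bound described in the text: the higher of EUR 20 million
    or 4% of worldwide turnover. Hypothetical helper for illustration."""
    fixed_cap_eur = 20_000_000
    turnover_based_eur = worldwide_turnover_eur * 4 / 100  # 4% of turnover
    return max(fixed_cap_eur, turnover_based_eur)


# Illustrative turnover figures (assumptions, not real companies).
small_company_cap = gdpr_max_fine_eur(100_000_000)    # 4% is EUR 4 million, so the fixed cap applies
large_company_cap = gdpr_max_fine_eur(1_000_000_000)  # 4% is EUR 40 million, so the percentage applies
```

For any organization with worldwide turnover above €500 million, the 4% figure exceeds the fixed €20 million amount.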


According to David Karas, our Chief Ethics and Compliance Officer for HDS, our cost for email investigation is much lower due to the use of our Hitachi Content Platform (HCP) and our Hitachi Data Discovery Suite (HDDS). HDS does not use HCP to archive email. HCP is used to ingest, tag and store the email journal. Exchange has a journaling feature where it forwards a copy of each and every email to the journal (HCP via SMTP), where it can be stored forever. That’s different from an archive, which is more about saving space by keeping newer emails on more active storage and stubbing or tiering older emails to less costly storage. Archived emails can still be deleted. A journal is really meant to keep all email, and to do so as a separate copy, so you have a complete record. This eliminates the need to scan generations of backup tapes from multiple servers or scan PST files. Since we use HCP Anywhere, we also include emails and attachments on mobile devices.
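The difference between a mailbox (or archive) and a journal can be sketched with a toy model. This is only an illustration of the idea; the class and function names are hypothetical and do not represent the Exchange, HCP, or HDDS interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class Mailbox:
    """User-visible store: messages can be deleted, for example by a retention policy."""
    messages: dict = field(default_factory=dict)

    def deliver(self, msg_id, body):
        self.messages[msg_id] = body

    def delete(self, msg_id):
        self.messages.pop(msg_id, None)


@dataclass
class Journal:
    """Separate copy of every message; this toy class deliberately has no delete method."""
    entries: list = field(default_factory=list)

    def record(self, msg_id, body):
        self.entries.append((msg_id, body))

    def search(self, term):
        # Naive case-insensitive substring match over every journaled message.
        return [msg_id for msg_id, body in self.entries if term.lower() in body.lower()]


def send(mailbox, journal, msg_id, body):
    """Deliver to the mailbox and forward a copy to the journal."""
    mailbox.deliver(msg_id, body)
    journal.record(msg_id, body)


mailbox, journal = Mailbox(), Journal()
send(mailbox, journal, "m1", "Contract draft attached")
send(mailbox, journal, "m2", "Lunch on Friday?")
mailbox.delete("m1")              # the user or a retention policy removes the message
hits = journal.search("contract") # the journal copy is still searchable
```

Deleting the message from the mailbox leaves the journal copy in place, which is why a single journal search can stand in for scanning backup tapes and PST files.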


Using HDDS, David can structure a search and have it execute in less than a second, then refine it and run multiple searches in a matter of minutes. Often he can decide after a few runs whether the claim he is investigating has any merit. If not, he can close the claim and save the cost of further investigation. David is not a data scientist; he is a lawyer with deep experience and expertise in ethics and compliance behaviors. He can open up his computer anywhere in the world and do his own investigation. He does not need to send a request to IT to search for emails between Hu Yoshida and Britney Spears, which could take several days and run the risk of information leakage that might be embarrassing to Hu and Britney. (This is a hypothetical example.)


HDDS was our first-generation search tool, which is being replaced by our Hitachi Content Intelligence (HCI) product. HCI will add intelligence to our ability to investigate HCP repositories like the ones where the Exchange journal is ingested. At HDS we are currently in the process of converting to Office 365 and replacing HDDS with HCI.


Office 365 is cloud based, but it still provides a journal which can be ingested by HCP, where we can search it just as we have in the past, but with a newer search tool, HCI. In fact, this is preferred by Office 365. The journal could be sent back into Office 365 to an internal journaling mailbox, but this would chew up a lot of cloud storage capacity, which would increase the cost of Office 365. The Office 365 archive is used for reducing cloud storage capacity, but searches take 45 minutes or more since the archive is not optimized for search as it is in HCP. Search in Office 365 is optimized to the lowest common denominator and therefore takes much longer.


A data scientist, a storage specialist, and a lawyer walk into a bar….


You’ve heard this one before. This is the setup for a joke where three individuals view a situation from different perspectives. The punch line comes from the view of the last individual. The data scientist will interpret and represent data mathematically; the storage specialist will focus on storing, protecting and retrieving data; and the lawyer is looking for evidence to prove or disprove that something has occurred. HCP and HCI will do the work of the data scientist and the storage specialist so that the lawyer can concentrate on what he does best.


Check with your compliance or eDiscovery officer and see what it costs for them to do an investigation of email or other data repositories. Then talk to your HDS representative about the capabilities of HCP and HCI to discover how we can facilitate the process of eDiscovery. Not all of us are lawyers who are experienced in compliance. However, David said that he could be available to talk to your compliance officer to share best practices and experiences.