Technical Memes in 2014
It is getting to be that time of the year again: time to make predictions, at various levels of risk, about what is going to happen next year. For 2014, my colleague Hu and I are going to split up the task of creating a list of technical trends and themes. Our message will span a couple of cross-linked blog posts, with Hu aiming at things that we should see come to pass soon while I focus on memes that are likely to take hold.
With that said, let’s dig in!
1. EMERGING EXA-SCALE ERA
I’ve talked about this before in my Financial Forum series. In this post I used the Square Kilometer Array to motivate what kinds of sweeping changes are going to be required to achieve an exa-scale architecture. Since then, there have been bets against such an architecture emerging by 2020, and several active groups and organizations are deliberately planning for such a platform. I’m pretty sure that in 2014 we will see heightened debates on the possibility of such a platform arising on, before, or after 2020. So my tangible prediction is that the keyword “exa-scale” will become hotly debated in 2014.
2. THE BI-MODAL ARCHITECTURE
Let’s face it: as fast as our LANs, MANs and WANs are, they are still roughly an order of magnitude slower than the internal fabrics and busses of the storage and compute platforms. Compute and storage fabrics/busses are measured in gigabytes per second, while networks are measured in gigabits per second. What to do? What we’re starting to see emerge is richer storage control and low latency access within the server. Today this is acting as a cache, but tomorrow who knows… I referenced this in the Bi-modal IT Architectures discussion on the Global Account and Global Systems Integrator Customer Community. For completeness, I’ve pulled the diagram in from that discussion to illustrate that a key driving force in the change is the shift in software architectures. The diagram suggests a kind of symbiotic relationship between an evolving software stack and the hardware stack. My expectation for 2014 is that we will see one or more systems that implement this kind of architecture, though the name may be different (I will endeavor to cite references to them throughout the year).
3. PROCESS, ANALYZE AND DECIDE AT THE SOURCE
The Hadoop crowd has half the story right. The other half is that to support the Internet of Things, where “the data center is everywhere” (thank you Sara Gardner for this quote) and low bandwidth, unreliable WAN pipes are the norm, moving the data from edge or device to core isn’t really feasible. Further, today many EDW and Hadoop platforms in practice move data significantly over the network. For example, I’ve talked to several customers who pull data out of an RDBMS, process it on Hadoop, and push back to another RDBMS to connect to BI tools. This seems to violate one of the basic tenets of the Hadoop movement: bring the application to the data. Therefore it is necessary to augment data sources with intelligent software platforms that are capable of making decisions in real time, analyzing with low latencies, and winnowing/cleansing data streams that are potentially moved back to the core. Note that in some cases the movement back to the core is by acquiring a “black box” and literally shipping data on removable media to a central depot for uploading. This suggests that perhaps a sparse, curated information model may be more relevant for general analysis/processing than raw data. I digress. For 2014, I predict we will begin to see platforms emerge that start to solve this problem and an increased level of discussion/discourse in the tech markets. We have been calling this “smart ingestion” because it assumes that instead of dumb raw data there is some massaging of the data where the user gains benefit from both the “massage” and the outcome.
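As a rough illustration of what “winnowing” at the source could look like, here is a minimal Python sketch. It is not taken from any product: the function name `winnow`, the `low`/`high` thresholds and the shape of the summary record are all assumptions made for this example. The idea is simply that a batch of raw readings is reduced at the edge to a sparse record, and only that record would travel back to the core.

```python
from statistics import mean
from typing import Iterable


def winnow(readings: Iterable[float], low: float, high: float) -> dict:
    """Reduce a batch of raw readings to a sparse summary record.

    Hypothetical sketch: the summary keeps the batch size, the mean, and only
    those raw values that fall outside the [low, high] range.
    """
    values = list(readings)
    if not values:
        return {"count": 0, "mean": None, "alerts": []}
    # Out-of-range values are the only raw data kept for shipment to the core.
    alerts = [v for v in values if v < low or v > high]
    return {"count": len(values), "mean": mean(values), "alerts": alerts}


if __name__ == "__main__":
    # Four raw readings collapse into one small record.
    print(winnow([20.0, 21.0, 22.0, 95.0], low=0.0, high=50.0))
```

A non-empty `alerts` list is where a local, real-time decision could be taken without waiting on the WAN.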
4. THE PROGRAMMABLE INFRASTRUCTURE
Wait. Did we just cross some Star Trek-like time barrier and go back to the era of the systems programmer? Is the DevOps practitioner really a re-visitation of a past era where the mainframe ruled the world? Likely not, but in the spirit of everything old being new again perhaps there is a sliver of truth here. To me, a key center point for the Software Defined Data Center (SDDC) is programmatic control of at least compute, network and storage. In effect, what application developers are really asking for is the ability to allocate, directly from their applications, these elements and more to meet their upstream customer requirements. Today the leading movement in the area of the Software Defined Data Center is the OpenStack initiative and community that surrounds it. We’re definitely far from complete control of the IT infrastructure from application developers, but I think that we are surely on that trajectory. A key aspect behind programmatic control is a reduction of the complexities and choices that application developers can select from and a fundamental reality that almost everything will be containerized in virtual infrastructure of some kind. By giving these things up, DevOps proficient developers will be able to quickly commission, decommission and control necessary ICT elements. In fact, I know of at least one customer that has had an application development team realize exactly this fact. What has happened is that the application team was being very prescriptive to the IT organization while at the same time authoring much of their next generation application stack on a public cloud. At some point, two things occurred: the cloud service could not meet their requirements, and engineering realized they had traded complete flexibility for speed to market and they liked it. The result was that the IT organization used OpenStack to build a private cloud so they could host engineering’s new application.
This is a great “happily ever after” moment and I think hints at things to come. My prediction here is that we will begin to see OpenStack-friendly private cloud infrastructures for sale within the coming year. Since this is the most direct prediction I’m keeping my fingers crossed.
5. MEDIA WITH GUARANTEED RELIABILITY
As we’ve talked to customers contemplating exa-scale systems, we’ve found they are reconsidering everything in the stack, including the media. For a subset of these users tape and disk won’t cut it, and they are in fact looking towards optical media, of all things. Their constraints around power consumption, floor space, and of course extreme media durability, coupled with specific requirements to guarantee data preservation (in some cases for 50 years or more without loss), mean that existing approaches won’t do. As it turns out, there could be a perfect storm for optical: with the maturation of standards, media density roadmaps, customer need, and emerging capacity in the supply chain, I argue that for specific markets optical is poised to make a comeback. Therefore both HDS and Hitachi are opening the dialogue through activities like the 2013 IEEE Mass Storage Conference or Ken Wood’s post on US Library of Congress Update: Designing Storage Architectures for Digital Collections. We aren’t the only ones paying attention. Companies like M-Disc, for example, are pushing forward a thought pattern of really long term media. They articulate this argument well on their website:
“M-Discs utilize a proprietary rock-like inorganic material that engraves data instead of using reflectivity on organic materials that will eventually break apart and decompose with time. Furthermore, did you know that M-Disc technology is already being adopted worldwide by all major drive manufacturers, and that the M-Disc Blu-ray is read/write compatible with all current Blu-ray RW drives? While it is important to note that gold has a lower reflectivity than silver, even silver discs are still made of organic materials that may begin to lose data after only 2 years. See: Archival DVD Comparison Summary.”
As to a prediction… I think in this case we’ll begin to see the re-invention of the optical industry starting in 2014 with a focus moving from the consumer towards the enterprise. It wouldn’t surprise me if you even see the careful introduction of an offering or two.
1. Blu-ray icon/image
2. Ethernet cable image
- Author: Jeremiah Ro (Flickr: Jeremiah Ro’s Photostream)
- License: Creative Commons — Attribution-ShareAlike 2.0 Generic — CC BY-SA 2.0
- URL: http://farm4.staticflickr.com/3182/2641452698_d7d6eee15e.jpg
- Hadoop latte image
- Author: Yukop (Flickr: yukop’s Photostream)
- License: Creative Commons — Attribution-ShareAlike 2.0 Generic — CC BY-SA 2.0
- Author: Rogerio de Souza Santos
- License: The GNU General Public License v3.0 – GNU Project – Free Software Foundation (FSF)
Hitachi Proud To Be Elected a Gold Member of the OpenStack Community
We are pleased to share the news today that Hitachi has been elected to Gold Membership of the OpenStack Foundation at the Hong Kong summit. By becoming a gold member, Hitachi joins other industry vendors to help “achieve the OpenStack Foundation Mission by Protecting, Empowering and Promoting OpenStack software and the community around it, including users, developers, and the entire ecosystem.” Our election to the OpenStack community as a gold member is an extension of our philosophy of helping IT develop a truly Software Defined Data Center that is open and flexible.
Our Gold membership is a part of our accelerated participation in the OpenStack community. In June, Hitachi joined the Red Hat OpenStack Cloud Infrastructure Partner Network, and in August we announced our formal participation as a corporate sponsor. Our subsequent code contribution is now included in the latest Havana release.
Hitachi has a long and productive history of supporting the open source community. Hitachi created an industry-leading operating environment for Linux applications, and as a Linux Foundation member (formerly Open Source Development Laboratory), we provide the development environment for open source projects focused on making Linux enterprise ready, and we participate in performance and reliability assessments and tool development for Linux. Our contributions to open source software include Linux kernel tracing infrastructure (SystemTap), Disk Allocation Viewer for Linux (DAVL), Branch Tracer for Linux (btrax), and others. In addition, Hitachi has done extensive work with Red Hat to provide hardware level virtualization technology on LPARs for KVM stacks.
Hitachi is now building on its commitment to open source through its involvement with OpenStack. Hitachi provides open cloud architectures to give IT more control over security and governance, the ability to leverage existing IT assets and to provide interoperability with a variety of infrastructure components, applications, and devices to create “best of breed” cloud while avoiding vendor lock-in. In addition to OpenStack, Hitachi also provides private cloud delivery solutions based on VMware and Microsoft.
Hitachi private clouds are built on a secure, fully consolidated and virtualized foundation – a single platform for all workloads, data and content. IT departments choose private cloud infrastructures to control and protect their data assets, but in many cases they have an eye toward hybrid clouds so they can take advantage of the flexibility and elasticity of public clouds, depending on the workloads and application requirements. For IT, openness is in part represented by their ability to leverage a variety of cloud delivery models.
This openness is also reflected in the degree to which customers can reach out beyond their data centers to leverage portals, management and open source frameworks, infrastructure components (provided by Hitachi as well as third parties), and other cloud environments to safely and securely create best-of-breed functionality for their stakeholders. We accomplish this by providing APIs, widely used and standards-based interfaces (such as Amazon S3 and REST), as well as other access methods and protocols.
Our support and involvement with OpenStack is another example of the company’s goal to provide IT with private cloud solutions that are more virtualized, more secure and more open. Our reputation as a vendor of robust cloud-enabled platforms and our large global install base will help further establish OpenStack as a viable private cloud framework.
My Goodness!! Won’t that Mainframe Thing Just Go Away!!
It’s been almost 30 years since its demise was predicted and the words “the mainframe is dead” echoed throughout datacenters around the world. About the same time people were scrambling to remove words like “MVS” and “mainframe” from their resumes. Outside of their datacenters and the semi-annual SHARE conference, it was seldom discussed. The “gray-hairs” ultimately retired and the skill set declined, so it would have to die. Right? Well, not quite. This is not a beast to be starved.
Like Arnold Schwarzenegger in “The Terminator”, the mainframe is not about to die. The gray-haired crowd (or no-haired crowd) has been replaced by tattoos, dreadlocks, and body piercings. Attend a SHARE conference and you’ll see what I mean. But aside from the visual change, there is certainly a mindset saying “the mainframe ain’t going anywhere anytime soon”.
This survey has received a fair amount of coverage. Rob Enderle of IT Business Edge gave a great review of the survey (in a level of detail I won’t repeat here, thank you, Rob).
So where does that leave us? The mainframe never did die. It merely receded into the murky background while we all jumped into the more challenging topics relating to Open Systems. And we’re still jumping into these challenges today. And we’ll still be jumping next year. In the meantime, HDS and Hitachi have never forgotten this platform with its blazing performance, top notch security, scalability, ultra reliability, and what many are now concluding is a price competitive OS.
And what exactly has HDS done? For starters, we remain fully IBM compatible. We’ve taken it a step further by adding HDP (Dynamic Provisioning) for the performance boost it gives to critical applications such as DB2, IMS, and CICS VSAM. Since IBM is pushing big data analytics on the mainframe, this is key. You need the IOPS that only HDP can provide. The HDP architecture is unique to HDS.
Additionally, we now have HDT (Dynamic Tiering) on the mainframe. On Open Systems we’ve calculated that over 80% of the data we find on Tier 1 disk doesn’t need to be there, and I believe that statistic applies to the mainframe as well. Less expensive media, much better environmentals, and less floor space. Let the storage system move 80% of your data to tier 2 disk. And to make matters even better, we’ve integrated HDT with SMS at the Storage Group level.
So although the mainframe has been hidden from public view for 3 decades, rest assured that we have never given up on it.
I just thought I’d bring that up.
How to reduce backup storage requirements by more than 90% without data deduplication
One of the most egregious causes of the copy data problem for many organizations is the common practice of performing full backups every weekend. The architecture of the backup application forces this practice, as it requires a recent full backup data set as the basis for efficient recovery. But each full backup set contains mostly the same files that were backed up in the previous full backup, and the one before that.
Below is a simple example to illustrate this, showing the differences between the common “full + differential”, “full + incremental” and “incremental-forever” backup models. First, the basic definitions of these models.
- Full + differential: copies any new or changed data since the last full backup; a periodic full backup helps to keep the size of the differential set from growing out of control.
- Full + incremental: copies the new or changed data since the last full or incremental backup; a periodic full backup helps to keep the number of incremental sets from growing out of control.
- Incremental-forever: starts with a full backup, then copies only the new and changed data during each backup; it never performs another full backup.
The differential backups require more storage and will copy the same files multiple times during the week, but they offer the benefit of faster, more reliable recoveries since you need to restore only the last full backup set and then the last differential set (a 2 step recovery process). However, the size of the differential backup will increase each day, until a new full backup is completed. Doing differentials forever would eventually be the same as doing full backups every day.
In comparison, the full + incremental method uses a little less storage, and the daily backups will transfer less data, but recovery can be complicated by needing to restore multiple incremental data sets, in the correct order.
The incremental-forever backup solutions on the market are able to track each new file within its repository and present a full data view from any point-in-time within the retention period. This enables a one-step recovery process, which is faster and less error prone than the other models. And of course, this method eliminates the periodic full backups.
Better backup, better recovery
For this example, let’s assume we have a normal business, school or agency that operates 5 days per week, 50 weeks per year. They have 100TB of data, and a total data change rate of 50% per year (50TB). This equates to 1% per week (1TB), and 0.2% per day (200GB). They retain their backups for 12 weeks for operational recovery, assuming that data that needs to be retained longer is archived.
The full + differential model copies 200GB on Monday, 400GB on Tuesday, through to 1TB on Friday, and then copies the full 100TB during the weekend. The full + incremental and the incremental-forever models each copy 200GB per weekday, but the full + incremental copies the full 100TB on the weekend while the incremental-forever system takes the weekend off.
Including the initial full backup (100TB), the total backup storage capacity needed for 12 weeks for each model is:
- Full + differential: 1,336 TB (1.3 PB)
- Full + incremental: 1,312 TB (1.3PB)
- Incremental-forever: 112TB (0.1PB)
That’s a 91% reduction in capacity requirements without spending any money or compromising system performance on data deduplication. How much does 1.2PB of backup storage cost to acquire, manage and maintain? Actually, it’s 2.4PB of extra storage, since we’ll want to replicate the backup repository for off-site disaster recovery. If the backup data is retained for longer than 3 months, then these savings will be increased even more.
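For readers who want to check the arithmetic or plug in their own numbers, the totals above can be reproduced with a short Python sketch. The inputs are the ones stated in the example (100TB protected, 200GB of change per weekday, 5 backup days per week, 12 weeks of retention, 1TB counted as 1,000GB). The function and constant names are made up for this illustration and do not come from any backup product.

```python
# Inputs from the worked example, in GB so the arithmetic stays in integers.
FULL_GB = 100_000          # 100 TB of protected data
DAILY_CHANGE_GB = 200      # 0.2% of 100 TB per weekday
WEEKDAYS = 5
RETENTION_WEEKS = 12

MODELS = ("full+differential", "full+incremental", "incremental-forever")


def weekly_copy_gb(model: str) -> int:
    """Return the GB written to the backup repository in one week."""
    if model == "full+differential":
        # Each differential re-copies everything changed since the last full:
        # 200 GB Monday, 400 GB Tuesday, ... 1,000 GB Friday, then a full.
        weekdays = sum(DAILY_CHANGE_GB * day for day in range(1, WEEKDAYS + 1))
        return weekdays + FULL_GB
    if model == "full+incremental":
        return DAILY_CHANGE_GB * WEEKDAYS + FULL_GB
    if model == "incremental-forever":
        return DAILY_CHANGE_GB * WEEKDAYS  # no weekend full backup
    raise ValueError(f"unknown model: {model}")


def total_storage_gb(model: str) -> int:
    """Initial full backup plus everything copied during the retention period."""
    return FULL_GB + RETENTION_WEEKS * weekly_copy_gb(model)


def reduction_vs(baseline: str, model: str) -> float:
    """Fractional capacity saved by `model` relative to `baseline`."""
    return 1 - total_storage_gb(model) / total_storage_gb(baseline)


if __name__ == "__main__":
    for m in MODELS:
        print(f"{m}: {total_storage_gb(m) / 1000:,.0f} TB")
    saved = reduction_vs("full+incremental", "incremental-forever")
    print(f"reduction vs full+incremental: {saved:.0%}")
```

Changing `RETENTION_WEEKS` shows how the gap widens as backups are kept longer, since the models with weekend fulls add another 100TB for every retained week.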
Continuous vs. scheduled incremental-forever
As with all choices in technologies, there are some trade-offs involved when selecting an incremental-forever backup model. The classic, scheduled approach to incremental backup used by most data protection applications requires the scanning of the file system being protected to determine which files have changed since the last backup. If the system contains millions or even billions of files, that scan can take many hours, consuming most of the available backup window. Copying the actual data to be backed up takes relatively little time.
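To make the cost of the scheduled approach concrete, here is a minimal Python sketch of a modification-time scan. It is a generic illustration, not how any particular backup application works: the function name `changed_since` is invented for this example, and real products may use other change indicators than the modification time.

```python
import os
from typing import Iterator


def changed_since(root: str, last_backup_time: float) -> Iterator[str]:
    """Yield files under `root` modified after `last_backup_time` (epoch seconds).

    Every file is stat()-ed, so the cost grows with the total number of files
    in the tree, not with the number of files that actually changed.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.stat(path).st_mtime > last_backup_time:
                    yield path
            except OSError:
                # The file vanished or is unreadable; this sketch skips it.
                continue
```

Even when only a handful of files changed, the loop still visits every entry in the tree, which is why the scan rather than the data copy tends to dominate the backup window on very large file systems.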
This scanning time can be completely eliminated by using a continuous data protection (CDP) approach, which captures each new file, or block of data, as it is written to the disk. There are only a few solutions on the market, including Hitachi Data Instance Manager, that combine the benefits of incremental-forever and continuous data protection.
The CDP model will require a little more storage than the scheduled incremental model, since it will be capturing multiple versions of some files during the day as they are edited, but that’s a good thing. And the storage required will still be far less than the solutions that require full backups.
To learn more about how HDIM can reduce the costs and complexity of protecting your data, watch this video [link TBD], or to request a demo send a note to email@example.com.
Hitachi Performance Leadership, The Teddy Roosevelt Way
Of the many things US President Theodore Roosevelt is known for, one certainly is the quote “Speak softly and carry a big stick; you will go far.” It made me think that Teddy would have made a great HDSer.
You see, while other vendors are loudly proclaiming performance leadership after upgrading systems that were sorely in need of a lift (yes, I’m looking at you EMC), Hitachi continues down our path of consistently providing our customers the most innovative hardware architectures around.
Only without all the yelling.
Why has this come to mind now? We’ve recently completed SPECsfs2008_nfs.v3 performance testing of two new Hitachi Unified Storage VM (HUS VM) configurations, both “all flash” leveraging our patented Hitachi Accelerated Flash storage and both with our newly announced Hitachi NAS technology.
And the results prove we have the speed to lea…. well, let’s just say they are pretty awesome.
Our new configurations were:
| Storage System | File Component | Flash | Posted Results |
| HUS VM | 2-Node HNAS 4100 Cluster | 32 Hitachi Accelerated Flash drives (1.6TB each) | Click Here. |
| HUS VM | 4-Node HNAS 4100 Cluster | 64 Hitachi Accelerated Flash drives (1.6TB each) | Click Here. |
The above configurations are nothing too extravagant from a hardware perspective: 2 and 4 node HNAS clusters and 32 or 64 flash modules. That simplicity is exactly why the results are all that much more exciting.
Our HUS VM 2-node HNAS 4100 system delivered 298,648 operations/second with an overall response time of 0.59 milliseconds. While both numbers are astounding for the amount of hardware deployed, note that the 0.59 milliseconds overall response time was the lowest reported on this benchmark. Ever.
Our HUS VM 4-node HNAS system delivered a whopping 607,647 operations/second with an overall response time of 0.89 milliseconds. Again, amazing results. But the throughput numbers start to get so large that it might help to understand them by looking for a relevant comparison.
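One simple way to put the two results side by side is to normalize them per file node and per flash drive. The Python sketch below does only that, using the operations/second figures and hardware counts quoted in this post. The `Result` class and its field names are invented for the illustration, and it assumes the 4-node system is the HNAS 4100 cluster listed in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One published benchmark result, as quoted in this post."""
    name: str
    nodes: int
    flash_drives: int
    ops_per_sec: int

    @property
    def ops_per_node(self) -> float:
        return self.ops_per_sec / self.nodes

    @property
    def ops_per_drive(self) -> float:
        return self.ops_per_sec / self.flash_drives


TWO_NODE = Result("HUS VM + 2-node HNAS 4100", 2, 32, 298_648)
FOUR_NODE = Result("HUS VM + 4-node HNAS 4100", 4, 64, 607_647)


def scaling_factor(small: Result, large: Result) -> float:
    """Throughput of `large` as a multiple of `small`."""
    return large.ops_per_sec / small.ops_per_sec


if __name__ == "__main__":
    for r in (TWO_NODE, FOUR_NODE):
        print(f"{r.name}: {r.ops_per_node:,.0f} ops/s per node, "
              f"{r.ops_per_drive:,.0f} ops/s per flash drive")
    print(f"2-node to 4-node scaling: {scaling_factor(TWO_NODE, FOUR_NODE):.2f}x")
```

Doubling the nodes and flash drives slightly more than doubles the throughput in these two results, so the per-node figure holds up as the cluster grows.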
The most timely, and arguably most relevant, comparison might be the recently and LOUDLY announced VNX8000 (result here)… king of the recent “VNX2” launch. It certainly showed up to this benchmarking match ready to rumble, as it had twice as many “X-blades” installed (eight) to drive NAS traffic as did our 4-Node HNAS configuration and five hundred and forty four (yes, 544) SSD drives compared to our seemingly impossibly outgunned 64 Hitachi Accelerated Flash Modules.
Despite the should-be insurmountable hardware advantage for the brand-spanking-new EMC system, it actually drove 5% LESS NFS operations per second than our significantly more efficient HUS VM configuration, with both systems providing sub-millisecond overall response times.
Granted, in the VNX architecture one X-blade needs to sit idle waiting for an issue to arise before providing value, and EMC does not have advanced, enterprise flash capacity like our Accelerated Flash, but you’d expect it to be able to beat out a system with half the installed file nodes and less than 1/8 the amount of flash devices, wouldn’t you?
With that comparison helping set context, the next logical one might be to NAS market “leader” NetApp. The most relevant system that NetApp has published results for is the FAS6240 in a wide variety of cluster sizes. In all fairness, it becomes hard to make a logical comparison between the systems because the HNAS per-node performance is >2X that of a FAS6240 node, and NetApp’s flash strategy seems to lag its benchmarking, so only eight FlashCache cards are leveraged in NetApp’s benchmark.
Thus, the closest comparison is probably an 8-node FAS6240 cluster (results here), but despite having twice as many file-serving nodes and 576 power-hungry disk drives it still provides 18% less operations per second and is unable to provide a sub-millisecond overall response time.
Of course, benchmarks are not the real world, though industry-trusted ones like those at SPEC.org do their best to maintain a useful level of openness and vendor comparison for end users. That ability to compare is important, because NAS solutions for such things as large scale VMware deployments and Oracle databases (among other use cases) continue to gain significant traction and demand extreme performance. Customers, however, do not want to simply throw massive amounts of filers, disk drives and SSDs at every problem, and they are realizing that the system architecture does, in fact, matter.
So we are rightly proud of our architectural advantages that allow customers to deploy more efficient solutions and get more predictable performance. Yes, the market is awash with hardware providers whose design point is much more about developing to the lowest possible cost, while putting the onus on customers to deploy more hardware to make up for architectural and design limitations.
We choose another path. We choose to develop better hardware to provide our customers with highly functional systems that provide predictable (and yes, best-in-class) performance in the most efficient way possible. Some might say that’s a harder path, and maybe they are right. But results like these, and more importantly the continued success of our customers, have us convinced it’s a better path.
To learn more about our architectural differences that enabled this success, here are some links for the technically inclined among you:
- Hu Yoshida on HUS VM architecture
- Hu Yoshida on Hitachi Accelerated Flash architecture
- Matthew O’Keefe on Hitachi NAS architecture: Part 1 and Part 2
So, while other vendors speak loudly, launch loudly and deploy over-sized solutions, we’ll walk softly and provide the best technology we can. I, for one, think Teddy would be proud.
Software Defined Software
And NO, I’m not trying to be sarcastic; this is a seriously titled blog. But I must explain. Hang in there for a paragraph or two. It gets better. It is “Software Defined Software”. Trust me.
Firstly, yes (for those of you who know me), I am a serious cynic. It comes from (many too many) decades of observing hype, bubbles, buzzwords, trends, promises, cure-it-all startups, hype masters, marketing wizards, and predictions that the storage industry, as we knew it, would be transformed forever. Tomorrow this widget will be 100 times faster, be 1/100th the cost, and be 100 times more reliable than what you have today. Did I mention that it also cures world hunger?
So here’s the latest: Software Defined Storage. So, what is Software Defined Storage? We have all read about it. EMC has announced it. VMware has announced it. And even HDS has concluded that it has Software Defined Storage. But in our case, we really do, and this gets to the heart of this explanation. So here’s my diatribe on why Software Defined Storage is really Software Defined Software.
Not that many years ago, I was the guy (or one of the guys) designing and architecting the storage subsystems of the day. We had a rather simple terminology for things. Simply, “software” was an instruction set and some programs that ran on the server. And what “drove” the hardware (in this case storage) was called microcode (or ucode for you engineering types). I think ucode might be called “firmware” these days. My iPhone has “firmware” to control its hardware, and “software” to run all 327 of the silly apps that I need.
But then, sometime in the mid-1990’s, a company called EMC decided to start calling their microcode “software”. Why? Because it looked better on the books for the Wall Street guys. Yes, it was that simple. Software suddenly became a whole lot sexier than microcode. I hated the sudden change in terminology, but at the time, EMC was the 900-pound gorilla, and who was I to argue. Although, I did!
Fast forward (I think that’s a VCR term and therefore inappropriate, but visually effective), to today. Software Defined Storage is all the rage. Every vendor is on the bandwagon, including HDS. But there is one, very significant difference, and it cannot be emphasized enough. Software is microcode. Microcode is software. HDS has invested 2 decades into “software” to implement copy on write (COW), synchronous replication, asynchronous replication, copy after write (Thin Image), cloning, 3-Data Center Replication, 4-Data Center Replication, archiving, and more functionality than you could ever imagine.
So, can I at least go back to being the cynic that I am, and interpret “Software Defined Storage” as what it really is, which is “Software Defined Software”? We must now conclude that software is microcode, and microcode is software (thank you, EMC and your “Wall Street handlers” at the time). Hitachi/HDS has the best hardware functionality, which means it has, in fact, the best microcode in the business. So (trash the hype) why are we not the best “Software Defined Software”?
I not only think we are, I know we are. Define software the way you want, but as this ex-architect guy, I’m convinced I’m in the right place.
Hitachi Data Systems Now a Sponsor of the OpenStack Community
Today, we are proud to announce that Hitachi Data Systems is now an official corporate sponsor of the OpenStack initiative. This is significant for a number of reasons, most importantly because it demonstrates our continued commitment to our customers to provide flexible, open IT solutions that deliver immediate value. HDS was among the first vendors to develop and provide true storage virtualization to the enterprise IT market more than 10 years ago. This enabled IT to flexibly integrate and manage multiple storage solutions within their environment. Today, our commitment to open, flexible IT environments extends to the entire data center as our enterprise customers and service providers develop public and private clouds.
What does this ultimately mean? First, let’s provide a background on OpenStack from the OpenStack website:
OpenStack is a global collaboration of developers and cloud computing technologists producing the ubiquitous open source cloud computing platform for public and private clouds. The project aims to deliver solutions for all types of clouds by being simple to implement, massively scalable, and feature rich. The technology consists of a series of interrelated projects delivering various components for a cloud infrastructure solution.
Who’s behind OpenStack? Founded by Rackspace Hosting and NASA, OpenStack has grown to be a global software community of developers collaborating on a standard and massively scalable open source cloud operating system. Our mission is to enable any organization to create and offer cloud computing services running on standard hardware.
Who uses OpenStack? Corporations, service providers, VARS, SMBs, researchers, and global data centers looking to deploy large-scale cloud deployments for private or public clouds leveraging the support and resulting technology of a global open source community.
Why open matters: All of the code for OpenStack is freely available under the Apache 2.0 license. Anyone can run it, build on it, or submit changes back to the project. We strongly believe that an open development model is the only way to foster badly-needed cloud standards, remove the fear of proprietary lock-in for cloud customers, and create a large ecosystem that spans cloud providers.
Our commitment to OpenStack also reflects our customers’ desires to build and manage cloud services, complete with reference architectures that are broadly supported. Our customers ask us to support OpenStack in our solutions, in addition to other similar solutions that we have with partners like Microsoft, VMware, etc. OpenStack is not just strategic for our customers – it is strategic to the evolution of the data center.
In addition, Hitachi Data Systems has a long history of being an active part of the Open Source community. As an Open Source Development Labs (OSDL) sponsor, we have provided a development environment for open source projects whose purpose is to make Linux enterprise ready. We also participate in performance and reliability assessment and development of tools for Linux and open source middleware. Simply put – our commitment to OpenStack is an extension of our commitment to the Open Source community.
As the technology evangelist at HDS, Greg Knieriemen has the visibility to our broad current offerings as well as the ability to influence our future direction. His reflections on OpenStack:
Our sponsorship of OpenStack is part of a broader strategy at HDS to provide our customers with the greatest flexibility and options to build fully integrated, cost-effective public and private cloud stacks that complement the OpenStack partner ecosystem. Our participation with OpenStack is not just opportunistic. It is strategic to our vision of open innovation that ultimately enables and empowers IT to also drive innovation and value within their environments.
In one of my earlier blog posts, I mentioned that OpenStack is one of my areas of focus. As a chief technologist for the HDS Intelligent Platforms portfolio, I collaborate closely with various teams to define near-term and long-term goals for OpenStack and how it fits with our overall infrastructure cloud vision. In addition to becoming a member of the OpenStack Foundation, we also have a Cinder volume driver for the Hitachi Unified Storage platform. This will be available in the OpenStack Havana release. On June 14, 2013, we also announced the support of Hitachi servers in Red Hat’s OpenStack ecosystem. We continue to work on expanding our support to a broader set of OpenStack components. As we embark on our journey in this space, we look forward to continuing to contribute to the community and collaborate with fellow members of the OpenStack Foundation.
Dynamic Adaptation Knowledge Of Tiered Assets
So that’s a totally bogus title to this blog, but it does describe a “made up” acronym for a new mainframe product that we announced yesterday. We all have code names for products under development, and this was no exception. Code names come about for security reasons during product development. What was really announced was “Hitachi Tiered Storage Manager for Mainframe.”
Essentially what we have done is to integrate our Hitachi Dynamic Tiering (HDT) into the z/OS environment in support of System Managed Storage (SMS). There are a whopping 3 storage vendors that play in the mainframe environment: HDS, EMC, and (obviously) IBM. And as of the end of 2012, HDS is the market share leader in terms of petabytes shipped. (Hu articulated this milestone quite well in his blog.) That’s a pretty impressive statistic considering that IBM owns the server side. That said, it’s a very tight race. IBM has their Easy Tier product and EMC has their FAST VP (please don’t ask me to expand on that acronym). Both products are trademarked and all three do dynamic tiering in a z/OS environment.
Here is where the comparisons end, since what we have done is integrate our tiering into the SMS constructs, most notably the Storage Group. What we do is allow our z/OS customers to define tiering policies that can be applied to the Storage Group level. Do you want 2 tiers or 3? Do you want data “pinned” in a particular tier? These and many more options are available for simple and straightforward management of a complex z/OS environment that allows the storage system to move data to its appropriate tier. Pretty slick and unique to the z/OS environment.
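To make the idea of a policy applied at the Storage Group level a little more concrete, here is a minimal sketch of how such a policy could be modeled. Everything in it is an assumption for illustration: the class name TieringPolicy, its fields, and the Storage Group names are made up, and this is not the actual Hitachi Tiered Storage Manager for Mainframe interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TieringPolicy:
    """Hypothetical tiering policy attached to one SMS Storage Group."""
    storage_group: str
    tier_count: int                    # 2 or 3 tiers, as in the options above
    pinned_tier: Optional[int] = None  # None lets the storage system move data freely

    def __post_init__(self):
        if self.tier_count not in (2, 3):
            raise ValueError("tier_count must be 2 or 3")
        if self.pinned_tier is not None and not 1 <= self.pinned_tier <= self.tier_count:
            raise ValueError("pinned_tier must be one of the configured tiers")


def allowed_tiers(policy: TieringPolicy) -> list[int]:
    """Return the tiers that data in this Storage Group may occupy."""
    if policy.pinned_tier is not None:
        return [policy.pinned_tier]
    return list(range(1, policy.tier_count + 1))


# Made-up Storage Group names, for illustration only.
batch = TieringPolicy("SGBATCH", tier_count=3)
online = TieringPolicy("SGONLINE", tier_count=2, pinned_tier=1)
print(allowed_tiers(batch))   # [1, 2, 3]
print(allowed_tiers(online))  # [1]
```

The point of the sketch is only that the policy travels with the Storage Group, so the questions above (2 tiers or 3, pinned or not) are answered once per group rather than per volume.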
Tonight we’re having a celebration called “Dinner After Kicking Out The Acronym.” Get it?
Anyway, check out the press release.
Big Data Survey Results– Getting the most out of Big Data
Last month, Hitachi Data Systems announced the results of a survey which contained some interesting, but not entirely surprising, results about how companies are leveraging big data solutions. Here’s a quick summary of what the survey of 200 companies with 1,000 employees or more revealed:
- 75% of UK organizations are currently investing in big data analytics, and of these, 80% are deploying solutions
- 69% of organizations investing in big data analytics do not have infrastructure in place to extract insights from their data sets in real-time which is preventing accurate data analysis
- 74% of respondents who have deployed a big data analytics solution can analyze only up to 50% of their data sets, and 33% of those with a deployed solution said they couldn’t analyze both structured and unstructured data
- 53% of IT decision-makers surveyed agree that organizations are at risk of making business-critical decisions based on old or inaccurate data
- Finally, and here’s the kicker, 60% of CIOs feel they are unable to extract the full value of their information
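The first statistic nests one percentage inside another, so it is worth working through once. The sketch below assumes the percentages apply to all 200 surveyed organizations, which the summary does not state explicitly.

```python
SURVEYED = 200  # organizations surveyed, from the summary above

# Integer arithmetic keeps the head counts exact.
investing = SURVEYED * 75 // 100   # 75% are investing in big data analytics
deploying = investing * 80 // 100  # 80% of those investors are deploying solutions

print(investing)             # 150
print(deploying)             # 120
print(deploying / SURVEYED)  # 0.6
```

Under that assumption, roughly 60% of all surveyed organizations are actually deploying a big data solution, not 80%.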
While the survey focused on UK companies, these results are consistent with the anecdotal stories I’m hearing from organizations globally. The good news here is that CIOs are realizing and strategically reacting to the opportunity of big data analytics. Thankfully, it appears that we’ve moved past the cynicism of technology buzz words into actual adoption. But the survey does expose a gap in how technologists are leveraging legacy, disparate technologies and data sets. While these gaps are generally understood, there’s a fair amount of concern on how organizations move past the gaps, particularly real-time analytics, into business opportunity. My colleague Bob Plumridge, chief technology officer (EMEA), cites four recommendations for organizations. He outlines everything from breaking down the technology barriers to developing effective real-time data analytics:
- Evaluate your existing infrastructure: How information is stored can directly impact an organization’s ability to extract meaning in real time. Exploiting big data requires structured information management before an organization’s data starts to show true business value
- Take a strategic approach to storage: A more strategic approach to data storage will allow organizations to better utilize the data, which in turn can lead to better business performance. Organizations with effective storage strategies – that can capture data efficiently and mine it well in the first place – find it easier to unlock equity from that data. These are the organizations that find it the easiest to improve business efficiency, performance, customer service, and competitiveness
- Use velocity as a key enabler for big data input and output: Organizations need high velocity storage as a core part of their infrastructure to enable them to extract real value from their information and act on it in real time
- Deploy high-performance systems: Hitachi Data Systems offers the industry’s most scalable, high performance solutions, enabling a wide array of big data workloads to be completed faster and more cost-efficiently
Lastly, IT transformations like those occurring around big data are not just about the technology – there’s a significant “people” element that cannot be underestimated. As IT departments transition from being service-oriented cost centers into a core strategic business enablement function, every role within the IT department will evolve as well.
About the research
Hitachi Data Systems commissioned independent research agency, Vanson Bourne, to survey 200 CIOs and IT decision-makers in the UK during May 2013. The survey targeted organizations with over 1,000 employees.
Hitachi Information Forum: London
The Hitachi Information Forum in London last month was an overwhelming success. With more than 300 attendees and 20 sessions ranging from IT economics to customer case studies of private cloud deployments and converged infrastructures, the forum focused on practical applications with an eye on the value IT departments are bringing to their companies.
I had the privilege of moderating a panel discussion on big data. With the time given, we explored the skepticism of big data and how the role of IT will evolve as technologies evolve to give business value to retained data. Joining me on the panel were Chris Kranz, practice lead of Kelway, an HDS channel partner, and from Hitachi Data Systems Bob Plumridge, CTO, EMEA, David Merrill, chief economist, and Harry Zimmer, senior director, Global Competitive Marketing.
The big data panel discussion was recorded and can be viewed below.
Are You Ready for Microsoft Private Cloud?
This week’s “Maximize IT” announcement outlines HDS solutions that bring tangible advantages to your business by delivering features like flash acceleration with our pre-configured solutions for Microsoft.
Earlier this week, we announced a joint initiative with Microsoft and our channel and systems integrator partners for Cloud OS. Now is a great time to take a look at Hitachi Unified Compute Platform Select for Microsoft Private Cloud, which features Hitachi Compute Blades, Hitachi Unified Storage VM, and best-of-breed networking, with architectures that are designed and optimized for Microsoft Windows Server 2012 and Hyper-V V3 Technology and tightly integrated with System Center 2012 SP1 management.
Our CEO Jack Domme was featured at Microsoft’s Worldwide Partner Conference, which kicked off July 7th in Houston, Texas. See the keynote address by Microsoft CVP, Takeshi Numoto, which reinforced our joint participation with Microsoft in the Cloud OS accelerate program.
It’s a great time to add HDS to the top of your IT vendor list, especially for heavily virtualized environments designed for Hyper-V V3 technology. Hitachi has the right solutions that can have you up and running in days instead of months.
Converged Infrastructure and the SDDC
With all the talk of the Software-Defined Data Center (SDDC) and how converged infrastructure and automation enable the SDDC, it’s becoming increasingly challenging for IT decision makers to understand the path to get there. Last week we discussed how choosing the right converged infrastructure can help to enable pieces of the SDDC through the powerful combination of converged compute, network and storage along with orchestration software to manage it all. This is very close to and consistent with VMware’s definition of the SDDC: “The Software-Defined Data Center is a unified data center platform that provides unprecedented automation, flexibility, and efficiency to transform the way you deliver IT.”
However, with so many choices of converged infrastructure available in the marketplace today, and more coming online, end users must choose wisely to ensure they’re building a foundation on which to build their SDDC. It is one thing to simply converge the hardware components of the infrastructure; it is something quite different to unify those components through software and enable true transformation of that infrastructure.
We posed a few questions last week as well that customers should consider when looking to deploy a new converged infrastructure, or perhaps evaluating the solution they’ve already deployed:
- Can the converged solution support a range of operating systems and hypervisors or bare metal servers while still ensuring a high degree of integration and streamlined management?
- Can the converged solution preserve existing IT investments while still enabling those investments to participate in a newly deployed solution going forward?
- Can you manage your entire data center with multiple implementations of a converged solution from a single console to help reduce costs and complexity?
These questions are but the tip of the iceberg when evaluating a converged infrastructure, but they have something critical in common – answering each of them in the affirmative is possible through orchestration software. The Hitachi family of converged infrastructure solutions, called Hitachi Unified Compute Platform, offers the best-of-breed compute, network and storage hardware unified in a platform that can help reduce cost and complexity, accelerate time to deployment of new applications and form the foundation of an SDDC.
For example, Hitachi UCP Pro for VMware vSphere already enables customers to virtualize more than 100 3rd party storage systems through our unique storage virtualization capabilities in the Hitachi VSP and HUS VM platforms. This protects customers’ existing investments while allowing these systems to participate in a newly-deployed converged infrastructure solution.
Something else, however, truly provides the “special sauce” within UCP Pro for VMware vSphere, and that is our orchestration software, UCP Director. UCP Director enables the pooling, aggregation, management, provisioning and monitoring of these hardware assets as a single unified resource, enabling customers to deliver IT-as-a-Service within their own organizations, or as a service provider to other end users. UCP Director optimizes our best-of-breed converged stack for the cloud era, and is evolving rapidly to offer compelling new features and functionality.
Among many new features, version 3.0 of UCP Director, coming this fall, will enable full bare metal support with VMware vSphere 5.5, which is scheduled to be released in 2013. This will enable UCP Pro to run multiple operating systems or hypervisors simultaneously and provide server, network and storage provisioning services in a consistent manner. Another new feature in UCP Director v3.0 is called “Datacenter Director,” which enables customers to manage multiple UCP Pro systems from a single instance of UCP Director even when the systems are geographically dispersed. Datacenter Director will also allow customers to set up a disaster recovery solution across multiple UCP Pro systems. By leveraging Hitachi Universal Replicator (HUR) and Hitachi TrueCopy, you can enable a robust DR and BC solution and perform VM recovery across your primary and secondary UCP Pro systems.
UCP Pro for VMware vSphere and UCP Director already set Hitachi converged infrastructure solutions apart from competing offerings, and we’re about to widen that gap even more significantly on the road to enabling our customers to realize their own SDDC.
Unified: Choose Wisely
There’s a lot of noise out there about ways to increase performance and cost savings. But when you dig into it, there is usually little to quantify or substantiate the claims. It’s all typically anecdotal, with a little bit of hand waving. That’s why we want you to be able to ask the right questions. Questions related to increasing performance, capacity efficiency and cost savings (i.e., better economics).
Performance and Scalability
Performance consists of more than just IOPS, and scalability is about more than just capacity growth. When adding more capacity, does your performance adjust to handle the additional data? When you add another node in the cluster, does your performance grow 100% per node, or do you lose something in the process? Are you getting a symmetric active-active failover capability, or plain active-passive? Do you really have to buy another file server to address the maximum capacity? And how much usable capacity are you gaining anyway? You need to know that your storage solution will scale with you not only in capacity, but also in the performance needed to store and access the additional data that goes onto those disks. You should also go beyond the IOPS comparison and start asking about throughput and simultaneous open connections and files.
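One way to put a number on the "does your performance grow 100% per node" question is to compare measured throughput against perfectly linear scaling. The sketch below is only an illustration: the function name and the throughput figures are hypothetical, not measurements from any vendor's system.

```python
def scaling_efficiency(throughputs):
    """Efficiency at each cluster size: measured / (nodes * single-node throughput).

    throughputs[0] is the single-node result, throughputs[1] the two-node result, etc.
    """
    base = throughputs[0]
    return [t / (n * base) for n, t in enumerate(throughputs, start=1)]


# Hypothetical measurements in MB/s for 1, 2, 3 and 4 nodes.
measured = [1000, 1900, 2700, 3400]
for nodes, eff in enumerate(scaling_efficiency(measured), start=1):
    print(f"{nodes} node(s): {eff:.0%} of linear scaling")
```

A result that stays near 100% as nodes are added is what "scales in performance, not just capacity" looks like; a steadily falling percentage is the "lose something in the process" case.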
Let’s face it, we all want flash memory. The more the better for performance right? The harsh reality is that flash disks and SSDs are much more expensive per GB than hard disks. So you need “flash-awareness” in your systems to achieve the full potential of flash. Real estate is the key. Since flash is expensive, the name of the game is to squeeze as much data as possible into that flash space. You’re thinking dedupe should work well here. And you’d be right. But what is sometimes missed is what you lose in order to make it work right. For example, when dedupe is turned on, will it not interfere with the file sharing workload? Can it run anytime and not have to be scheduled at off hours? Can you just set it and forget it without having to worry about management?
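For readers who have not looked inside dedupe, here is a minimal sketch of the general idea: identical blocks are detected by a content hash and stored only once. This is the generic textbook technique with a made-up 4 KB block size and toy data, not a description of how any HDS product implements it.

```python
import hashlib


def dedupe(blocks):
    """Store each distinct block once, keyed by its SHA-256 digest.

    Returns (store, refs): store maps digest -> block bytes, and refs lists
    the digest for every logical block in its original order.
    """
    store = {}
    refs = []
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # keep only the first copy
        refs.append(digest)
    return store, refs


# Toy data: four 4 KB logical blocks, three of them identical.
blocks = [b"A" * 4096, b"B" * 4096, b"A" * 4096, b"A" * 4096]
store, refs = dedupe(blocks)

logical = sum(len(b) for b in blocks)
physical = sum(len(b) for b in store.values())
print(logical, physical, logical / physical)  # 16384 8192 2.0
```

Even this toy version shows where the cost comes from: every block has to be hashed and looked up, which is why the questions above about interference with the file sharing workload and off-hours scheduling matter.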
TCO / Cost Savings
IT managers are more business savvy than ever before. But it’s easy to see why hardware, software and maintenance costs are the only expenses considered. There is a bigger picture. As pointed out above, try to understand how capacity growth and performance are interrelated at scale. Is storage capacity used as efficiently as possible so as to defer additional storage purchases? What about power, cooling and space costs? Can the file server or NAS solution consolidate enough to reduce these significantly? Backup costs should be considered – especially the additional capacity and time window objectives. And server virtualization integration for VMs and VDI is critical for infrastructure. Can the solution help reduce costs for virtualized environments over time? It should. More than ever before, it’s about doing more with less, for less.
Sometimes it seems like a secret decoder ring is needed to understand what computer and storage vendors are saying. But by learning to ask the right questions, you’ll quickly be able to uncover the truth and make the best informed decisions!
Dynamic Provisioning and Oracle ASM
I’m not a storage performance expert. I’m not a database administrator (DBA). But if you ask me ANY question about performance, I’ll give you the very same answer they do: “it depends”.
Having said that, many, many (oh, so many) years ago I was a DBA and spent some time working with performance groups (read: I’m not a complete moron when it comes to the subject; perhaps many others, though).
Why am I bringing this up? Well, one subject I do talk a lot about is how our offering for thin provisioning, called Hitachi Dynamic Provisioning (HDP), is built on an architecture called “data dispersion” or “wide striping” that is designed to dramatically improve performance and throughput in cache-unfriendly workloads like databases, email systems, etc. It does this by equally “dispersing” all of the data from a LUN across all of the drives within an HDP pool. When an application needs to access something from that last remaining mechanical device in the data center, you can have literally dozens or hundreds of these mechanical things serving data up into cache simultaneously.
That’s the beauty of this architecture, and it is unique to HDS. No metadata; no additional overhead.
The message is very simple: you will either get a performance improvement, or worst case, you’ll stay where you are. Never a degradation (given best practices, etc.). Basically, the more cache unfriendly the workload is, the larger the improvement will be.
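As a rough illustration of what "equally dispersing" buys you, the sketch below places the pages of a LUN round-robin across the drives of a pool and then counts how many drives a burst of random reads touches. The round-robin rule, the page and drive counts, and the function name are all assumptions made for the example; the actual HDP placement algorithm is not described here.

```python
import random
from collections import Counter


def disperse(lun_pages, drive_count):
    """Assumed placement rule: page i of the LUN lands on drive i % drive_count."""
    return {page: page % drive_count for page in range(lun_pages)}


# Hypothetical pool: a 1,000-page LUN spread over 40 drives.
placement = disperse(lun_pages=1000, drive_count=40)

# Every drive ends up holding the same share of the LUN.
per_drive = Counter(placement.values())
print(min(per_drive.values()), max(per_drive.values()))  # 25 25

# A burst of 64 random page reads is served by many drives in parallel.
random.seed(0)
burst = random.sample(range(1000), 64)
drives_touched = len({placement[page] for page in burst})
print(drives_touched)
```

The more random (cache-unfriendly) the access pattern, the more drives share the work, which is the intuition behind the claim above that cache-unfriendly workloads see the larger improvement.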
We do wide striping at the physical level and it has worked quite well for us. Oracle ASM also does wide striping, but at a logical level. How do the two interact? Good question.
A couple of months ago a customer told me they would never put Oracle ASM in an HDP pool because the logical vs. physical striping would conflict, resulting in a performance degradation. That intrigued me. Then I heard a similar comment from one of our integrators, which made me think of “braiding.” Braiding is the notion that multiple forms of wide striping tend to complement each other.
I asked the larger set of HDS experts whether performance would improve, and the resounding conclusion was that it did.
So going back to my performance roots (shallow, I know), I would have to answer “it depends” when queried on whether you should place Oracle ASM inside an HDP pool. But do not rule it out. The overwhelming responses were that they complemented each other rather than conflicted.
Innovating Openly in the HDS Community
On June 10th we debuted a forum to facilitate peer-to-peer interactions between HDS/Hitachi personnel, our customers, our partners and those interested in seeing what Hitachi can bring to the table. While other company-sponsored business social networks are out there already, we’re approaching things differently in that we don’t merely want to broadcast our future; rather, we want to co-create it with those who are motivated to seize it with us. In the spirit of that intention we’ve created the HDS Innovation Center, designed to expose our thinking, facilitate engagement, and set the tone for the things that inspire the next…
We’re already witnessing great discourse in the community including deep discussions around the definition of Open Innovation, what sorts of programming models are needed for the future of Advanced Analytics, and what kinds of trends we’re witnessing in the industry. We’ve also recently named our first MVP, Cris Danci, an employee of an HDS partner who has hung right in there with me personally on some pretty deep threads.
Personally, I’m super excited to engage with our user and partner base, and I’ll see all of you in the Community.
Are we there yet?
(Note to reader: the following blog entry contains all sorts of hidden humour, overt sarcasm, complex technical jargon, and one obscure Disney reference…reader discretion is advised)
Almost half my life – almost 20 years as a career technologist – I have made tons of mistakes, combining unrecognized inexperience with brash and insolent attitude…the perfect storm. Some mistakes were small, minor indecisions or mistypes easily corrected. But some were major, having far reaching impacts to both people and profitability, much less easily corrected, and some permanently marked as complete and total unrecoverable failures. In fairness, the ratio of good to bad was at minimum 50/50 and more likely falling closer to a solid 60/40 split, which elevated me through organizations where I could do much more serious damage.
Fortunately, my entire career has been as a client for Hitachi Data Systems, honing my craft as a political strategist, a financial analyst and a risk mitigator. And on occasion, where budget allowed, an innovator. I plan to use this experience, and the learning from all my mistakes, to help my new customers make better decisions, grow their business, and fail a lot less.
In my past role as an innovator, within the ratio of good, the last few years were focused on understanding the implication and implementation of the four big “nexus of forces.” More specifically, how the data collected from a much more MOBILE and autonomously controlled client base who demand a much more intimate SOCIAL interaction and relationship with my organization, could be managed with the appropriate amount of CLOUD elasticity and automation to produce INFORMATION enlightened with even more business insight to continuously grow and improve the customer experience without impacting the SECURITY and privacy of any one individual. My organization kept asking me “So…what are you going to do about it?” with the extra pressure of “Are we there yet?” It was a tall order.
The strategy was formed on my travels to the Emerald city, where the great and powerful would deliver me the solution:
- What’s needed is a foundation – a brain, a highly scalable and virtualized INFRASTRUCTURE CLOUD providing on-demand access to my heterogeneous network, compute and storage environment; both modular and CONVERGED. This environment needs to be managed by a single pane-of-glass with templated, fully automated deployment and maintenance capabilities. It needs to discover equipment and forecast both problems before they occur, and opportunities to grow before it becomes an emergency.
- What’s needed is to breathe a new life into the organization by elevating the notoriously sidelined asset of data into being the heart of the portfolio. A new CONTENT CLOUD will need to differentiate the lifecycle and lifespan of structured versus unstructured data. New capabilities of data discovery, unstructured indexing, and archiving of these precious resources must be introduced to incorporate the scalability necessary once the data grows exponentially.
- What’s needed is the comprehension of the regulatory, contractual, and privacy concerns of our clients, and the courage to create new insight from the data we have collected to enrich their experience. The new INFORMATION CLOUD must envelop the integration concerns and discipline of reporting, business intelligence, and business analytics, where there might be a lot of data, or even big data.
The wizards of HDS already understood the need, and delivered the INTEGRATED STRATEGY. With all the enthusiasm of any person wearing bright red shoes, we all knew what needed to happen next… teach others.
I hope to spend the next 20 years learning from the brilliant minds of HDS, helping to communicate and practice real integration with our clients. And on the way, use the experience of all the mistakes…to help our clients innovate.
In hindsight, I wouldn’t have changed a thing. It’s off to work I go…
Flash Storage: Built For the Real World
In my prior post, Flash Storage: Choose Wisely, I raised a number of questions that companies should consider as they look to add flash storage to their environments.
I’d like to dig a bit deeper in a topic related to a number of the questions that were raised. Some of those questions asked about whether there was only one way to develop a flash system, whether new management would be required when adding flash storage, the use/need for all-flash versions of storage systems vs. tiered systems and if new storage arrays offer the high-availability built into the systems users have become accustomed to.
All of these questions are aligned with another industry topic: what is or isn’t the “right way” to develop a flash storage system. Often customers are told that flash storage systems need to be “built from the ground up” for flash.
To cut through the noise, let’s focus on the main – and relatively standard – parts of storage systems: the media, the software and the processing. On top of that, we can add management software, which users will leverage to set up, control and change their storage infrastructure.
While possibly an over-simplification, it does a reasonable job of covering the key functional areas of a storage system. Analyzing those categories (media, software, processing and management) demonstrates that nothing inherent to a storage system demands it be an entirely new “built for flash” design.
When HDS recognized the growing importance of flash, we didn’t start down the path of building or acquiring an entirely new line of storage arrays that were “purpose built” for flash. We instead focused first on optimizing critical storage infrastructure components, namely the media and storage system software.
HDS began designing flash media that was built for enterprise requirements, now called Hitachi Accelerated Flash storage. Work also began on tuning our Base Operating System software to optimize system resources and processes when deployed with flash, and also offering flash-specific software enhancements we call flash acceleration.
Hitachi Accelerated Flash storage has transformed expectations on what flash media should be in the enterprise. Without losing the serviceability of a drive/module form factor, we introduced massive improvements in per-module capacity (currently 1.6TB, which will double later this year) and per-module performance potential (up to approximately 100K IOPS each). These are not minor jumps.
But performance and capacity were only two of the improvements in Hitachi Accelerated Flash. The HDS-developed, multi-core embedded flash controller enables enterprise-levels of durability while still leveraging cost-effective MLC memory. It does this through features such as write avoidance, write leveling and advanced failure prediction in close cooperation with the HDS storage that it runs in. It speeds formatting. It manages garbage collection in ways that help prevent the dreaded “write cliff” where commodity drive performance falls off dramatically as free space is used up. This improvement ensures we can deliver far more predictable performance in real world environments – a proven HDS trait.
In fact, we feel we developed the BEST flash storage media implementation for the enterprise. Period.
Our software tuning results are equally impressive. Between Base Operating System optimization and flash acceleration software innovation, over 30 significant changes improve the way our system resources are leveraged. New “express” IO processing speeds internal system communications. Cache access and allocation is optimized. Support for more simultaneous threads speeds information delivery and responsiveness throughout the system.
Since these software modifications happen at the core level within the system, the changes are all transparent to the functional software that sits above it and can be enabled non-disruptively. Remote replication, file deduplication, dynamic provisioning and tiering all work the same as always. Only faster.
We innovated on these two functional areas of a storage system (flash media and system software) without having to throw away all the work that has made our products great already.
Would leveraging commodity drives or a hard-to-service “chips on a board” design give customers better density, performance or availability than our systems offer?
Would writing entirely new storage operating systems that lack the trusted functionality and availability features already available and proven in our systems really be more valuable than optimizing that code for maximum flash performance?
I think not.
And with well over 100 patents granted/pending on our flash hardware and software flash acceleration, I think we have the innovation engine to stand behind that belief.
As for the components of a storage system – that still leaves processing and management untouched. Given the depth at which we’ve covered the first two topics, I won’t attempt to delve deeply into them during this post. To address them quickly though, we see no reason a total redesign to either of these areas helps customers better exploit flash.
HDS leverages Intel processors, like most other systems do today, for most internal system work and designs specialized ASICs and FPGAs to offload certain specific storage functions within the system. This remains a successful solution to delivering industry-leading performance.
And when you start from a proven system, you fit within a proven management paradigm. All our flash storage systems are fully supported within the Hitachi Command Suite framework, and you can expect that any new systems would be as well.
If you are considering a flash system that’s managed differently than the rest of your environment, or even the rest of that vendor’s portfolio, ask yourself if the benefits of this newly designed product are real and compelling enough to add this layer of complication.
When you consider HDS, you’ll see that this will not be an issue. We’re not going to try to convince you to add an unproven system to your environment and deploy new management tools for a niche use case. We’ll offer you all the performance, proven availability, unified management and software features that are proven in real world environments, long before they get to you.
Yes, flash storage is different. It requires engineering optimization to fully leverage its performance potential. HDS has done that engineering work.
And we’ll have some new systems coming soon that will dispel any myths that optimized systems cannot set the industry’s performance benchmark.
Depending on your perspective, we may or may not have systems that are “built for flash” – but we most certainly have flash systems available today that are built to help our customers in real world environments.
There’s nothing more optimized than that.
Converged: Choose Wisely
The enterprise data center is a crazy place, complete with rows of metal and wires. It’s noisy with hot and cold aisles and is constantly reinventing itself. To many organizations, it is also the heartbeat of a business, increasingly vital to its success, providing a competitive advantage for those that select the right technologies and solutions and allowing companies to leapfrog the competition. In a short period, internet giants have successfully transformed business models by choosing IT solutions that empower them to provide consumer services not available before. We are now witnessing a similar transformation in enterprise data centers that dramatically simplifies the way IT acquires and uses technologies. A growing number of companies are adopting converged infrastructures to cut the time, resources and cost of deployment by outsourcing the validation and integration of end-to-end infrastructure to vendors.
With a new generation of converged infrastructure evolving, successful IT professionals are not just satisfied with what happens before deployment but also want to exploit solutions that cut down the data center inefficiencies post-installation and fetch the highest ROI. Everyday tasks like deploying new VMs, provisioning storage and network capacity, performance tuning, scaling, upgrading firmware, protecting data, monitoring the health of the infrastructure, and more can be laborious, error-prone and time-consuming. This is where converged systems diverge. Best-of-breed solutions offer a single-pane-of-glass across infrastructure to automate these manual functions while optimizing the scalability and performance and dramatically increasing the value to data centers and their respective businesses.
IDC estimates 54.5% growth in converged infrastructure through 2016 but organizations that are leaders in their respective industries will be selective about IT acquisition strategies for private/hybrid clouds or specific core applications. They will be wise about transitioning to a converged infrastructure that not only cuts down the time to deployment but also automates day-to-day activities for a wide range of IT services. They will spend less before and after deployment while delivering higher value to their businesses just like the successful internet companies did.
IT and business decision makers looking to deploy new converged infrastructure solutions, or wishing to evaluate the solutions they’ve already deployed, should consider a few key questions:
- Can the converged solution support a range of operating systems and hypervisors or bare metal servers while still ensuring a high degree of integration and streamlined management?
- Can the converged solution preserve existing IT investments while still enabling those investments to participate in a newly deployed solution going forward?
- Can you manage your entire datacenter with multiple implementations of a converged solution from a single console to help reduce costs and complexity?
These are by no means easy points to consider, but they are nonetheless critical when deciding which converged solution will best meet your needs today and into the future. Stay tuned. HDS is coming to the table to address these challenges head on in the coming weeks.
Hitachi Information Forum in Mainz, Germany – Big Data Panel
Earlier this month, the 5th annual Germany Hitachi Information Forum was held in Mainz. With over 300 attendees, 20 presentations, hands-on breakout sessions and a panel discussion on big data, the event was an overwhelming success.
To set up the panel discussion on big data that I moderated, I cited Kenneth Cukier of The Economist, who uses Google Translate as an example of how the scale and depth of data distinguish big data from the data analytics of yesterday. Jürgen Krebs, director of Business Development and Marketing of HDS Germany, and Bob Plumridge, chief technology officer of HDS EMEA, joined me on the panel to discuss their views and real world examples of big data. We also discussed how object storage impacts big data, how the role of IT leaders will evolve as big data drives business decisions, and the impact of SAP HANA adoption.
You can view the Hitachi Information Forum panel discussion below.
Here’s Your “Get Out of Mail Jail” Card
As IT departments struggle with the growing volume and velocity of data, and especially the many copies of data that they generate, one of the biggest headaches is email. Mailboxes get crammed with copies of messages and attachments, not to mention all the trivial and junk email that we receive.
The “fix” in most organizations is to limit the size of users’ mailboxes: once you reach a certain size limit, you get a warning; then a little later you get locked out of being able to send email until you delete enough old email to get you under the threshold. Many of us call this situation “mail jail.”
Of course, this approach does not serve the needs of either the user or the organization. The user is pulled away from doing productive work, unable to respond to critical messages, while they spend time sifting through their mailbox to decide what to delete. And in this process, users often delete email records that may be needed by the business at a later time.
So how do we solve this problem? We need to set policies for email retention and deletion that meet business and regulatory requirements. Then we need to move older, static messages out of the email database and into a secure, scalable, searchable, and cost-effective active archive content storage platform. And it needs to happen automatically without affecting users.
Hitachi Content Platform (HCP) is the perfect repository for archived email messages, attachments and other objects. It grows as you grow, it self-protects the data you send to it, and it has powerful metadata index and search capabilities. Recalling archived data, including email objects, is a breeze.
But how do we get the data out of the email database and into HCP, and do it in a way that meets business policies? This week, we solved that problem with the release of a new version of Hitachi Data Instance Manager (HDIM) that now includes tight integration with HCP for archiving email objects from Microsoft® Exchange environments. This release follows the vision that was laid out in a press release in April.
HDIM includes a highly advanced, easy-to-use whiteboarding approach to creating policies for backup, RPO, replication, retention, security and archiving. Unlike other email archive products, HDIM now has the special sauce for exposing the Exchange metadata necessary to take full advantage of the capabilities of Hitachi Content Platform.
HDIM gives you the option of copying email objects to HCP, or leaving a small stub file in the Exchange Database (EDB) to point to where the object is now stored, or deleting it completely from the EDB. It moves the data through an agentless connection and puts almost no load on the Exchange Server. Objects can be retrieved either from the HDIM restore interface, or directly from HCP.
By deploying HDIM, our customers can easily adopt the ABC approach to managing data, significantly improving backup performance and driving down management costs with best-practice ILM for MS Exchange. ABC = Archive More, Backup Less, Consolidate the Rest.
If your business and your employees are suffering from mail jail syndrome, give HDS or one of our business partners a call. We have a new solution that can help keep you out of jail – literally and figuratively.
Please follow me on Twitter @rcvining
Flash Storage: Choose Wisely
Flash is undeniably a hot topic in today’s IT landscape and flash storage deployments are growing rapidly across a number of important application environments. Flash storage is on the minds of customers, analysts, the press and vendors of all shapes and sizes, as everyone is looking for ways to capitalize on this important new storage option.
This focus means that it’s become rather noisy in the market with claims of superiority and the finding of the flash “Holy Grail” ringing out near constantly. This in turn leads to performance claims where benchmarks are contorted in every conceivable way possible to eke out new claims of performance leadership. All of this noise can make it harder on a customer to separate fact from fiction and truly understand what approach can best help them address their current needs and future directions. Much like the scene in Indiana Jones and the Last Crusade where there are a number of possible “Grails” in front of Dr. Jones, there are just as many flash options.
As my colleague Hu Yoshida noted last November with the introduction of Hitachi Accelerated Flash storage, HDS has been looking for ways to take flash to higher levels of performance, durability, capacity and cost, which leads to improvements in overall productivity. We ship many petabytes of flash capacity each quarter to our customers. With that in mind, we want to share a few thoughts on where flash is, and where flash is going. Since there are too many flash topics to cover in one blog post, this is the first in a series of flash-focused blog topics which will help in choosing flash the right way today.
Let’s start with the basics. Given all the chatter, customers need to be able to answer the following questions for themselves when considering a flash purchase.
- Does the provider offer a suite of flash capabilities (tiered systems, all flash systems, server flash, no flash, etc.) or will they attempt to convince you that their approach is the only one that works?
- Will I need to change my management schema to deal with a new flash storage system or will it work within the tools I already have (or might add) from this vendor?
- Can I get an all-flash version of this vendor’s technology? Alternatively, if I buy a high-performance, all-flash array today, can I add disk drives and tiering capabilities later if my needs change?
- Can I add both block and file storage to the flash system? Does it offer unified management across those environments?
- Will I get the same level of enterprise availability with flash as I get with my traditional hard-drive based storage? (e.g., non-disruptive upgrades, predictive sparing, etc.)
- Since the applications that need high-performance tend to be those that need the best protection, can I get built-in replication? If so, do I need to learn new replication tools for flash or will my tools work seamlessly with my disk-based storage systems?
- How will this vendor’s technology help me most efficiently use my flash investment – does it focus on de-duplication, compression or tiering to drive down costs? (Can I use these technologies in secure, encrypted environments?)
- Can I virtualize my current storage assets behind my new flash storage system to extend the useful life of those assets and accelerate their performance?
- Do I trust this vendor will be focused on this same technology direction 12 months from now? Will this vendor even be around 18 months from now?
- Does my storage warranty ensure that I am covered for any solid-state memory issues regardless of how active my storage system is?
- Can the vendor articulate a strategy of tiering across the infrastructure and flash deployment options?
- Can this vendor’s flash strategy help me across all my environments – including virtualized servers and mainframes?
- How does a vendor protect and service the solid state memory within its systems? If a chip fails, does my whole array need to be swapped, or can I swap an SSD/flash module?
You might be wondering if this was a self-serving list of questions. Maybe a bit. But even so, it is a relevant list of more than just a few of the questions many of our customers are asking.
I can also assure you that somewhere in that list there’s at least one question that will make every vendor a bit uneasy, including HDS. I’m sure I missed some relevant questions too, and I’d be interested to hear which you think should have made the list. I’ll be sharing insight into each of these questions in a few upcoming blog posts, and we’ll cover many of them in upcoming announcements, so stay tuned.
Hopefully a list like this can help customers cut through some of the slick marketing slides and get to the facts about the choices available in the market. In keeping with the Grail Knight’s counsel, “choose wisely, for while the true Grail will bring you life, the false Grail will take it from you.”
Hitachi Data Systems Introduction
This is my first blog since recently joining Hitachi Data Systems as senior product marketing manager for data protection, and I’d like to share why I decided to make this career move, and why I’m really excited to be blogging for HDS.
Many technology marketers like to draw an analogy between data in the IT world and blood in the human body. It makes sense because, if you lose your data / blood, or if it stops flowing, the organization / body will likely die.
This analogy, however, fails to recognize one important difference between IT and nature: the human body has a fantastic ability to regulate the amount of blood it contains; as new blood cells are created, old ones die. The alternative would be reminiscent of several horror and sci-fi movies you may have seen.
It doesn’t work that way in IT, though; new data is created much faster than older data is expired, deleted or destroyed. Consider this: storage capacity in personal computers of the 1980s was expressed in kilobytes; in the 1990s we were talking about megabytes; then gigabytes in the 2000s; and now many of us have at least a terabyte on our desks, and we’re not even half way into the 2010s. That’s roughly a 1000x increase per decade. Does anyone really doubt that we’ll need petabytes of capacity by 2020 for our computer, tablet, smartphone, wristwatch, glasses, car, cloud, or whatever comes after cloud?
Of course, much of this new data is created from new sources (online transactions, social networking, smart devices, sensors, etc.) and from improvements in older sources, such as higher resolution scientific equipment, video and imaging. But most of the new data is really just copies of the other data. We have backup copies for operational recovery, off-site copies for disaster recovery, copies for development & test, copies for long term archive, copies for off-line analysis, and on and on. We’re drowning in the copies of our data.
For years, the data storage and management vendor community has been developing technologies to try to contain this growth, such as data compression, data deduplication, automated archive and deletion processes, and snapshots. But as we continue to see the amount of data grow, we’ll need better solutions, especially for managing and limiting the number of copies of data. Have you started to hear the word ‘exabytes’ yet?
My job as senior product marketing manager is to help build the data protection and instance management story that shows how the implementation of this new capability will result in exceptional business outcomes for our customers. The journey, for me, has just begun and I hope you’ll come along for the ride. Follow me on Twitter @rcvining.
About the author: Rich Vining has worked in the secondary storage industry for more than 20 years, helping to shape many of the solutions that protect and retain data. The contents of this blog are his own, and do not necessarily reflect the positions of Hitachi Data Systems or any
Is Storage at Fault for this Application Performance Problem?
By Richard Jew and David Foster, Hitachi Data Systems
As storage infrastructures continue to grow in size and in complexity, it becomes significantly harder to monitor storage capacity growth and the performance service levels delivered to critical business applications. Since storage systems are often viewed as a “black box” within the IT environment, storage is often the first to be blamed when an unexplained application performance issue arises. This often prompts storage administrators to scramble for detailed storage performance statistics in an attempt to prove storage is not at fault. Typically, this involves manual tracking processes that may be neither completely accurate nor able to scale in an enterprise environment. In addition, as enterprises are constantly evolving, manually maintaining and monitoring the pertinent storage (supply) to business application (consumer) relationships can be taxing. Enter storage analytics, where storage service level agreements and objectives that include detailed measurements of storage capacity and performance can be defined and measured for key business applications. With each business application having differing service level needs for storage, ensuring that each of your critical business applications is meeting its required storage service levels is a challenging task that storage analytics can help solve.
We recently sat down with one of our software product managers at Hitachi Data Systems, Manu Batra, to find out more about storage analytics and what Hitachi Data Systems is doing in this area.
What is a service level agreement (SLA) and service level objective (SLO) for storage?
Manu: In storage terms, I would define an SLA as the service levels my business applications are expecting from the storage resources they rely upon. SLOs are specific metrics, such as response times/latencies or utilized storage capacity, which together define the storage SLA for a particular business application.
What is the best way to measure the success of a storage SLA?
Manu: Essentially, an SLA would use both response time and data throughput metrics to verify that the storage infrastructure is delivering the required service levels to an application at any time. This is how you would typically define an SLA from the storage layer to any layer above it (server, application, etc.).
Now that we have defined an SLA and SLO for storage, it seems that they have traditionally been used more in the service provider industry. So how is this relevant to an enterprise? Could they use the same metrics as well?
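To make the relationship between the two terms concrete, here is a minimal Python sketch. It is not taken from any HDS product; it simply assumes that an SLA can be modeled as a set of SLOs, each placing a bound on one metric. The metric names, the thresholds and the `evaluate_sla` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLO:
    """One service level objective: a bound on a single storage metric."""
    metric: str                     # hypothetical metric name
    limit: float                    # threshold value for that metric
    higher_is_better: bool = False  # False: stay at or below; True: stay at or above

    def is_met(self, observed: float) -> bool:
        if self.higher_is_better:
            return observed >= self.limit
        return observed <= self.limit


def evaluate_sla(slos, observed):
    """Return the metrics whose SLO is violated (an empty list means the SLA is met).

    A metric missing from `observed` counts as a violation because it cannot be verified.
    """
    violations = []
    for slo in slos:
        value = observed.get(slo.metric)
        if value is None or not slo.is_met(value):
            violations.append(slo.metric)
    return violations


# Illustrative SLA for one application; every number here is made up.
oltp_sla = [
    SLO("response_time_ms", 5.0),
    SLO("throughput_mb_s", 200.0, higher_is_better=True),
    SLO("capacity_used_pct", 80.0),
]

sample = {"response_time_ms": 3.2, "throughput_mb_s": 150.0, "capacity_used_pct": 72.0}
print(evaluate_sla(oltp_sla, sample))
```

In this sample the response time and capacity objectives hold but throughput falls short, so the SLA as a whole is not being met even though two of the three SLOs are.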
Manu: As enterprises become larger and data centers grow beyond their original designs, it’s more complex for any enterprise to keep a comprehensive view across its entire infrastructure. They need quantifiable measurements that determine how their data center is doing today. Instead of looking at all aspects of the data center independently, time can be saved by looking at a set of key storage metrics that can be set, managed and optimized. SLOs provide customers a standard set of performance and capacity metrics that allows enterprises of all sizes to quantify a ‘Level of Happiness’ from their storage infrastructure.
What are some of the benefits enterprises receive by applying storage analytics to their infrastructure?
Manu: As mentioned earlier, a lot of time is spent analyzing just to ensure storage is not at fault for an application performance issue. By applying storage SLOs to key business applications, you are better equipped to quickly respond to application performance issues, maintain storage service levels for critical business applications, and you can simplify storage troubleshooting. Even in today’s growing enterprise storage environment, IT staff resources are limited and they need to continue focusing on saving money with their current infrastructure. So the question they must ask is “How can we maximize our existing storage resources as the data center’s capacity expands?” In addition, storage analytics provide greater visibility to CIOs, architects and storage administrators, allowing them to continuously check on the status of their infrastructure with a storage dashboard, get notified early on for any potential issues, and accurately measure storage capacity growth over time.
How does Hitachi Data Systems deliver storage analytics to its customers?
Manu: Hitachi Data Systems is deeply committed to helping customers manage and optimize their IT infrastructure. HDS has been delivering storage analytics capabilities for some time as well as continuous research focused on helping organizations manage the tremendous data growth in their data center. Hitachi Command Director software enables the concept of SLAs and SLOs to be applied to storage and helps customers adapt to ensure storage service level agreements for business applications are being met. It enables the data center to be properly measured beyond an operational view to a business-centric view to increase storage utilization and efficiency in order to better address critical business application needs.
RESTful APIs and Intelligent Platforms
We live in an age where Jennie Lamere, a 17-year-old high school student, has developed a browser plug-in that will block tweets with TV spoilers, thanks to the REST APIs supported by Twitter. Yet there is a significant gap between application programmability and infrastructure programmability.
On May 9th 2013, I attended the Cloud2020 Summit at SuperNAP. One of the hotly debated discussions was “Programmable Infrastructure”. (Check out Pete Johnson’s blog to get a glimpse of what was discussed). When I asked how far they think the infrastructure has come in terms of programmability, Randy Bias, co-founder & CTO at Cloudscaling, responded that the network is a mess, and storage is probably second worst. Imagine the kind of innovation we could see with the infrastructure platforms, if they were as programmable as applications are. As I ponder that, I keep coming back to RESTful APIs. I really believe that RESTful APIs are key in enabling the innovation around infrastructure.
In my previous post I highlighted two key characteristics for Intelligent Platforms that:
- can expand or shrink with the demand fluctuation
- can be programmed
In this blog I’ll explain the role of RESTful APIs by walking through a storage use case.
Imagine you have an instance of an application with X amount of allocated storage.
Say the storage demand for this application increases and you need an additional Y amount of storage – otherwise your application would crash. Would you let the application crash and risk the consequences? Wouldn’t it be nice if your infrastructure was intelligent and could:
- monitor the storage utilization of the application
- set the storage thresholds and trigger an alarm when the utilization reaches that threshold
- request the underlying storage array to expand the capacity
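The three capabilities in this list can be sketched as a single monitoring pass. The Python below is only an illustration under stated assumptions: `check_and_expand`, its parameters, the 80% default threshold and the `request_expansion` callback are all hypothetical, and the callback stands in for whatever call the storage array actually exposes. In terms of the use case, `allocated_gb` plays the role of X and `increment_gb` the role of Y.

```python
from typing import Callable, Optional


def check_and_expand(
    used_gb: float,
    allocated_gb: float,
    threshold: float = 0.8,
    increment_gb: float = 100.0,
    request_expansion: Optional[Callable[[float], None]] = None,
) -> float:
    """Run one monitoring pass and return the resulting allocation in GB.

    If utilization is at or above `threshold`, call `request_expansion` with the
    new target size and return that size; otherwise return the allocation unchanged.
    """
    utilization = used_gb / allocated_gb       # monitor utilization
    if utilization < threshold:                # threshold not reached: no alarm
        return allocated_gb
    new_size_gb = allocated_gb + increment_gb  # X + Y
    if request_expansion is not None:
        request_expansion(new_size_gb)         # stand-in for a call to the storage array
    return new_size_gb


# Record expansion requests in a list instead of contacting a real array.
requests = []
print(check_and_expand(850, 1000, request_expansion=requests.append))  # 85% used
print(check_and_expand(400, 1000, request_expansion=requests.append))  # 40% used
print(requests)
```

Passing the expansion step in as a callback keeps the monitoring logic independent of any one vendor's interface, which is the gap the rest of this post addresses.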
This should be simple, but unfortunately it is not that simple today because of the underlying components that make up the stack. In a simplified view, you have the application, VM, hypervisor, network, and the storage array. Different vendors provide different components of that stack. This is where RESTful APIs become relevant. At the core of the RESTful API principles you will find:
- the application of the engineering principle of generality to the component interfaces
- the stateless nature, that allows each interaction to be independent of the others
- the ability to introduce intermediary components to perform different operations like load balancing, security enforcement, or encapsulating legacy systems
We kept these aspects close to our design when we built the UCP Director. In the rest of this blog, I will explain how to automate the above use case and the role the RESTful API plays in it. More information about the specifics of UCP for VMware vSphere can be found here.
Because UCP Pro for VMware vSphere is built for the VMware stack, the above use case can be accomplished by essentially bringing two sets of subsystems together – VMware vCenter and UCP Director.
Read here for instructions on how to accomplish the VMware side of the operations.
Here is how the RESTful API looks from the UCP Director side. All you need to do is make a POST call to UCP Director and you are good to go.
The request header nicely breaks down the resources – first it provides the identifier for the storage subsystem, followed by the identifier for the volume that you want to expand. The body of the request holds the size of the new volume, that is, the size after expanding the original volume. Just like the body of the request, the body of the response is represented in JSON. This makes it easy to consume the response in a programmatic way. In the response, you will notice an entry called “JobId”. Using this JobId you can track the status of the job.
The beauty of the RESTful API from UCP Director is that tomorrow the exact same API can be consumed by non-VMware infrastructure. Just as Jennie found a way to block TV spoilers in her Twitter feed, the generality of RESTful APIs and their ability to introduce intermediary components enable customers to innovate around their IT infrastructure to meet their unique needs.
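As a rough illustration of the shape of such a call, the Python sketch below builds a POST request and parses a JSON response without contacting any server. Only the overall structure (storage system identifier, then volume identifier, a JSON body carrying the new size, and a “JobId” in the response) comes from the description above. The host name, URL path, `sizeInGB` field and both helper functions are placeholders, not the documented UCP Director API.

```python
import json
import urllib.request


def build_expand_request(base_url, storage_system_id, volume_id, new_size_gb):
    """Build (but do not send) a POST request asking for a volume expansion.

    The URL layout and the "sizeInGB" field are hypothetical placeholders.
    """
    url = f"{base_url}/storagesystems/{storage_system_id}/volumes/{volume_id}/expand"
    body = json.dumps({"sizeInGB": new_size_gb}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def extract_job_id(response_text):
    """Pull the JobId out of a JSON response body so the job can be tracked."""
    return json.loads(response_text)["JobId"]


req = build_expand_request("https://ucp.example.com/api", "12345", "42", 1100)
print(req.get_method(), req.full_url)

# A made-up response body, used only to show the parsing step.
print(extract_job_id('{"JobId": "job-001", "Status": "Pending"}'))
```

To actually send the request you would pass `req` to `urllib.request.urlopen` along with whatever authentication your environment requires.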
Join Us at the HDS Information Forums
In June, we will begin our Hitachi Data Systems Information Forums around the globe and you are invited to join us for one of our most popular events. This year the forums begin with our largest venues in Mainz, Germany on June 5th and London, UK on June 13th.
As part of the Information Forums in Mainz and London, I’ll be moderating panel discussions on big data with HDS executives, which should be both entertaining and enlightening. We will cut through the hype and discuss practical big data applications today while taking a look to the future and how the scale of data retention will evolve to create unique business value.
By attending, you will gain meaningful insights and tactical tips for evolving your datacenter to gain the most efficiency and reduce expenses. In addition, you will also be among the first to hear about innovative solutions and strategies such as:
Maximize Your IT Advantage – Discover how to accelerate performance, control your data growth, and transform your IT environment to be more intelligent, efficient, and cloud-ready.
Cloud, Your Way – Learn how to improve agility and flexibility through various private, public and hybrid cloud solutions provided by HDS and our partners.
Capitalize on Big Data – See examples of innovative, big data solutions today that transform how you capture and capitalize on data.
#CloudAnywhere is Information Everywhere
Many of our customers prefer to use HDS cloud-ready technologies, solutions and services to create “content clouds” of their own. At the core of the content cloud is the Hitachi Content Platform (HCP) object store which, in addition to its metadata capabilities, provides an automated, backup-free Web 2.0 storage environment with robust security, vast scale, and flexible multitenancy. Support for Hitachi Virtual Storage Platform and Hitachi Unified Storage, a VMware version and the ability to tier data to almost any external storage, makes HCP a fully virtualized solution that builds on the investments customers have already made. Integration with Hitachi NAS provides easy access to vast amounts of content with only the most recent and relevant data in expensive, tier one storage.
But corporate entities at branch offices also require storage to get their work done. Their IT needs will typically be smaller, but their data needs will also continue to grow. To mitigate this situation, Hitachi Data Ingestor can be installed at these locations as a cache for frequently used data, making it locally available with the vast majority of the data stored at the core HCP and accessed via links.
We complete our content cloud with the incorporation of the new HCP Anywhere (announced today), our enterprise-ready, safe, secure file synchronization and sharing solution. HCP Anywhere is an enabler to mobile and remote employees, allowing them to collaborate and share files efficiently and securely, eliminating the need for them to use consumer clouds that can compromise security.
Finally, the picture is made complete by offering customers the means to offload data and content to a hosted cloud with offsite hosted cloud solutions that allow customers to reduce or eliminate additional capital costs and better control operational costs.
This tightly integrated portfolio, coupled with our new delivery model and partner programs, enables our customers to deploy their own clouds, in their own ways, allowing them to focus on how to take advantage of the transformation from traditional IT to cloud rather than trying to figure out how to get from where they are now to where they are going next.
#CloudAnywhere Extends IT Beyond the Datacenter
This morning, HDS announced a number of new cloud enabling technologies, delivery models and partner programs that enable our customers to extend their existing enterprise IT to embrace cloud solutions. This approach allows our customers to continue to take advantage of their existing infrastructure, consolidating and virtualizing it if need be, and then leverage one or more cloud-based approaches to extend their IT beyond the walls of their enterprise. In this way, they can deliver the services and information mobility their stakeholders need, from their own cloud, in their own way.
An exciting new solution is Hitachi Content Platform (HCP) Anywhere, the industry’s first integrated file sync and share solution designed for the enterprise and built by a single, well-respected vendor. This integration-by-design puts HCP Anywhere ahead of the competition in security, ease of implementation, and keeping IT in control. HCP Anywhere is:
- Secure: HCP Anywhere is an on-premise, private cloud solution that keeps data within the control and governance of corporate IT.
- Simple: Users download the HCP Anywhere client application from a portal or via the iTunes store, self-register their devices, and they are ready to go since it works just like other folders on their device.
- Smart: HCP Anywhere uses the backup, compression, single instancing, spin-down disk support, and metadata capabilities of HCP to create a highly efficient content sharing environment.
A new version of Hitachi Content Platform (HCP) plays an important and foundational role in the HDS cloud portfolio. With this announcement, HDS expands the robust capabilities of HCP as a cloud storage platform, featuring the most advanced metadata management in the market today. A more feature-rich metadata capability means HCP can better help automate operations, providing a strong foundation for a big data repository and helping find the right datasets for deeper analytics.
Also featured are new hosted cloud services from HDS. These include Hitachi Cloud Services, an extension of the enterprise into a secure, robust off-premise hosted cloud managed by HDS, and the Hitachi Cloud Service Provider Program, a partner-provided public cloud offering built on HDS cloud infrastructure and solutions.
With today’s announcement, HDS delivers another milestone in its strategy for the cloud with solutions that first and foremost address what customers need. Our new and enhanced enabling technologies and cloud solutions help you:
- Choose the best possible solutions to address your needs, at your own pace, and in a way that makes sense for your business
- Solve the challenges you face today – and gain a variety of new options to help manage the explosive growth of unstructured content and achieve the cost and flexibility benefits of the cloud by reducing CAPEX and OPEX
- Leverage cloud and file sharing technologies in ways that were never before possible. With these new offerings, you can be prepared for what’s coming next (big data, bring your own device, next-generation file services, secure clouds, distributed IT, metadata-driven automation, management and analytics).
Beyond Private Clouds: How Hosted Services Offer the Best of Both Worlds
As organizations move toward implementing more cloud solutions, they face some important decisions. Data privacy and retention, regulatory and compliance considerations, and security all need to be considered, and may drive the decision toward a private cloud. But then cost, a very important factor and in many cases the key motivator for considering cloud in the first place, will come into play. Unless the organization has no issue with making the capital investment required to build its own private cloud, it will look seriously at alternatives where it won’t own or manage the infrastructure on which its clouds are built – that is, hosted cloud services.
This of course leaves them with the following alternatives:
1. Work with a provider who can manage a private cloud that resides within the customer’s data center, built on assets owned by the provider
2. Choose a public cloud that charges for services delivered based on usage, allocates resources on demand, and supports a service level agreement (SLA) consistent with the needs of the customer
So how does one decide? Because an organization’s data and content are critical to deriving insight used to make sound business decisions, they are strategic assets to be seen and used only by those authorized to do so. For this reason, any cloud approach must ensure adequate data protection. At the same time, access to data and content, while carefully controlled, must be unfettered for those who need to find, use, and share it. Then effective collaboration and innovation can take place. Finally, the cloud must support the policies and processes needed to meet whatever compliance and regulatory requirements apply to the data involved.
It goes without saying that performance and reliability are also important considerations. In the case of private clouds, the robustness of the hardware, software, and networking elements that make up the cloud environment is fundamental to determining if these criteria are met. This is true for public clouds as well, of course, although in this case it is reflected at a higher level in the commitment the provider makes through its SLA.
One could argue that there is something approximating a tradeoff between reaping the cost and resource benefits of the cloud and meeting security, access control, performance, and reliability requirements. The more that is owned and/or managed by a provider, the more control one could perceive they are relinquishing. That introduces risk – or, at the very least, the perception of risk.
The solution to this, of course, is to work with a provider that can assume the role of a “trusted advisor.” One that not only possesses the experience and knowledge to build, host, and manage quality cloud solutions, but one that has demonstrated the ability to deliver them securely. And one that is willing to back its performance and security claims with a strong SLA. The foundation of whatever they host has to be a set of proven platforms and technologies that can meet these demands.
In other words – put your strategic data in the hands of a provider who can prove they are worthy of it.
The Challenge of Large Data
A few weeks ago I was in London for internal planning meetings and had the opportunity to catch up with some old friends in the industry, Chris Evans and Martin Glassborow. Chris and Martin have a way of keeping me grounded in the harsh realities of enterprise IT and this visit was no different. Martin had recently blogged about “Petascale Archive” and the challenges he and other IT pros are facing managing the scale of massive data growth. As an industry, we love seizing the opportunity new technologies will bring in the future while most businesses are focusing on their functional needs and costs today. In this reality gap, managing data growth is the “elephant” in data centers.
At the heart of this growing problem are data retention policies, which haven’t evolved much over the last 10 years. While we have more dynamic backup and archive (or data management) software, the cost of storage hasn’t dropped as quickly as consumption has grown. Chris mentioned it wasn’t uncommon for his clients to have 60 to 100% annual data growth, which is fueling concerns about the cost of data retention. Martin’s post specifically speaks to the total cost of a petabyte-sized tape archive. While the upfront cost of tape media is attractive, the realistic cost and time of maintaining a tape archive, once you consider media migration and management, is daunting. From Martin’s post:
It currently takes 88 days to migrate a petabyte of data from LTO5-to-LTO6; this assumes 24×7, no drive issues, no media issues and a pair of drives to migrate the data. You will also be loading about 500 tapes and unloading about 500 tapes. You can cut this time by putting in more drives but your costs will soon start escalate as SAN ports, servers and periphery infrastructure mounts up.
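To see where a figure of this size comes from, here is a back-of-the-envelope sketch in Python. It is not taken from Martin’s post: the function name is hypothetical, a petabyte is assumed to be 10^15 bytes, and the 140 MB/s rate is the LTO-5 native (uncompressed) transfer rate, assumed here to be the bottleneck on the read side.

```python
SECONDS_PER_DAY = 86_400
PETABYTE = 1e15  # decimal petabyte, an assumption for this sketch


def migration_days(data_bytes: float, mb_per_s: float, drive_pairs: int = 1) -> float:
    """Days needed to stream data_bytes tape-to-tape, running 24x7.

    mb_per_s is the sustained rate of ONE read/write drive pair;
    drive_pairs is how many such pairs run in parallel.
    """
    if mb_per_s <= 0 or drive_pairs < 1:
        raise ValueError("rate must be positive and at least one drive pair is needed")
    bytes_per_second = mb_per_s * 1e6 * drive_pairs
    return data_bytes / bytes_per_second / SECONDS_PER_DAY


one_pair = migration_days(PETABYTE, 140)
four_pairs = migration_days(PETABYTE, 140, drive_pairs=4)
print(f"1 drive pair:  {one_pair:.1f} days")
print(f"4 drive pairs: {four_pairs:.1f} days")
```

At the full native rate a single pair needs a little under 83 days, so the quoted 88 days corresponds to the drives sustaining a bit less than their rated speed. Adding pairs divides the time, which is exactly where the extra SAN ports and servers Martin mentions come in.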
Further complicating this equation for CIOs is the promise of big data analytics and the challenge of extracting business value out of this retained data. As a standalone archive, tape doesn’t inherently lend itself to frequent access and data mining.
This challenge is larger than the medium used to store data. Today, data retention policies are often developed around retention costs and legal compliance and less around strategic business value. Copy management of backup and disaster recovery data also continues to compound the problem, as does growing machine-to-machine (M2M) data. For many companies, petabyte-sized data retention is becoming the norm, not the exception.
Hitachi is no stranger to this challenge. As a $96 billion global conglomerate with business units that include Information and Telecommunication Systems, Electronic Systems and Equipment, Social Infrastructure and Industrial Systems, Automotive Systems, Construction Machinery and Financial Services…this challenge is personal.
All of this points to the need for data retention architectures which span hardware, software and people.
Metadata and Object Storage
As more data is generated, cost-effective, accessible long-term data retention is going to depend on the optimization of storage resources and good utilization of storage metadata. Greg Schulz summed this up recently in FedTech:
For some applications, an important attribute of storage solutions and services are their metadata capabilities. This includes the ability to support flexible and user-defined metadata.
Another enabling capability is policy management, which can use metadata for implementing or driving functions such as how long to retain data, when and how to securely dispose of it, and where to keep it (along with application-related information). This adds some flexible structure to unstructured data without the limits or constraints associated with structured data management.
This is where object storage can play a significant role. By leveraging application metadata that describes the content and its data retention policies, it can enable automation of those policies, freeing up storage and human resources, as well as managing the problem of unnecessary duplication of data.
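As a minimal sketch of what metadata-driven retention could look like, the Python below decides which objects are eligible for disposal purely from their metadata. The class, the function names and the metadata keys (`retention_days`, `legal_hold`) are hypothetical, chosen for illustration; they are not the API of any particular object store.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class StoredObject:
    name: str
    created: date
    # User-defined metadata; the keys used below are assumptions of this sketch.
    metadata: dict = field(default_factory=dict)


def is_disposable(obj: StoredObject, today: date) -> bool:
    """True only if the object's own metadata says its retention period is over."""
    if obj.metadata.get("legal_hold", False):
        return False  # a hold always wins over the retention clock
    days = obj.metadata.get("retention_days")
    if days is None:
        return False  # no policy recorded: keep, to be conservative
    return today >= obj.created + timedelta(days=days)


def apply_policy(objects: list[StoredObject], today: date) -> tuple[list[str], list[str]]:
    """Split object names into (keep, dispose) lists."""
    keep, dispose = [], []
    for obj in objects:
        (dispose if is_disposable(obj, today) else keep).append(obj.name)
    return keep, dispose


objects = [
    StoredObject("invoice-2009.pdf", date(2009, 1, 15), {"retention_days": 1095}),
    StoredObject("contract.pdf", date(2009, 1, 15), {"retention_days": 1095, "legal_hold": True}),
    StoredObject("scan.tif", date(2013, 1, 1), {"retention_days": 365}),
    StoredObject("notes.txt", date(2008, 6, 1)),
]
keep, dispose = apply_policy(objects, today=date(2013, 4, 1))
print("keep:", keep)
print("dispose:", dispose)
```

The point of the design is that no administrator has to know what each file is: the policy travels with the object, so expiry can run unattended and objects without a recorded policy are simply left alone.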
While the concept of storage tiering has been around a while, I’m surprised how infrequently it is used in many data centers. If implemented properly, storage tiering provides dramatic optimization of high performance (and more expensive) types of storage while intelligently leveraging lower cost storage based on performance, capacity and reliability characteristics. Both structured and unstructured data can be highly optimized in most environments. Database applications, in particular, can benefit significantly from tiering. Frequently referenced data, such as control tables and indexes, can reside on the highest performance storage tier while less frequently accessed reference pages can be moved to a lower tier of storage. I believe we’ll see a much broader adoption of automatic (or dynamic) tiering as budgets force IT administrators to gain better storage utilization rates.
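The placement decision behind automatic tiering can be illustrated with a few lines of Python. This is only a sketch: the tier names and the access-rate thresholds are invented for the example, and real arrays make this decision at page or block granularity using their own heuristics.

```python
def assign_tier(accesses_per_day: float, hot: float = 100.0, warm: float = 1.0) -> str:
    """Pick a storage tier from observed access frequency.

    The thresholds (hot, warm) and tier names are hypothetical defaults.
    """
    if accesses_per_day >= hot:
        return "tier1-flash"
    if accesses_per_day >= warm:
        return "tier2-sas"
    return "tier3-nearline"


# Hypothetical access statistics for pieces of a database.
observed = {
    "control_tables": 5000.0,
    "indexes": 1200.0,
    "current_quarter_rows": 40.0,
    "reference_pages_2009": 0.2,
}
placement = {name: assign_tier(rate) for name, rate in observed.items()}
for name, tier in placement.items():
    print(f"{name:22s} -> {tier}")
```

Re-running the assignment periodically against fresh access statistics is what makes the tiering dynamic: data that cools off drifts down to cheaper storage without anyone scheduling a move.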
Primary Storage Deduplication
Data deduplication has been broadly adopted for backup data, but primary storage deduplication has not. One of the biggest barriers to primary storage deduplication is the very real concern of performance degradation. Like so many storage technologies, primary storage deduplication has evolved to the point where it has little or no impact on performance. Like tiering, deduplication processes can be automated. Administrators no longer need to experiment with scheduling and tuning deduplication processes.
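For readers who have not looked under the hood, the core idea of deduplication can be sketched in Python as a content-addressed chunk store. This is a simplified illustration using fixed-size chunks and SHA-256 fingerprints; the class name and chunk size are assumptions, and it says nothing about how any shipping product implements deduplication or avoids the performance cost.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: identical chunks are kept only once."""

    def __init__(self, chunk_size: int = 4096) -> None:
        self.chunk_size = chunk_size
        self.chunks: dict[str, bytes] = {}      # fingerprint -> chunk data
        self.files: dict[str, list[str]] = {}   # file name -> ordered fingerprints

    def write(self, name: str, data: bytes) -> None:
        digests = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # store only if unseen
            digests.append(digest)
        self.files[name] = digests

    def read(self, name: str) -> bytes:
        return b"".join(self.chunks[d] for d in self.files[name])

    def stored_bytes(self) -> int:
        return sum(len(chunk) for chunk in self.chunks.values())


store = DedupStore()
payload = b"A" * 4096 + b"B" * 4096
store.write("report-v1.bin", payload)
store.write("report-copy.bin", payload)   # a second, identical file

logical_bytes = 2 * len(payload)
print("logical bytes:", logical_bytes)
print("stored bytes: ", store.stored_bytes())
```

Both files read back intact, yet the second copy consumes no additional chunk storage; only its list of fingerprints is new.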
These aren’t the only technologies which can be used to optimize storage utilization, but they should be among the options you consider as you look at managing your growing data requirements. The challenge of large data is very real and data retention policies will need to evolve as well. Intelligent data management, which includes developing compliant data expiration practices, will also need to develop as a matter of discipline.
Intelligent Platforms: The Cornerstone of Next Generation IT Innovation
As we listen to our customers who live and breathe in the data center, we hear a growing need for an intelligent platform, one that:
- can expand or shrink as demand fluctuates,
- can heal on its own,
- can be programmed,
- is available 24/7, anywhere, on any device.
You’re probably seeing an overarching theme across HDS around enabling our next generation platforms to intelligently react with information in the platforms. As our CEO Jack puts it, we Innovate with Information. Software Defined Data Center, Cloud Infrastructure and Cloud Management are some of the current aspects of the IT industry that help make these intelligent platforms real.
Hitachi Data Systems has always innovated and contributed to society in a sustainable way. Our recent launch of Hitachi UCP solutions is a great example of this effort. The work done with Hitachi UCP Pro for VMware vSphere not only allows our customers to leverage their investments in VMware infrastructures, but also arms them with a REST API to programmatically talk to UCP. Look for a subsequent post from me on this topic.
As we innovate and enable customers to be ready for their future needs, we value the role our technology and channel partners play and we continue to work with them as we add more intelligence into our platforms. A great example of this is the recent launch of UCP Select with Cisco UCS servers, bringing Cisco’s server technology to our customers and empowering our channel to add additional value.
From an IT platform point of view there are clearly some major movements happening in the industry, such as OpenStack, flash technologies, unified storage, unified fabric, and big data, and HDS will continue to make contributions in all of those areas. I have the privilege of shaping some of these efforts in my new role as CTO of Intelligent Platforms. I am honored and humbled by this new role and I hope to use this blog as a forum to engage with you all and to bring some technology insights into various efforts as we evolve. Speaking of OpenStack – I’ll also be writing about some of the near term activities we are doing in this area. Stay tuned and until then, here is another perspective on the Intelligent Platforms from our good friend Krishnan Subramanian.
Hitachi UCP Pro for VMware vSphere – Extending Customer’s Value in Converged Infrastructure
With the release of Hitachi Unified Compute Platform (UCP) Pro for VMware vSphere, we introduced a number of firsts for converged infrastructure solutions:
- The industry’s first converged platform offering 100% parity across RESTful API, CLI and GUI
- A truly unified, “single pane of glass” end-to-end infrastructure orchestration solution with a low learning curve, due to our unique UCP Director software, which integrates directly into VMware
- The only converged infrastructure that can leverage a customer’s existing storage, either by connecting to a customer’s existing Hitachi Virtual Storage Platform (VSP); or, by virtualizing a customer’s third party storage arrays using Hitachi VSP (more than 100 third party storage arrays from different vendors have already been certified – see supported storage here)
Hitachi UCP Pro for VMware vSphere was designed and developed from the ground up as a highly integrated solution with industry-leading elements. Unlike competing converged solutions, UCP Pro for VMware vSphere offers customers:
- Delivery in weeks not months.
- Lower OPEX.
- Future-proofing investment
- End to end infrastructure orchestration
And things are only getting better. With the recent release of an updated version of UCP Pro for VMware vSphere, we’ve responded to customer asks with several key features:
- We are expanding our storage to include lower cost and capacity entry points with Hitachi Unified Storage (HUS) 150 and HUS VM storage systems. HUS 150 provides industry-leading price / performance and high density per rack for maximum space efficiency, while HUS VM provides high density, with our industry-leading storage virtualization capabilities. Customers can also leverage existing VSPs to easily incorporate this solution into their data center and again, lower the price to entry.
- UCP Director now offers deeper integration with the vCenter Web Client. Of particular interest to customers with cross-platform support, vCenter Web Client can run on any standard web browser, and still provide the features and functionality of our “thick” server-based version. It even extends additional features such as inventory lists, related objects and portlets, offering a more comprehensive view of the data center to the vCenter administrator. Our web client is perfectly aligned with VMware’s strategy of developing web-based client versions for all their major applications and management tools.
The latest version of UCP Pro for VMware vSphere expands on the strengths of the previous version, offering customers even greater flexibility, cost savings and price / performance from their converged infrastructure solutions.
Interested in learning more? Check out these quick videos to see how easily you can provision and manage your servers, storage and network in your VMware environment.
- How to provision a new ESXi server with 1 click using Hitachi UCP Pro for VMware vSphere
- How to provision storage to a VMware host using Hitachi UCP Pro for VMware vSphere
- Expanding the size of a VMware data store using Hitachi UCP Pro for VMware vSphere
- How to configure VLANs using Hitachi Unified Compute Platform Pro for VMware vSphere
#CloudAnywhere Changes the Equation
In my recent posts, I’ve mentioned the vast amount of unnecessarily redundant content in today’s IT environments. This time, I’d like to dig a little deeper on this aspect by considering what can happen with just one file in a typical IT environment and contrast that with #CloudAnywhere.
Let’s start simply with a document saved on a user’s laptop (1 copy). Then the backup application makes a copy (1 copy). Let’s assume that this is then emailed to 5 reviewers. While the mail server will likely single-instance the attachment (1 copy), the 5 reviewers save local copies to their laptops (5 copies). We’ll give the benefit of the doubt and assume that the laptop backup app can single instance the backup copies. Still, we’re already up to 8 copies.
Now a few days go by and mailboxes are filling up, so the 5 reviewers move the document to their local email archives. Since the creator hasn’t archived it, it’s still in the mail system and 5 more copies have been created. Then the backup application copies the reviewers’ archive files, which cannot be single instanced. Now we have another 10 copies, taking us to 18 total, and neither the creator nor the reviewers have sent it to others (whose devices also get backed up), posted it to websites (which also get backed up), or loaded it into content management systems (also backed up). We also haven’t touched on the replication and network traffic.
Imagine this scenario playing out across hundreds or thousands of users. Now imagine that the file isn’t some 10K spreadsheet or 100K document, but a 10MB presentation or a 100MB video. This is the kind of inefficiency that is running rampant in traditional IT environments.
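The tally above is easy to reproduce and scale. The short Python sketch below restates the same counting rules as a function of the number of reviewers; the function names are hypothetical, and the 10 MB file with 1,000 such files is just an illustrative extrapolation.

```python
def copies_after_review(reviewers: int) -> int:
    """Copies once the file has been emailed and saved locally."""
    creator_laptop = 1
    creator_backup = 1
    mail_server = 1              # attachment single-instanced by the mail server
    reviewer_laptops = reviewers
    # Reviewer laptop backups are assumed single-instanced, so they add nothing.
    return creator_laptop + creator_backup + mail_server + reviewer_laptops


def copies_after_archiving(reviewers: int) -> int:
    """Copies once each reviewer has moved the mail to a local email archive."""
    archive_copies = reviewers          # one per local archive file
    archive_backups = reviewers         # archive files cannot be single-instanced
    return copies_after_review(reviewers) + archive_copies + archive_backups


stage_one = copies_after_review(5)
stage_two = copies_after_archiving(5)
print("after review:   ", stage_one, "copies")
print("after archiving:", stage_two, "copies")

file_bytes = 10 * 10**6                 # a 10 MB presentation
per_file = stage_two * file_bytes
print("one file:       ", per_file / 10**6, "MB stored")
print("1,000 such files:", 1000 * per_file / 10**9, "GB stored")
```

With 5 reviewers this gives 8 copies and then 18, matching the walk-through, and a single 10 MB presentation ends up occupying 180 MB before replication is even considered.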
#CloudAnywhere radically changes the equation by sharing access to files instead of sending copies in the thousands. Because all files are stored in a common, backup-free repository there can be just one well protected and properly replicated copy of a file that is available to all the right people, wherever they may be, with whatever device they have. Should users or applications need a local copy, it can be pinned to a user device or a file server and when that level of access is no longer required the file returns to being just a link to the latest version in the ‘content core,’ Hitachi Content Platform at the corporate data center serving as the centerpoint for all corporate data and content. More very soon…
Here’s the Beef
Having grown up in the 80s, I fondly recall Wendy’s famous commercials with the charismatic woman pondering “where’s the beef”. In the included video she ends with, “I don’t think there’s anybody there. I really don’t.” The trip down memory lane got me thinking about some things that are going on in the industry this week. Some are hidden, but meaty. Others, well, there’s a lot of bun in the message and it makes me want to proclaim from a rooftop, “WHERE’S THE BEEF? HANG ON, IS ANYBODY THERE?”
In the spirit of exposing hidden but meaty things, I do want to reference two visionary gems.
- HDS joins PARC in the Emerging Networks Consortium – Since this year is the 40th anniversary of Ethernet, I’m particularly proud of this effort. The guys at PARC are beyond smart and have had a tremendous impact on our industry. Again we are working now to make sure that in the years ahead our customers are going to get the best and we are well poised to meet key partners, like Brocade and Cisco, to realize a future with clear benefits.
- Hitachi MRI Scanners Ranked #1 for 3rd Consecutive Year – While at a glance this may not seem relevant to HDS customers today, a hidden fact behind stellar MRI capabilities is that for Hitachi to do well here we have to be super smart about image processing. That is because sharper images save lives, and this creates a pocket of capability we can leverage through open innovation with our customers.
These visionary beefy nuggets aside, I think that there is another big bun that Hitachi’s “Wagyu beef burger” more than covers: pervasive data center-wide software defined storage services. Truthfully, our Hitachi Data Ingestor (HDI) virtualizes our killer private cloud storage offering (Hitachi Content Platform, HCP) and addresses Hitachi, EMC, NetApp or DAS capacity because it can be delivered as a virtual machine appliance. Further, using our UCP Director software our users can orchestrate Hitachi’s current and future (that’s a really big hint) cutting-edge software defined offerings on award-winning Hitachi infrastructure.
Speaking of Hitachi infrastructure, this is another area we’ve been innovating in for years. We innovated separation of the control and data planes within the block storage infrastructure layer and Hitachi has subsequently cascaded this approach in our Hitachi NAS Platform (HNAS) and HCP offerings. Moreover, a deeper look at our Hitachi Virtual Storage Platform (VSP) and its little brother Hitachi Unified Storage VM (HUS-VM) reveals a Hardware Abstraction Layer (HAL) ensuring a high degree of flexibility in our choice of physical infrastructure — hmm…I wonder where that is going.
So this week if you’re spellbound by a ringleader’s sizzling showmanship while he announces the latest hijinks of the Bourne clown troupe masquerading as a fire brigade to “save the storage world as we know it,” pause and ask “where’s the beef?” If you ask this question, and I believe you will, I have an answer for you and it is coming later this month.
The New Math of Magic Quadrants
My colleague Nick Winkworth published a great blog on the new Gartner Group “Magic Quadrant for Blade Servers” and how it positions HDS and our blade server offerings. It’s a worthy read.
Given that I commented on the Magic Quadrant for General Purpose Disk Arrays with a blog, I thought it best to give the server business some equal attention. When the storage-focused and blade-focused Magic Quadrants are reviewed, you’ll see that there is a lot of commonality in how positively Hitachi is positioned. It’s great to see our strategy becoming understood by the analyst community.
There is one main difference between the Magic Quadrants, and it’s one that multi-tasking readers rush to – the picture. Of course in this case the picture is the “Quadrant” that highlights the placement of each vendor. In the blade server space, HDS is shown as being in the Niche category, while in the general purpose disk arrays we are positioned as a Leader.
So the question is, in today’s world does the old adage still hold true? Does a picture still equate to a thousand words?
As Nick pointed out, the HDS position in the “Niche” quadrant is actually quite logical given Gartner’s definitions for placement. We are successfully helping our enterprise customers, but are not one of the “big guy” blade server vendors that is covering every possible use case across the globe; our approach is far more targeted.
Since Gartner has continued to move the position of Hitachi upward (better Execution) and rightward (better Vision) with each of the last three Magic Quadrants, I take that as a clear sign that they see our strategy of focusing on Hitachi Unified Compute Platform and enterprise-level compute innovation as one that is working for us and our customers.
For a bit of “Niche” category perspective, we are positioned similarly to Oracle with respect to our “Ability to Execute” and well ahead of them in terms of “Completeness of Vision.”
So while the positioning makes sense, the “thousand words” that the Quadrant picture represents doesn’t feel as illustrative as the 327 words that Gartner writes about HDS and our server capability.
Of the noteworthy comments, Gartner states that “Hitachi innovates strongly around blade aggregation and highly integrated virtualization” and “…Hitachi’s blade servers are highly popular among the vendor’s installed base…”
Gartner goes on to mention how their client feedback on Hitachi Unified Compute Platform, our primary go-to-market with our Compute Blades, is “very positive.” They also mention that our Compute Blade hardware is “a well-proven platform, with a strong Japanese installed base.”
To be fair, Gartner also mentions “Cautions” about a few HDS server areas, as they do with all vendors. For HDS, Gartner’s commentary focused on a storage-oriented sales and marketing focus, that we are “relatively unknown” as a server vendor outside of Japan, and a “limited” channel presence for servers. Relative to the largest blade server vendors, those seem fair.
Of course, none of the “Caution” points suggest we cannot help our customers, as demonstrated by our continuous growth, both in the market and as represented in the Gartner “picture” of the world.
Is it possible then for 327 to be greater than 1,000? I’ve never been a mathematician, but to me Gartner’s 327 words are far more valuable to understanding how HDS can help you with your converged infrastructure and computing needs than the thousand words presented by that picture.
Hopefully everyone will read them.
Because, while the math doesn’t seem to work, we continue to get customer feedback on our converged infrastructure offerings that “the sum is far greater than the parts,” so maybe it’s a comeback of the “new math” we heard so much about a while ago.
When “Niche” Is Best
This week Gartner announced its latest “Magic Quadrant” report for blade servers. Once again Hitachi Blade servers are included in the report, and once again Hitachi’s position on the chart has moved upwards and to the right, improving in both “completeness of vision” and “ability to execute”.
The report consists of the chart itself, and also some commentary indicating pros and cons (or “strengths” and “cautions” as Gartner calls them) for each vendor. The commentary for Hitachi, which was excellent in past years, is even better this year, calling out our strong innovation in blade technologies and noting our products’ high rating and popularity among our customers. The report even clearly explains our strategy for addressing the market through our Hitachi Unified Compute Platform (UCP) solutions in combination with our storage technologies. In fact the only “cautions” boil down to the simple fact that, relative to the big server vendors, Hitachi remains little known in the broader blade market with a small market share outside Japan.
Despite this ringing endorsement of our products and strategy, the question I am most frequently asked is, “If your servers are so good, why isn’t Hitachi in the leaders quadrant?”
The problem with this question is the underlying assumption that “leaders” (at least, as defined by Gartner) are always “best”.
To qualify for inclusion in the “leader” category a vendor must address the entire blade market, including the high volume and commodity space, a market which, as we know from rumors of IBM’s impending sale of this business to Lenovo, is not always a desirable one. “Leaders” must invest heavily in R&D, but that investment is spread over a large number of projects and different target markets.
On the other hand, “Marketing 101” tells us that the vendor who understands the needs of its customers most completely and creates products to meet those needs is not only the one who will succeed, but also the one who will have the most satisfied customers.
Hitachi and HDS strongly embrace that approach, focusing completely on the needs of large enterprise customers. These customers turn to us because they require high availability and high performance coupled with advanced integration with storage, networking and enterprise software such as SAP, Oracle and VMware to reduce data center operational cost and complexity.
If that is a “niche”, well, if you represent a large enterprise with critical computing needs, “niche” may just be the best choice for you.
Hitachi Data Systems Highlights @NABshow 2013
As the final production truck has packed up and pulled away from the Las Vegas Convention Center declaring the end of the 2013 NAB conference, we look back at the new products and technologies announced at the show. One thing that became very apparent was that if you can’t relate to the following 5 topics, you probably weren’t at the right conference:
- Cloud services
- Storage inside the workflow or machine
- Better camera capture
- Brilliant imagery on and in everything
Yes, that means you’d better get integrated with open APIs and into accelerating workflow applications. I would like to briefly focus on two areas:
Storage inside the workflow or machine: Expect to see end-to-end media workflows with emphasis on higher quality and reduced costs, while producing brilliant colors without shutter jitter. Below is the end-to-end workflow diagram that we showed visitors in our booth.
4K: Another theme we saw at the show related to 4K. As the world moves to 4K and uncompressed digital content, capacities skyrocket and vendors are scrambling to meet the growing demand for increased capacities.
Below are links to products and technologies we checked out, outside of the Hitachi booth.
NAB is a chance to quickly assess vendor directions, broadcaster investments, technology trade-offs and value propositions. We hope you had a chance to come by our booth. If not, come see HDS later this year at , Sept 13-18.
Cloud Anywhere Does More
In my last post, I talked about how the traditional means of storing, protecting and sharing files are breaking down and hinted at a new approach from Hitachi Data Systems. For now, I’ll refer to this as #CloudAnywhere, where your own cloud makes data available anywhere it is needed without the need for complicated, cumbersome and costly IT resources at each and every site and for each and every user. It’s a world where you no longer maintain hundreds of copies of the same file across a myriad of file servers, user devices, archives, backups and content management systems.
#CloudAnywhere is ideal for organizations looking to archive data, backup less data without sacrificing protection, consolidate more data into a smaller footprint, distribute consistent content, perform e-discovery and compliance actions and facilitate the use of the vast ecosystem of applications written for the cloud.
#CloudAnywhere is secure. It provides cloud services capabilities from within your own IT department, avoiding the limitations and risks of typical consumer clouds and of turning over control, security and protection to others. This allows you to retain proper stewardship and governance of data and reduce the risk of noncompliance or compromising intellectual property.
#CloudAnywhere is simple. It is compatible with your existing practices and policies, and supports traditional network storage as well as cloud protocols. It helps improve productivity and responsiveness to changing requirements. You can avoid worrying about local storage at every site, growing backup and restore times, expanding mailbox sizes, struggling with file size limits in content management systems, over-replicating data or even physically shipping tapes, DVDs or USB drives by sharing a single, well-managed central repository instead of multiple, sprawling silos of content.
#CloudAnywhere is smart. Data is stored in Hitachi Content Platform (HCP) and does not need additional backup. Since only the favorite files are stored locally and the rest shared as a URL, there are far fewer files moving through email, the network, and user devices; and with compression and single-instancing in a high-density storage footprint, the total capacity required to support all that data is reduced.
#CloudAnywhere does more than just solve the challenges customers face today; it helps them prepare for what’s coming. Trends like big data, BYOD, next-generation file services, private and hybrid clouds, metadata management, and analytics may not be budgeted projects today. However, the very attributes that make our solution so well-suited to cloud, distributed IT, and metadata-driven automation prepare IT for what’s next.
Big Data Just Got Bigger
Well, it’s been quite a while since the health and life sciences (HLS) team issued a blog. Things have been quite busy as our business continues to grow at unprecedented rates, far beyond the market. While we’ve been busy driving our business, big data in the health and life sciences industries has continued to grow as well.
Healthcare providers need to manage their data and can realize certain benefits if the data is properly analyzed and managed. For the HLS market, big data could be defined as “data utilized to process meaningful results from across the patient care spectrum”. When we talk about big data we tend to think of the 3 Vs as defined by Gartner analyst Doug Laney: Volume, Velocity and Variety. Healthcare is no different. However, we have come to realize that big data in healthcare has gotten even bigger. Not 3 Vs but 5 Vs. Just when you thought you had it figured out, big data grew. Let’s look at where this growth has come from.
Volume of patients as baby boomers enter the age of requiring health services. Volume of digital data being created as governments invest in technologies. Volume of research as drug companies and genomic companies generate larger volumes of data – all of this becomes a challenge for organizations to manage.
With more systems generating data, more information requiring analysis for decision making, and more compute power required to sequence genomes, velocity becomes a more prominent issue to be dealt with. Capturing data from sources that previously went unnoticed – heart rate monitors, insulin pumps, morphine pumps – all unstructured data sources, means more information that holds the promise of valuable decision making. Add social media to the list of inputs and velocity becomes a blur.
Nowhere is there more variety than in healthcare. We have hundreds of standards and nomenclatures, thousands of applications generating data in different formats and a combination of structured and unstructured data that all relate to the patient in some way. With a total lack of integration between these systems there is even more variety between vendors in how information is captured and presented to physicians. Social media events are now adding to the variety of data that pertains to the patient’s health.
The 4th V – Value
Each object that is created has a value to some stakeholder. Each of those objects has multiple stakeholders that place a value on the object for some length of time. For example, a blood glucose monitor reading for diabetes has a short term value to the patient – do I need to take my insulin or not. It has a midterm value to the treating physician – how stable has your blood sugar been over time. It has a financial value to the manufacturer – how many times per day does a patient utilize the blood glucose monitor, thus calculating cost vs. profit margins (and other financial questions beyond me). It has a value to a researcher who is monitoring the effectiveness of a particular brand of insulin, and so on. When put into this context the value of the data combined with the 3 Vs begins to place more meaning on the data that was somewhat abstract before. It could be assumed that value was always part of a big data strategy, but I think it’s worth calling out separately as many people struggle to understand how big data impacts them directly. Value starts to add some clarity.
The 5th V – Validity
The 5th (though possibly not the final) V is validity. Validity defines a very important aspect of big data and that is accuracy and completeness of the data. Many customers have raised the issue that the data they analyze may not be clean – they have to trust that the data they get has been validated. In fact, many of the companies that perform analytics assume the data they get is accurate and complete. A discussion with Gartner and HDS revealed that this is in fact not the case. In many cases the data in databases is incomplete, inaccurate or invalid. Data must be cleansed prior to any meaningful analysis being performed and so without consideration for validity of the data, big data is just a big mess. Validity also has a role in how valid the results are to a user. Does a CT of the lower extremity of a diabetic patient have any validity to the researcher looking for circulatory problems associated with a particular pharmaceutical? Was the CT done for diabetes circulation or was it done for trauma reasons? The answer affects the outcome of the questions asked by the researcher.
Taken on its own, none of these Vs constitutes big data. It is only when combined that one can consider big data analysis and how to truly deal with the management of information we call big data.
ISO/IEC and ITU-T Converge on Cloud Computing Terminology
by Eric Hibbard on Apr 17, 2013
So you may be asking yourself, what’s the big deal about cloud computing terminology? And how can something as simple as a term or a definition be so important? Simply put, when standards development organizations (SDOs) like the International Organization for Standardization (ISO), the International Electrotechnical Commission (IEC), or the International Telecommunication Union (ITU) wade in, the resulting definitions can actually help determine which Information and Communications Technology offerings are really cloud computing. When these three SDOs choose to collaborate on a single cloud computing terminology standard, the stakes are even higher and there is little or no ambiguity because the industry is dealing with one standard rather than multiple standards (or at least that’s the idea).
The good news is that ISO/IEC JTC 1/SC 38 (Distributed application platforms and services or DAPS) and ITU-T Study Group 13 (Future networks) have in fact formed two collaborative teams on Cloud Computing (CT-CC): 1) Overview and Vocabulary (CT-CCV) and 2) Reference Architecture (CT-CCRA). The challenge has been to get these very different SDOs to come to an agreement.
After months of vigorous debate, the CT-CCV and CT-CCRA settled on terminology compromises last week (April 9-11, 2013) in Madrid. In a nutshell, they were able to define core terminology like cloud computing, public cloud, private cloud, etc. and to address what the National Institute of Standards and Technology (NIST) calls cloud service models.
As defined by NIST in Special Publication 800-145, there are only three cloud service models: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). However, the NIST position hasn’t sat well with organizations like the ITU-T, with its Network as a Service (NaaS) and Communications as a Service (CaaS), or to a lesser degree, the Storage Networking Industry Association (SNIA) with its Data Storage as a Service (DSaaS), which is defined in ISO/IEC 17826:2012, Cloud Data Management Interface (CDMI). Compounding the complexity, the cloud marketing around IaaS, PaaS, and SaaS has made it nearly impossible to abandon these terms even though precise definitions have been hard to come by.
At the Madrid meeting of the CT-CCV, the term cloud computing was finally defined as: “paradigm for enabling network access to a scalable and elastic pool of shareable physical or virtual resources with on-demand self-service provisioning and administration.” So there you have it, we now know what cloud computing is for the moment, subject to reconsideration between now and the next CT-CCV meeting in Kobe, Japan in September, 2013.
But wait, there’s more. To deal with the sticky issue of cloud service models, the CT-CCV decided to introduce a new approach by first defining cloud capabilities type as, “classification of the functionality, based on resources used, provided by a cloud service to the cloud service customer.” Following the principle of separation of concerns (i.e. minimal functionality overlap), the CT-CCV further reasoned that there are only three different cloud capabilities types:
- Application Capabilities Type–cloud capabilities type in which the cloud service customer can use the cloud service provider’s applications
- Platform Capabilities Type–cloud capabilities type in which the cloud service customer can deploy, manage and run customer-created or customer-acquired applications using programming language specific execution environment supported by the cloud service provider
- Infrastructure Capabilities Type–cloud capabilities type in which the cloud service customer can provision and use processing, storage and networking resources so that they are able to deploy and run arbitrary software
One might make the observation that these cloud capability types roughly correspond to the original SaaS, PaaS, and IaaS definition from NIST. So why go to all the hassle of defining these cloud capability types? The answer is that cloud services can be grouped together into cloud services categories (defined as, “group of cloud services that possess some qualities in common with each other”), which can include capabilities from one or more cloud capabilities types. The breakthrough is that by using this approach, it is now possible to describe all the known (and yet to be defined) *aaS as cloud service categories. To help organizations understand this new approach, the next draft of the single text standard, ISO/IEC 17788 | X.CCDEF, includes the following most common cloud service categories:
- IaaS: cloud service category in which the cloud capabilities type (3.2.4) provided to the cloud service customer is an infrastructure capabilities type
- PaaS: cloud service category in which the cloud capabilities type provided to the cloud service customer is a platform capabilities type
- SaaS: cloud service category in which the cloud capabilities type provided to the cloud service customer is an application capabilities type
- NaaS: cloud service category in which the capability provided to the cloud service customer is transport connectivity and related network capabilities
- CaaS: cloud service category in which the capability provided to the cloud service customer is real time communication and collaboration
- DSaaS: cloud service category in which the capability provided to the cloud service customer is the provision and use of data storage and related capabilities
- Compute as a Service (CompaaS): cloud service category in which the capabilities provided to the cloud service customer are the provision and use of processing resources needed to deploy and run arbitrary software
An informative annex also includes Table 1, which shows the relationship of the seven cloud service categories and three cloud capabilities types. An “X” at the intersection of a row and column depicts that the cloud service category, shown as a row in Table 1, is of the indicated cloud capabilities type, shown as a column in Table 1.
Table 1 - Cloud Service Categories and Cloud Capabilities
As further explanation of Table 1, a cloud service category that uses processing, storage or network resources gets an “X” in the Infrastructure column. A cloud service category may offer the capability of deploying and managing applications running on a language-specific execution environment supported by the cloud service provider, in which case it would get an “X” in the Platform column. Similarly, a cloud service category may offer the use of an application offered by the cloud service provider, in which case it will have an “X” in the Application column. Note that a cloud service category could offer any combination of the three cloud capability types.
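To make the category-versus-type relationship concrete, here is a minimal Python sketch. It encodes only the three mappings stated outright in the definitions above (IaaS, PaaS, SaaS); the names `CapabilitiesType`, `CATEGORY_CAPABILITIES`, `marks` and the "ExampleaaS" category are made up for illustration and do not come from the draft standard.

```python
from enum import Flag, auto


class CapabilitiesType(Flag):
    """The three cloud capabilities types; a category may combine several."""
    INFRASTRUCTURE = auto()
    PLATFORM = auto()
    APPLICATION = auto()


# Column order used when rendering a Table 1 style row.
COLUMNS = (
    CapabilitiesType.INFRASTRUCTURE,
    CapabilitiesType.PLATFORM,
    CapabilitiesType.APPLICATION,
)

# Only the mappings given explicitly in the category definitions above.
CATEGORY_CAPABILITIES = {
    "IaaS": CapabilitiesType.INFRASTRUCTURE,
    "PaaS": CapabilitiesType.PLATFORM,
    "SaaS": CapabilitiesType.APPLICATION,
}


def marks(capabilities):
    """Return one cell per column: 'X' where the capabilities type applies."""
    return ["X" if column in capabilities else "-" for column in COLUMNS]


# Hypothetical category that combines two capabilities types.
CATEGORY_CAPABILITIES["ExampleaaS"] = (
    CapabilitiesType.PLATFORM | CapabilitiesType.APPLICATION
)

for name, capabilities in CATEGORY_CAPABILITIES.items():
    print(f"{name:<12}", *marks(capabilities))
```

A flag-style enum fits here because, as noted above, a category can offer any combination of the three types rather than exactly one.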
An unfortunate side effect of this new approach is that the marketers of the world will most likely set about making up all kinds of new *aaS labels. The term Washing Machine as a Service was muttered more than once in the Madrid meeting…more to avoid goring anyone’s favorite ox, but now I’m not so sure.
So what do you think of the grand compromise?
In Hu Yoshida’s recent post, Primary Storage Deduplication Without Compromise, he did a wonderful job highlighting the capabilities of the dedupe implementation on our Hitachi NAS Platform (HNAS) 3080 & 3090. Hu also begins to touch on why we are able to deliver deduplication without compromise: our Hitachi NAS architecture. In particular, Hu stresses that our capabilities result from our File Offload Engine (FOE) coupled to standard Intel microprocessors.
My colleague Matt O’Keefe (you can read his blogs here and here) burrows past the end of the rabbit hole, describing more detail on the HNAS architecture itself. For referential purposes I’ve lifted an architectural diagram from his second post. My hope is that you can study the architecture and begin to understand the depth of competitive differentiation we have here. In past posts I’ve tried to make our architecture more reachable, while showing off a bit of my inner techno-philosopher. At the end of the day the intent with all of these posts is to expose not just a differentiator for one specific product, but a key design center for Hitachi IT platforms. Notably our key design center is the separation of the data plane from the control plane.
I think that Hu did a great job in outlining the several features available for dedupe: FPGA (Field-programmable Gate Array) accelerated hashing/chunking, smart automation to throttle up and throttle back, etc. What I’d like to do is explain two facets: reads with zero rehydration penalty and how we handle the dedupe database. These two things are only possible because of our differentiated architecture.
Zero Rehydration Reads
Due to the nature of the HNAS (and HUS File Module) object-based file system we are able to extend capabilities over time. One such extended capability is our SHA (Secure Hash Algorithm), which is computed by our FPGA offload engines residing on the data plane. This results in super minimized overhead for the overall deduplication process, and it also allows for a more conservative intelligent background process that eliminates redundant data only when load on the system is minimal. At a high level the capacity reclamation process:
- Iterates across a checkpoint in the file system looking for changed chunks within an object
- Gets the SHA(s) from the File Offload Engine (a.k.a. the data plane)
- Asks the hash database if the chunk is a duplicate
- If the chunk is a duplicate, requests the File Offload Engine to reclaim the duplicate and update a pointer within the object to point to the root/master chunk
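The steps above can be sketched in a few lines of Python. This is a toy model under loud assumptions, not the HNAS implementation: plain dictionaries stand in for the file system, the per-object chunk maps and the hash database, `hashlib.sha256` stands in for the FPGA-computed SHA (the post does not say which SHA variant is used), and every name here is hypothetical.

```python
import hashlib


def reclaim_duplicates(changed, chunk_store, hash_db, chunk_maps):
    """Toy capacity-reclamation pass.

    changed     -- (object_name, index) pairs for chunks changed since the checkpoint
    chunk_store -- chunk_id -> bytes (stand-in for on-disk chunks)
    hash_db     -- digest -> master chunk_id (stand-in for the hash database)
    chunk_maps  -- object_name -> list of chunk_ids (each object's "map")
    Returns the number of chunks reclaimed.
    """
    reclaimed = 0
    for obj, idx in changed:
        chunk_id = chunk_maps[obj][idx]
        # Stand-in for fetching the SHA from the offload engine.
        digest = hashlib.sha256(chunk_store[chunk_id]).hexdigest()
        master = hash_db.get(digest)
        if master is None:
            # First time this content is seen: it becomes the master chunk.
            hash_db[digest] = chunk_id
        elif master != chunk_id:
            # Duplicate: repoint the object's map at the master chunk...
            chunk_maps[obj][idx] = master
            # ...and free the duplicate once no map references it any more.
            if not any(chunk_id in m for m in chunk_maps.values()):
                del chunk_store[chunk_id]
                reclaimed += 1
    return reclaimed


chunk_store = {1: b"AAAA", 2: b"BBBB", 3: b"AAAA", 4: b"CCCC"}
chunk_maps = {"file_a": [1, 2], "file_b": [3, 4]}
hash_db = {}
changed = [("file_a", 0), ("file_a", 1), ("file_b", 0), ("file_b", 1)]

freed = reclaim_duplicates(changed, chunk_store, hash_db, chunk_maps)
print(freed, chunk_maps)
```

In this toy run, chunk 3 holds the same bytes as chunk 1, so file_b’s map is repointed to chunk 1 and chunk 3 is freed.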
Further, objects within our file system include a “map” for all of the chunks that they own, and the directions on the “map” can point to either unique chunks or the same chunks shared by other objects. Visually this would look like:
Therefore, when the read case occurs for either file_a or file_b, post dedupe, in the above example, either o-node will happily traverse the directions supplied by the “map,” capture all of the chunks, and return the complete file to the read requestor. At no time does the system consult with the dedupe database to facilitate a read request. So in this way the system pays no rehydration tax for a read request. Finally, any file stored within a deduped file system can take advantage of both the dedupe and rehydration free reads – while this may seem obvious there is a reason why I’m stating it.
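A minimal sketch of that read path, again as a toy Python model rather than the actual HNAS code: the dictionaries and the `read_file` name are hypothetical, and the chunk contents are invented. The point it illustrates is that the read function takes no hash database argument at all; it only follows the object’s map.

```python
def read_file(name, chunk_maps, chunk_store):
    """Follow the object's map and return the reassembled file contents."""
    return b"".join(chunk_store[chunk_id] for chunk_id in chunk_maps[name])


# Post-dedupe state: both files share chunk 1; no hash database is involved.
chunk_store = {1: b"AAAA", 2: b"BBBB", 4: b"CCCC"}
chunk_maps = {"file_a": [1, 2], "file_b": [1, 4]}

data_a = read_file("file_a", chunk_maps, chunk_store)
data_b = read_file("file_b", chunk_maps, chunk_store)
print(data_a, data_b)
```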
Deduping the Dedupe Database?
The last sentence in my previous paragraph is very relevant to this section, and again I know it was totally obvious. The database we use is effectively a memory resident Key Value Store (KVS) running in the control plane, which is merely a hash database. Even though it is resident in memory in the control plane for performance reasons, it does “page to disk” to preserve state. Where do you think that disk might be? I hope that the intuitive answer is that the best place to “page to disk” is in the HNAS file system residing on the data plane. Because we are storing the database on the HNAS file system, which is dedupe enabled, we can actually dedupe the dedupe database. I think that this is pretty awesome as it speaks further to the care we take to provide as much usable capacity as possible when creating any storage feature.
Yet again Hitachi is leveraging our unique design centers for IT architectures resulting in a feature that bests the competition. In this case we are offering a novel deduplication approach for primary storage without compromise. Or said another way, and to quote someone I just talked to this week, we do no harm with our implementation. This is an important factor for us because many of our competitors surely offer dedupe but there is enough fine print to rival a tax code or two. Whereas our aim during this tax season, if you’re in the U.S. anyway, is to give you a “capacity refund” on existing assets.
Rapid Deployment of Microsoft Applications with Microsoft Private Cloud
Almost three years ago, Hitachi Data Systems was one of the first companies to announce solutions to support Microsoft’s Private Cloud Fast Track program. Our customers had been looking for cost-effective private cloud infrastructures that were both easy to deploy and easy to maintain, and that’s what we delivered. Since then, we’ve seen strong adoption of our Microsoft Private Cloud Fast Track solutions in major vertical markets, such as telecommunications, web hosting providers, service providers and Fortune 1000 customers in retail and finance. A great example of this is a regional telecom customer who rolled out our Hitachi Unified Compute Platform (UCP) Select for Microsoft Private Cloud solution in stages, supporting ITaaS for internal use initially, followed by affiliates, then finally by enterprise customers. This telecom customer expects to leverage our infrastructure for the next five to 10 years – making that quite the return on investment. What’s more, our converged infrastructure is practically bullet-proof, with 99.999% reliability – stronger than any public cloud offering in their region.
Today we’re releasing our Microsoft Private Cloud version 3 solution, which promises to make a great solution even better. This new version supports full provisioning of compute and storage resources, allowing customers to quickly deploy virtual machines using System Center 2012 SP1. We’ve developed PowerShell cmdlets for Virtual Machine Manager and runbooks for Orchestrator that deliver the full benefit of Windows Server 2012 and Hyper-V feature support, allowing customers to deploy and manage thousands of virtual machines.
Finally, we are making rapid deployment of a Microsoft Private Cloud solution easier than ever, taking care of all the heavy lifting by pre-validating Microsoft’s applications. This assures that customers can move forward confidently with Hitachi UCP Select for Microsoft Private Cloud.
Check out the new reference architecture for Hitachi UCP Select for Microsoft Private Cloud, which uses the latest in Hitachi storage and servers, coupled with networking, Microsoft Windows Server 2012, and Microsoft Hyper-V and System Center 2012 SP1.
The Storage Olympics Gets Magical
To those in the storage world who rejoice in being in-the-know about the ever shifting technology and vendor landscape in front of them, Gartner Magic Quadrants are seen as major events in the “Vendor Olympics” that our industry can often devolve into. Now, by combining multiple disparate storage-related Magic Quadrants into one review of General-Purpose Disk Arrays (made publicly available by HDS for you, here) it seems Gartner has created the decathlon of the storage Vendor Olympics.
Midrange, High-End, NAS, Monolithic? Yup, the gang’s all here.
And while you might be a fan or a detractor of Gartner’s methodologies, it does measure two vectors of actual importance to storage customers: a vendor’s ability to execute and its completeness of vision. (Or, in my own plain English – “Can they do what they say?” and “Can they correctly anticipate customer and market needs?”) While my perspective comes squarely on the vendor side, those do seem like pretty appropriate areas to focus on.
Given the new, broader focus of this Magic Quadrant and significant industry chatter that can follow any new Gartner commentary, it seemed relevant to add some thoughts and perspective about it.
First and foremost, the overall positioning of the vendors “feels” about right. There are the expected “Leaders” (with Hitachi/HDS among them) that have built, bought or partnered their way to the top of the pack. While individual positions could be argued, I doubt there were that many surprises in the “Leaders” quadrant.
While it’s best not to fixate on the exact position of every “dot,” the attention there is almost inevitable. I could offer an argument that the growing collaboration between HDS and our parent company, Hitachi, Ltd., offers us a unique edge in the world of information clouds and the coming machine-to-machine big data of tomorrow… but overall, being positioned as a clear leader seems to represent our position pretty well.
Note that the two competitors who were positioned ahead of us are storage pure-plays – vendors who cannot offer converged solutions based on their own compute and storage technology. HDS, on the other hand, provides our Unified Compute Platform family with both our own Hitachi-developed servers as well as those from our partner Cisco. So, having Gartner judge our storage offerings as ahead of all the server vendors and within striking distance of the pure plays is an enviable position to be in.
I’ll steer clear of calling out specific commentary about the competition, which would likely devolve into a rather unproductive event in the Vendor Olympics. Instead, I’ll offer some quick thoughts regarding Gartner’s commentary about Hitachi. You’ll note that structurally, after a brief overview of each vendor, Gartner calls out “Strengths” and “Cautions” for all vendors, which in our case seemed to neatly align around three of our product families of Hitachi Virtual Storage Platform (VSP), Hitachi Unified Storage (HUS) and Hitachi NAS Platform (HNAS). (Our entry-enterprise, unified storage system Hitachi Unified Storage VM was too new to be included in this Magic Quadrant.)
Relative to our high-end VSP, Gartner notes how it is “distinguished” due to performance and capacity scalability, proven data protection and replication capabilities, and its “widely used” virtualization function. Sounds about right. It “Cautions” that VSP will be due for a refresh “…within the next six to 12 months…”, apparently drawing on historical industry norms of high-end storage platforms getting refreshed every 3.5 years or so.
I’m not going to be breaking any news about our future high-end roadmap here, but I’m also not that surprised that Gartner’s only question is about the future, and not what customers are buying today. Our continued high-end growth and recent 5-category sweep of Storage Magazine’s 2013 Quality Awards demonstrate that our users are quite pleased with the Hitachi Data Systems technology they are offered today.
In fact, there’s been a tremendous response to how we’ve extended the value of VSP by introducing Flash Acceleration software (press release, here) and our unique Hitachi Accelerated Flash storage hardware (press release, here.) Those announcements were significant, as we not only introduced a unique and specially engineered flash storage option for improved cost, density and durability, but we also fundamentally upgraded our system code to maximize performance when deployed with flash storage capacity. I wonder how many of the other Magic Quadrant leaders have done that?
As for what comes next for VSP and high-end storage from HDS, I think it’s fair to say that our continued hardware and software excellence will only be expanded in terms of the performance, scalability and functionality our customers have come to expect. We’ll continue to deepen and expand upon our leading storage virtualization capabilities in ways that will provide the efficient, flexible and always-on storage pools demanded by next-generation data centers. I’d love to say more, but you’ll need to wait a bit longer for any sneak peeks.
Switching gears, what I love most about Gartner’s commentary about the HUS platform is that it focuses on a fundamental strength – the HUS symmetric active/active controller architecture. This architecture isn’t commonplace in the industry, as the document clearly highlights. This means that the balanced and scalable performance we offer cannot simply be matched by adding some processor megahertz to a lesser architecture. When customers realize an HUS system can automatically load-balance over its block storage controllers and remove the headaches of manual LUN reassignment, conversations quickly turn away from specs and toward how we solve their challenges.
This is also why Gartner’s “Cautions” for the HUS are tough to address, because they focus on how our file and block processing is run by discrete components. This is true, as within our HUS systems our active/active symmetric block storage controllers work with our FPGA-based file modules (which directly correlate to our popular HNAS gateways) to provide access to a common storage pool. And in reality, separate file and block processing is more often the rule than the exception in unified storage systems today.
Customer interest in unified storage has centered on being able to provision from a consolidated, well-utilized pool of storage, and manage file and block functions from a single toolset, and less about the controller integration specifics. For many file and block management functions, we’ve already delivered that unified experience within Hitachi Command Suite, answering a large part of the unified storage promise.
So as long as our discrete components continue to deliver leadership capabilities like our scalable performance and automatic load balancing, 128TB volume sizes, 8PB single namespace sizes, policy-based tiering and replication, application and hypervisor integration and screaming performance (including this new Storage Performance Council example), I’m just not sure how large an issue this really is for most customers.
Lastly, Gartner talks about Hitachi NAS Platform (HNAS). (Note: HNAS technology is also the basis of our file storage capability within our HUS 100 family and HUS VM – so the commentary applies to both.) The “Strengths” point to familiar Hitachi attributes of performance and scalability, while describing how HNAS can be a fit for big data environments and the consolidation “of multiple NAS filers.” The “Cautions” call out a lack of deduplication that “inhibits HNAS competitiveness” in certain applications.
HDS agrees that deduplication is an important requirement for today’s efficiency focused IT customer. In fact, HDS has been shipping a new version of our HNAS software to customers for more than two months, with a leadership-level primary storage deduplication functionality at the core of its new capabilities.
No, this dedupe is not beta. It’s not a controlled release. It’s generally available with real customers and in real production deployments. Our experience with those customers is confirming what we internally expected: we have a winner on our hands.
HNAS deduplication removes many of the normal compromises of primary storage deduplication systems by providing all the expected efficiency improvements without sacrificing file sharing performance and scalability. We can accomplish this by leveraging our high-performance FPGA-based hardware architecture and enabling data deduplication to be an automated process that does not interfere with file sharing workloads. What you end up with is a primary storage deduplication system with less administration, auto throttling intelligence and up to 90% storage reclamation.
While our dedupe capability might not have been shipping before Gartner’s cut-off date for the Magic Quadrant, it’s out in the market, available now and, if I may say so, pretty exciting. You can expect a blog post soon from my colleague Hu Yoshida expanding on the technical details of our dedupe engine.
So while I may have joined into the Vendor Olympics that sometimes surround the publishing of a new Magic Quadrant, I’ll say this… it does feel nice for the qualities to be recognized and for us to be on that storage industry medal podium.
And rest assured, we’ll be paying close attention to what our customers need today and where they are headed so we can keep developing the best solutions for tomorrow’s data centers, because we don’t plan on stepping down off that podium any time soon.
The Rumble, Part Deux – The Storage Hypervisor
At the last SNW in San Jose we had the first panel discussing and debating the Storage Hypervisor and Storage Virtualization. Moderating this ragtag panel, to ensure it didn’t turn into a cage fight, was the pleasant Mark Peters from Enterprise Strategy Group. The “fighters” were Ron Riffe (IBM), George Teixeira (DataCore), Mark Davis (Virsto), and of course myself, otherwise I wouldn’t be writing this blog.
Well, the team is back, and we’ve added Momchil Michailov (Sanbolic). From what I hear from their trainers, everyone is in great shape and ready to get back into the ring again. Let’s see who the last man standing will be. Tuesday, April 2nd at 5:00PM. Come get a ringside seat!
File Sync and Share
Traditional means of sharing files are breaking down. Email attachments, content management systems, copies on user devices, in backups, and on file servers all lead to inefficient storage and network utilization owing to massive content duplication and high cost for storage, backup, and data management.
But it’s not just file sharing – the rise of BYOD means users want their work data on multiple devices and getting work data onto a smart phone or tablet would require a user to use file-sharing techniques to get that data where they want it. And this just makes matters worse.
The limitations of the old methods have led to the popularity of consumer cloud-based file sync and share tools like Dropbox and Box. How popular are they? IDC estimates the market to be $20 billion by 2015. Venture capitalist investments are funding Dropbox (valued at $4 billion), Box (valued at $1.2 billion) and new entrants to the space all the time.
These trends are causing problems for IT as users are generating and sharing more and more copies of data. This exacerbates storage and network inefficiencies and puts those copies on unsanctioned devices, applications and clouds, outside the control and governance of corporate IT.
The answer is not ruthless enforcement of strict policies, as users will just find another workaround. The answer is also not to simply give up and turn data over to consumer clouds. The true solution is to deliver file synchronization and sharing from within IT to enable users to access data and collaborate on any device, from any location at any time; doing it safely, securely, and with corporate oversight using a private object storage-based cloud.
However, most of the offerings in the market today rely on public, consumer clouds for their storage environment. While some enterprise vendors have gone this route by acquiring Syncplicity and ionGrid, Hitachi Data Systems is taking a unique, differentiated approach to tackling this challenge with something more homegrown.
Stay tuned over the coming weeks for more details on how Hitachi Data Systems helps enterprise IT shops deploy safe, secure file sync and share from their own private cloud that can store data from cloud-based applications on-premises and under the control, protection and security of their own IT staff. In the meantime, I’ll leave you with this…
Do We Really Need All of the Copies We Have?
Data Protection is important. Anyone disagree with that? I doubt it, but maybe data protection in practice is broken.
In our last blog we promised to talk about the different technologies that contribute to data protection, and how each of these technologies has its own place. We have snapshots (both “Copy on Write” (COW) and “Copy after Write” (CAW)). We have cloning technologies where the entire LUN is copied. And for the traditionalists, we have simple backups to tape, VTL, or disk. On top of all this, we have various forms of remote replication (and of course best practices here require a clone or two at the secondary site). Do you want fries with that?
So let’s explore some of the technologies a little deeper and how they relate to the real goal: that is, getting the data back in the time you need (RTO) and with data loss that is acceptable to your business (RPO). They relate to each other.
Let’s look at a scenario with traditional backup either to tape or disk. Typically, these are taken once a day using a backup application. If something happens to that data 15 hours later, and you restore the backup copy (which takes x amount of time), that data is now 15 hours old. But generally you can’t afford that amount of data loss, so to get the data back you’ll need to apply logs (if this was a database application), manually reenter the data, or otherwise recreate it from other sources; failing that, you just accept 15 hours of data loss. Guess what? Recreating the data takes time, so now you have elongated your RTO to something that may be unacceptable, and therefore, for this particular application that technology may not work.
Maybe the solution is to use a different technology: Continuous Data Protection (CDP) perhaps – snapshots every 15 minutes that will give you better data currency and therefore should allow you to have a lower RTO. Why not use those technologies for everything? Well it comes down to the cost, and whether you can justify the cost.
In his most recent blog post, David Merrill walked through a process to calculate the cost of the many different data protection options. It’s definitely a good read, since there is no single solution that applies to all applications, hence, the “plethora” of solutions that we posted in our previous blog entry.
A while back the three of us (Ros, Claus and David) were discussing data protection economics with Dave Russell of Gartner, and the subject of the “number of copies” came up. Dave (Gartner) claimed that 12-15 copies of reference data is the average. Not to be outdone, David (our David) claimed the number was closer to 10-13. Is this round-off error, or what?
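The arithmetic in that scenario can be sketched in Python. Only the 15-hour gap comes from the example above; the 4-hour restore time (the “x” in the text) and the rework ratio of 0.2 hours of effort per hour of lost data are invented purely for illustration, and `recovery_estimate` is a hypothetical helper.

```python
from datetime import timedelta


def recovery_estimate(time_since_last_copy, restore_time, rework_ratio,
                      accept_loss=False):
    """Return (data_loss, total_recovery_time) for one recovery.

    rework_ratio is an assumed cost: hours of log replay or re-entry
    needed per hour of data written since the last copy.
    """
    if accept_loss:
        # Restore only: fast, but everything since the last copy is gone.
        return time_since_last_copy, restore_time
    # Restore, then recreate the missing data: no loss, but a longer RTO.
    rework = time_since_last_copy * rework_ratio
    return timedelta(0), restore_time + rework


gap = timedelta(hours=15)      # failure 15 hours after the daily backup
restore = timedelta(hours=4)   # hypothetical value for "x"

print(recovery_estimate(gap, restore, rework_ratio=0.2, accept_loss=True))
print(recovery_estimate(gap, restore, rework_ratio=0.2))
```

Either the data loss or the recovery time grows with the gap since the last copy, which is why RPO and RTO have to be considered together.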
We have hypervisor admins taking copies of their systems and keeping multiple versions. We have DBAs doing the same to their databases, and we have the traditional storage admins also taking backups, and perhaps replicating the data to a second and even third site.
Do we really need all of these copies? This gets back to David’s most recent blog where he quotes some IDC numbers that demonstrate the cost of multiple replicas.
There are two issues here. One is cost, the other is recovery. The age-old joke is that with all of these copies taken, can they ever be recovered? And can the data be recovered in time? It depends on selecting the right technology. There is no single answer.
Setting the Foundation for Cloud
You’ve heard a lot about the cloud and its advantages. But why all this hype? After all, keeping your data in house, under control, and safe is a high priority for your business. So why take the risk of storing it somewhere else? It’s a great question.
One of the major drivers businesses are experiencing is the accelerating growth of data and content, much of it unstructured. These assets are valuable to the business: once analyzed, they can reveal intelligence, and when applied to business decisions they can make a big difference to the bottom line. But volumes can go beyond the capacity of existing storage infrastructures, posing a challenge to IT managers and their budgets. The capital expenditures needed can “break the bank,” causing organizations to look for less expensive (and less capital-intensive) ways to meet growing infrastructure requirements. Even if additional storage capacity were economically viable, it would have to be sized to accommodate maximum usage regardless of how much is used at any given time.
Cloud infrastructure is one way to mitigate the capacity and cost issues associated with the growth of unstructured data and content. Choosing a cloud infrastructure solution can shrink or eliminate incremental capital expenditures, and because the cloud supports a “pay-per-use” model, even operational costs can be significantly reduced. But questions still remain: Is it worth relinquishing control of my data? What service level can I count on? Will I have timely access to all my data when I need it? What if I don’t? Can I confidently and consistently meet my compliance needs?
For some customers, security and compliance concerns may prompt the decision to choose a private cloud. This choice works for many, since it allows them to maintain control over their data while building their infrastructure on a virtualized, more optimized platform and to implement a “pay-per-use” model for their internal stakeholders. They can build their own, but a more cost-effective alternative might be to work with a storage information solutions provider who can build and manage their private cloud for them. In this way, both CAPEX and OPEX costs can be reduced, and IT resources can be deployed to meet more strategic needs (such as creating new customer-facing applications).
A private cloud can therefore be a logical first step, or a permanent approach to dealing with your growing infrastructure needs. Regardless of the cloud delivery model you choose, or mix of models if you eventually end up with both private and public clouds, it is critical that you give high priority to choosing a cloud service provider that builds its infrastructure and services on platforms that can deliver on required service levels, and on technologies that can provide quick, dependable access to all data when and where it is needed.
As with any IT-related consideration, understanding your environment and needs up front, and making the right vendor and technology choices will be important success factors as you progress on your cloud journey.
Data Protection: A Plethora of Alternatives, but Which Ones are Right for You?
Ros Schulman, Data Protection product line manager at HDS, and I, in conjunction with David Merrill, are exploring the economics of data protection in a series of blogs. David will focus on economics while we focus on the technology and practices.
“It depends” is an age-old answer, and there are no black and white rules, but data protection is like short-term insurance: the more precious the data asset and the higher the risk, the higher the premium paid to ensure uninterrupted access to that data.
Let’s begin by looking at some of the many options out there and try to break them down. Why do we protect data? Well, that’s simple. In case something happens, we need to recover that data. Wouldn’t you be upset if you lost your phone and didn’t have a backup of your contacts? We also need to keep data for regulatory reasons, but either way it’s the recovery process and how long it takes that counts.
So what are our choices? We have traditional backups, remote copies, backups on disk, snapshots, VTLs, copies here, copies there, copies everywhere, but sometimes not when you need them or not in a form from which you can recover quickly. One of the things to consider is the difference between a backup copy and something that is actively being copied to. A backup copy is typically made at a specific point in time, traditionally to tape, nowadays often to disk. The benefit of these backups is if the original data gets corrupted or deleted, you can restore it from the copy. In an active copy scenario, often used for disaster recovery, the copy is also subject to the same issues as the original. So do you need both? Well here it comes again. It depends on your requirements for recovery. If you have plenty of time to recover and don’t require data to be current after recovery, then using more traditional backup methods for both local backup and disaster recovery may be an option. If you need your systems to be up in a very short period of time, and the data to be current, then you will need active copies for disaster recovery and something like snapshots versus traditional backup for local recovery.
One thing to remember with an active copy is that if the primary data gets corrupted, the second copy will also be corrupted immediately afterwards. That’s why active copy for disaster recovery should never preclude traditional backups. A few years ago, a team of us visited a large bank (at the time they were not an HDS customer; they are now). Their problem was that they were mandated (by senior management) to suspend nightly backup activity as a cost savings measure since they were already doing remote replication. And as you would expect, their primary database got corrupted, as did the second copy. Immediately!
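The difference between an active copy and a point-in-time backup can be shown with a toy model. This is only an illustrative sketch: the class and its fields are hypothetical, and real replication works on blocks or transactions rather than on a Python dictionary.

```python
class ProtectedVolume:
    """Toy volume with an active replica and an optional point-in-time backup."""

    def __init__(self, data):
        self.primary = dict(data)
        self.replica = dict(data)  # active copy: follows every write
        self.backup = None         # point-in-time copy, frozen when taken

    def write(self, key, value):
        self.primary[key] = value
        # Replication applies the same change to the replica, good or bad.
        self.replica[key] = value

    def take_backup(self):
        # Copy the current state; later writes do not touch it.
        self.backup = dict(self.primary)


vol = ProtectedVolume({"balance": 100})
vol.take_backup()                       # the nightly backup
vol.write("balance", "###corrupt###")   # corruption hits the primary

print(vol.replica["balance"])  # the replica is corrupted too
print(vol.backup["balance"])   # the backup still holds the good value
```

The replica protects against losing the site, while the backup protects against losing the content, which is why the bank in the story needed both.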
So just like insurance it comes down to cost versus risk. To help determine the right mix of data protection solutions, the business needs to ask:
- How critical are those systems?
- What is the cost of recovering them in an hour versus the cost of recovery in 24 hours?
- Is the cost of recovery more than the cost of the system going down?
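One way to work through the questions above is to compare the yearly cost of each protection option plus the downtime it leaves you exposed to. The sketch below is a simplification with made-up figures: the option names, prices, outage rate, and the linear downtime cost model are all assumptions for illustration, not numbers from this series.

```python
from dataclasses import dataclass


@dataclass
class ProtectionOption:
    name: str
    annual_cost: float     # what you pay for the solution each year
    recovery_hours: float  # how long a recovery takes with it


def expected_annual_cost(option, downtime_cost_per_hour, outages_per_year):
    """Solution cost plus the expected cost of being down while recovering."""
    downtime = option.recovery_hours * downtime_cost_per_hour * outages_per_year
    return option.annual_cost + downtime


def cheapest(options, downtime_cost_per_hour, outages_per_year):
    """Pick the option with the lowest total expected annual cost."""
    return min(
        options,
        key=lambda o: expected_annual_cost(o, downtime_cost_per_hour, outages_per_year),
    )


# Hypothetical options: recover in an hour versus recover in 24 hours.
options = [
    ProtectionOption("replication + snapshots", annual_cost=200_000, recovery_hours=1),
    ProtectionOption("nightly backup", annual_cost=20_000, recovery_hours=24),
]

critical = cheapest(options, downtime_cost_per_hour=50_000, outages_per_year=1)
low_priority = cheapest(options, downtime_cost_per_hour=500, outages_per_year=1)

print(f"Critical system: {critical.name}")
print(f"Low-priority system: {low_priority.name}")
```

With these assumed numbers the critical system justifies the expensive option and the low-priority one does not, which is the “it depends” answer expressed as arithmetic.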
Next in the series, we will further discuss some of the different technologies. And we’re sure “prolific blogger-Merrill” will be chiming in as well.
Big Data and Many Things Trying to Get Along
The n-body problem is the problem of predicting the motion of a group of celestial objects that interact with each other gravitationally.
Beyond three bodies interacting gravitationally, the n-body problem becomes chaotic and is “nearly impossible” to predict accurately with simulations, especially if the bodies differ in composition. While this is admittedly a stretch of an analogy, big data (at least the way I adjust everyone’s definition) is similarly an n-body problem, but with data.
There are many industry and vendor definitions as to what big data is. Ultimately, there is some truth in just about all of them. My colleague Michael Hay posts the HDS definition of “Big Data of the Future” in a recent blog. What I like to add to these definitions and what I like to describe in any of my conversations with people on big data is that big data is also the interacting complex relationship between different types of data to form a single thread of information. Huh? That’s right, stick this in your definition, and you’ll sound like me.
This phrasing applies both to the different data itself and to the different sources of data, whether it is retrieved from a persistent store, in flight from sensors, from social networks and so forth. Each of these data sources is composed of, and/or derived from, potentially complex systems or possibly combinations of dissimilar data sources. Big data spans a range: at one end it forces simple atomic data with complex relationships together; at the other it uses the results of a collection of complex systems mashing data together, yielding outcomes that are again used as input to another complex process. Obviously, the difference is that we want orderly and predictable results sooner than what the n-body problem might suggest. Maybe in big data research, this should be called the n-data problem, but with unpredictable and useful results.
Think about it, information innovation has evolved from a single database source of data, to a wide variety of data sources at different velocities. This data can be from different eras, all trying to interact with each other, force attractions, and produce complex associations to derive meaning and ultimately, produce a result that is unexpected yet useful. This combination of different types of data, in some cases seemingly unrelated types of data, is the foundation of science fiction and marketing commercials.
This concept hit home with me recently when talking to a European energy customer. As you know, one of the hats I wear is in the high performance computing space, or high performance “anything” space. The conversation turned to seismic data processing using HPC systems. Did you know that in most cases, oil companies aren’t always “looking” for oil in historic seismic data? They already know that there’s energy there. What they are now analyzing is whether or not it is “economically justifiable” to extract this oil or gas. I now use “economically justifiable” as an over-weighted term. This means that at the time of the survey, oil was maybe $10 a barrel, but the amount and the conditions surrounding this discovery made it too expensive to extract. This could include quantity (not enough details available to determine the amount of oil), environment (the oil is under a nature preserve or city), and situation (the oil is too deep or the ground is too hard to drill, and so forth).
One of the reasons certain customers keep data forever, especially in the oil and gas industry, is that the analysis processes and tools continue to get better over time. This can be through better and faster hardware, new software improvements, new mathematical algorithms or improved times to analyze data. Historically, this has been the case and this is how I once described this industry’s use of their seismic data archives and HPC systems: the continuous cycle of applying new tools to old data, looking under every shale rock for oil or gas by using methods of increasing data resolution through computing of historic data.
However, the oil and gas industry is probably one of the most successful users of the big data concept (with my definition addendum) in the world today. Analysis now includes more than just brute force processing of seismic data. They combine current situational data with the results of a seismic run. New drilling techniques, hydraulic fracturing (“fracking”), horizontal drilling, new refinery processes, regulatory policies and taxes, climate conditions, social media sentiment analysis, national, political and monetary policies, and other parameters all combine into their big data analytics. Each of the data sources is in itself a complex system yielding results as a piece of this process.
With oil hovering today at $100 a barrel and gas pump prices threatening $5.00 a gallon, there may now be “economical justification” to extract previously economically ignored, environmentally or ecologically undesirable energy sources. In fact if you look closely, the world is experiencing a recent energy resurgence in new fields.
When I state that the n-body problem is “nearly impossible” to predict, at least accurately, what I mean is that the result, no matter what it is, is useful. Similarly with big data, the complex relationships between different data types and the correlated orchestration of combining this n-data problem may not result in a predictable outcome. Why would you want that? What you should be looking for is something unpredictably useful.
It’s a “three-peat!”
By David Karas
HDS is once again proud to be acknowledged as one of the World’s Most Ethical Companies by the Ethisphere Institute. This is the third year in a row that HDS has received this honor, making it a three-peat.
The term “three-peat” was coined and subsequently registered by U.S. National Basketball Association coach Pat Riley in 1988 for use on t-shirts, jerseys and hats in anticipation of his team, the Los Angeles Lakers, winning their third consecutive NBA championship. His team didn’t have the chance to use the term as they lost the championship that year, but Michael Jordan and the Chicago Bulls used it in 1993 when they won their third NBA championship in a row.
But three-peat goes beyond a mere marketing scheme, and the word carries a much more significant meaning. Teams that have been able to three-peat are extremely rare. To win once takes ability, to win twice takes persistence, but to win three times shows mastery and consistent performance. It is the breakpoint where you quiet the doubters, dispose of the words “fluke” and “luck” and prove yourself to be a true champion firmly devoted to being the best. At HDS, “doing the right thing” and ethical behavior are ingrained in our culture and are part of everything we do. We know this, our customers know this and our employees know this. It makes us one of the best places to work and one of the best companies to do business with. Recognition has never been the goal, but a three-peat sure feels good.
Congratulations and thanks to all HDS colleagues for the commitment and dedication it takes to make this recognition possible.
Hitachi Machine Data in Action: Open Pit Data Mining
Sara Gardner, Senior Director SW Product Marketing, Hitachi Data Systems, in conversation with Martin Politick, Director, Research and Development at Wenco International Mining Systems
The Internet of Things is the future of big data!
Millions of machines all connected to the Internet, collaborating, driving intelligent automation and delivering new insight that can have a dramatic impact on a company’s bottom line – that’s the promise of the Internet of Things.
The application opportunities are vast and Hitachi is right at the heart of the revolution. We are building innovative solutions for machine data with many of these scenarios going way beyond the traditional data center. And it doesn’t get more rugged than the mining industry.
No, “Open Pit Data Mining” is not a new exotic branch of analytics. It’s about using machine data to drive new efficiencies in the mining industry. I recently chatted with Martin Politick, Director of R&D at Wenco International Mining Systems Limited to dig deeper.
Getting More Out of Your Mine with Wenco
Wenco International Mining Systems Ltd is a subsidiary of Hitachi Construction Machinery. Hitachi Construction Machinery makes mining equipment. Think really big machines – mining excavators, hydraulic shovels, backhoes and dump trucks.
Wenco delivers GPS-based open pit mining information technology including solutions for fleet control and management, mine visualization, equipment tracking and maintenance. Martin shared with me a couple of the innovative ways they are using sensor data and analytics to transform open pit mining.
Optimizing Mining Operation
Open pit mining requires complex coordination and management of typically hundreds of pieces of heavy equipment including trucks to move and dispatch ore, drilling machines, and processing and dumping equipment. This is a high stakes game and there are many variables involved to ensure the mine equipment and human resources are operating at maximum efficiency and safety. For example, what are the best routes between loading and dumping stations? How are those routes impacted by traffic congestion? What are the utilization rates of the crushers and shovels?
Wenco is leveraging GPS-based sensors to track location and utilization of the mining equipment and fleet. The sensor data is rolled up into daily monitoring reports and leveraged by optimization applications that utilize complex analytics and algorithms to guide machine operators to optimal routes and stations.
Unearthing Significant Operational Savings and Production Gains
I asked Martin just how much impact a mine can expect to see from utilizing technology like this and the answer is ‘huge’ – both in cost savings and overall production and therefore revenue gains. He shared with me the example of the Assarel Medet copper mine which not only realized significant savings in excavation, transportation and crushing operational costs but managed to increase the amount of mined material by 16% to reach an all-time high.
On Demand Maintenance
Wenco is also leveraging sensor data for more proactive, preventative maintenance of equipment. Machine failure is expensive. Timely maintenance can mean the difference between minor downtime and hundreds of thousands of dollars in lost production and equipment replacement. Wenco is working with Hitachi to stream real-time machine health data and metrics to the cloud for fast interception of potential problems before they result in costly failures. For example, a sticking valve and an exhaust temperature drop could be an early signal of pending engine failure.
Each individual sensor has the potential to stream back GBs of data, so data volumes quickly mount up. Data scientists at Hitachi have developed algorithms to effectively mine (no pun intended!) this data by capturing and comparing patterns in the data that will enable intelligent prediction of machine failures before they happen. This analysis will also enable automatic course-of-action directives to be sent back live to the mining operators. They are piloting this solution on big excavation equipment today.
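To give a feel for the sticking-valve example, here is a deliberately simple rule-based sketch. It is not the pattern-comparison approach Hitachi's data scientists are described as using; the function names, the five-reading window, and the 30-degree threshold are all assumptions chosen for illustration.

```python
from statistics import mean


def exhaust_temp_dropped(readings_c, window=5, drop_threshold_c=30.0):
    """True if the latest reading is well below the recent baseline.

    The baseline is the mean of the `window` readings before the latest one.
    """
    if len(readings_c) < window + 1:
        return False  # not enough history to judge
    baseline = mean(readings_c[-(window + 1):-1])
    return baseline - readings_c[-1] >= drop_threshold_c


def early_warning(valve_sticking, exhaust_temps_c):
    """Flag a possible pending engine failure when both signals coincide."""
    return valve_sticking and exhaust_temp_dropped(exhaust_temps_c)


temps = [450, 452, 449, 451, 450, 410]  # hypothetical exhaust readings in Celsius

if early_warning(valve_sticking=True, exhaust_temps_c=temps):
    print("Schedule an inspection before the next shift")
```

A fixed rule like this is easy to reason about but brittle; comparing against patterns learned from many machines, as described above, is what lets the prediction adapt to different equipment and conditions.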
The Future – automated mining
Operation optimization and preventative maintenance are just the tip of the iceberg when it comes to the potential applications of machine data in the mining industry. Mining sites are often located in geographically remote locations. It is often challenging to get enough trained operators and many of the operations are inherently dangerous. The cost to build a suitable infrastructure is quite high. With the work already done in sensor-based monitoring and guidance Wenco is building the foundation for more automation in mining operations for the future.
The Internet of Things is the future of Big Data!
The Internet of Things takes data capture out of the data center and down to the front line. Mining is just one of many examples of how machine data can transform industries and I look forward to sharing other examples of how Hitachi and customers are innovating with information in the coming months.
Pivotal Years in Storage and Dumb Answers
A while back I was being interviewed, and the reporter asked a question that stumped me. But it shouldn’t have.
The question was: “So, you’ve been in storage for a long time (please spare me the cryogenics jokes). Was there any single event, breakthrough, or defining moment that changed storage forever?” I was stumped and gave a pretty dumb answer, so dumb I’ve purged it from my brain. You know, a Men In Black moment.
However, once I started thinking about it, there certainly was a defining moment, and now when I’m talking “storage economics” I’ve added that message into the conversation since it’s quite relevant.
As we all know, my colleague David Merrill is the storage economics guru and a prolific blogger (I would never pretend to replace him nor his work). However, I also talk a lot on the subject, having covered for him at conferences, and we continue to work together on projects. I talk more about the technology that contributes to storage economics and David talks more about the economics of storage.
Here is what my answer should have been: Yes, there were two pivotal moments in storage. The first was in 1956 when the first disk drive was ever shipped. I’ve shown this picture before, but it’s a classic reminder of the roots of our $120B industry.
The second moment, much more obscure but very important, came in April 1992 and for two reasons: First, storage transitioned from a commodity “dumb box” to a value-add product. Prior to this, vendors were simply cramming a bunch of 14-inch disks into a box and selling them. The primary vendors back then (IBM, HDS, EMC, and Amdahl) would try to compete on performance and reliability, but all our customers would hear is cost per MB. Price was the only metric, and at $10 a MB, that was not surprising. This was a commodity at its finest. We didn’t even have RAID then.
The first in a long line of these value-add capabilities was concurrent copy, a function designed to dramatically reduce backup windows using a technology we today call “copy on write” (yes, it’s still around today).
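For readers who have not met the technique, the following is a generic sketch of copy on write, not a description of how concurrent copy was actually implemented in microcode. The class and method names are hypothetical. The idea is that taking a snapshot copies nothing; a block's old content is saved only at the moment it is about to be overwritten.

```python
class CowVolume:
    """Toy block volume with copy-on-write snapshots."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []  # one dict per snapshot: block index -> original content

    def snapshot(self):
        """Create a snapshot instantly; no data is copied yet."""
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, index, data):
        # Before overwriting, preserve the old content for every snapshot
        # that has not already saved its own copy of this block.
        for saved in self.snapshots:
            if index not in saved:
                saved[index] = self.blocks[index]
        self.blocks[index] = data

    def read_snapshot(self, snap_id, index):
        """Read a block as it looked when the snapshot was taken."""
        saved = self.snapshots[snap_id]
        return saved[index] if index in saved else self.blocks[index]


vol = CowVolume(["a", "b", "c"])
snap = vol.snapshot()   # the backup can start reading from this view
vol.write(1, "B")       # the application keeps writing

print(vol.blocks[1])                 # current data
print(vol.read_snapshot(snap, 1))    # data as of the snapshot
```

Because the application can keep writing while the backup reads the frozen view, the backup window during which the application must be quiesced shrinks dramatically.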
Second, and much more relevant, we changed the pricing of storage by requiring software licenses (for microcode; I never liked calling microcode software) and professional services engagements. You could see heads spin when our customers had to “buy” microcode and accommodate strangers into their datacenters to install these things.
Things got out of hand very quickly, since this was followed by networked storage (with switches, links, ports, etc.), more licenses for microcode, and difficult-to-understand maintenance and leasing arrangements.
To me, this was the genesis of storage economics and this weird storage world we all live in. Want to know how much your storage costs? Good luck, unless you’re using the storage economics discipline.
If the reporter who asked me that question is out there, you now have my answer.
It Feels Good.
HDS Recognized for Its Culture of Leadership and as One of America’s Best 100 Places to Work
By Asim Zaheer, senior vice president, worldwide marketing
It’s not often you get to work for and be a part of a great company whose values continue to earn global recognition. That’s why I am so excited and proud that HDS has won two of the most prestigious awards in the business community this week:
• FORTUNE’s “Best 100 Companies to Work For in 2013” for the second year, moving up 23 spots in the process to #63;
• Chief Executive’s “40 Best Companies for Leaders;” HDS ranks No. 7 under the continued leadership and innovation of CEO Jack Domme, moving into the top 10 from the No. 16 position in 2012 and No. 36 in 2011.
These accolades don’t just happen by chance. It takes a lot of hard work to create and build a corporate culture that lives up to these honors. And, it all starts at the top. Jack believes in empowering people and rewarding performance. He also believes in doing meaningful work that will make a positive impact on the community. In the end, it’s about raising the bar at HDS and holding one’s self to a higher standard.
One of the ways we are doing that is with a continued focus on innovation. Innovation has been an essential ingredient for positive change, for achieving tremendous financial success in the past year, for empowering our employees to do great work, and for helping our customers compete in challenging times.
Throughout all of this, information has been our fuel. We are constantly striving to produce new products and solutions that help our customers manage their growing mounds of information. We are also working hard at opening our lines of communication and being more transparent. This commitment to our customers and to our employees is part of building a winning team that people want to be part of.
To offer more perspective on the tenets of our success and winning culture, I want to share the spotlight with someone who has been a major champion of our success – Nancy Long, executive vice president and chief human resources officer at HDS. Following is a brief excerpt from our conversation:
We are hitting our stride cementing our position with consecutive placements on these very prestigious lists. What, in your opinion, makes HDS a great place to work?
When I came to HDS about seven years ago, one of the first and most prominent things I noticed about our culture was something that is very rare in today’s business environment. There was a sense of family and loyalty. Not a blind loyalty, but a sense of alignment and commitment toward achieving a bigger goal for all of us.
In both good times and challenging times, it is important for a global organization such as ours to effectively communicate a clear vision and values, which is what inspires people. This feeds our sense of pride and camaraderie. We are an organization that is willing to do what it takes. We hunker down together. We celebrate together. This culture of commitment and loyalty is what makes HDS the company it is today, and in my opinion, has been a secret until now.
How do you think HDS arrived at this approach as a company? How did we get there?
You noted it starts at the top with Jack and it does. Beyond Jack, our entire executive leadership team is committed to transparency. This involves constant communication to our employees about what we are doing as a company, and why we are doing it. Our words have to align with our actions, and that needs to stay consistent. And, we have built trust because we do what we say; that creates a winning team. Furthermore, we will change and grow as the industry and our customer needs do, and we will always turn to our employees to be part of that change. That in itself is pretty empowering, don’t you think?
How has HDS maintained this winning corporate culture as a subsidiary of our parent, Hitachi, Ltd.?
We are all very proud to be part of Hitachi. If you think about it, our founding principles from 1910 haven’t changed much. These principles, which have coined the Hitachi Spirit, focus on empowering people and treating them with respect. We are also a company that is mindful of our commitment to the community. This is in our DNA. And, as we work to deliver on social innovation with Hitachi, we will leverage each other’s strengths to help us move forward and make an impact in a host of industries that positively touch people’s lives. The power of that is huge. Our employees are very engaged and excited to be a part of this, and it’s a big differentiator for us.
HDS is now ranked among brands like Google, Wegmans Food Markets, Dreamworks and Zappos. What are some commonalities we share with these organizations?
To be a great organization, your employees have to have pride and feel as though they have the opportunity to succeed. All companies do things that are unique to their culture, whether that be creating a unique workplace environment or inspiring employees to get involved.
Our employees are very involved and many lead charitable causes of all kinds. One example in Santa Clara that makes us unique, and that employees enjoy, is the Dog Days of Summer. We have a pet fair, raffles, and vendors who specialize in animal health and wellness, and employees bring their dogs to work for the day. There’s even a talent show. All proceeds from the event are donated to local animal charities and it is always a big success.
While we are a sizeable organization, we are still small and nimble enough to allow our employees to bring new and creative ideas to the table, and we help and support their efforts to bring these initiatives forward.
How do honors like these help HDS achieve its business goals?
Innovation is about bringing a new idea to life. Customers look to us for these great ideas and they want to work with a company that not only has good solutions, but also, a company that treats its employees well. Our people enjoy what they do and this feeds our success.
From a talent perspective, we also rely on the power of a reputational brand. And, these awards are helping us to cultivate and recruit strong talent across the globe. We want to attract, develop and retain the best people, and we are doing that.
You know, Jack always says, “Stay humble and hungry to avoid both.” To me, receiving these honors shows how deeply committed we all are to doing everything we can to ensure the employment experience people have at HDS is an excellent one. We spend so much time in the work environment, we as leaders should be good stewards of our employees’ time and provide them with the most rewarding experience we can.
What’s on the horizon to continue to keep HDS on the cutting edge as a great place to work in 2013?
Just because we have been recognized on these lists doesn’t mean we are finished. We are all living in a challenging global economic environment, and I would argue that communication and innovation will be even more vital this year than ever before.
In HR, we are working with leadership to ensure we have the organizational capability to continue to win in the market and be a great place to win. We are constantly looking for new ways to push the envelope to do new and different things that make our employees say “wow.” Our employees have made us say “wow” with this great win, and I couldn’t be more proud to be here.
Indeed, many of us feel the same way. Thank you to all of the employees that make HDS not only a great place to work, but also a place where we all have opportunities to grow and share in our successes together. It feels good.