Quality of Service

Blog Post created by Nathan Moffitt on Nov 4, 2016

Are You Getting All the Quality You Deserve?

 

Quality of service (QoS) on storage systems is a hot topic these days, driven heavily by competing IT demands:

 

    • Consolidate workloads onto a small set of systems to maximize ROI
    • Deliver consistent, high-speed performance for applications

 

As system administrators and IT managers know, satisfying both these requirements can be challenging, especially as workload data levels grow and usage patterns change.

 

That’s where QoS comes into play. QoS software manages the way resources are allocated so workloads and applications perform in a predictable fashion. Most vendors start and end QoS conversations with performance limits, but there is much more to it than that. If you stop your QoS discussion there, you may be getting low-quality QoS.

 

But don't take my word for it. Let’s hear what senior product guru Anahad Dhillon has to say.

 

QoS Basics: Stop Me If You’ve Heard This Story Before

 

Anahad, what is the most common definition of QoS you see?

[Anahad] Storage vendors that have QoS – which isn’t everyone yet, even though some claim they have it – describe QoS as the ability to throttle applications so that they do not consume all of a storage controller’s bandwidth.

 

And how is that usually configured?

[Anahad] Implementations vary, but basic ones make QoS configurable by port, WWN or LUN, with limits set as a maximum IOPS or MBps. At least that is what I see in customer RFPs. This allows administrators to prevent less important workloads from ‘hogging’ bandwidth.
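
To make that concrete, here is a minimal sketch of what a per-LUN IOPS cap amounts to under the hood: a token bucket that admits IOs only while budget remains. The class name, LUN name, and numbers are hypothetical illustrations, not any vendor's actual interface.

```python
import time

class IopsLimiter:
    """Token-bucket cap on IOPS for a single LUN (illustrative only)."""

    def __init__(self, max_iops):
        self.max_iops = max_iops
        self.tokens = float(max_iops)
        self.last = time.monotonic()

    def try_io(self):
        now = time.monotonic()
        # Refill tokens in proportion to elapsed time, never above the cap.
        self.tokens = min(self.max_iops,
                          self.tokens + (now - self.last) * self.max_iops)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True    # admit the IO
        return False       # over the cap: defer or queue the IO

# Hypothetical policy table keyed by LUN, as in the RFPs described above.
limits = {"lun-07": IopsLimiter(max_iops=5000)}
```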

 

You said ‘basic implementations.’ Why basic?

 

[Anahad] Capping workloads is great for preventing what are called ‘noisy neighbors,’ but caps create artificial limits on bandwidth usage. Ultimately that results in waste because system resources sit unused. A good QoS implementation includes thresholds for activating QoS, so that all system bandwidth can be used freely when critical workloads are idle, maximizing ROI.
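
As a sketch of that threshold idea (the activation value and names are assumptions for illustration, not a shipping implementation): caps only kick in when the system is actually under contention.

```python
class ThresholdQos:
    """Enforce per-workload caps only when overall load crosses a threshold.

    A sketch of the idea; assumes a utilization metric in [0.0, 1.0].
    """

    def __init__(self, activate_at=0.7):
        self.activate_at = activate_at

    def allowed_iops(self, workload_cap, system_utilization):
        if system_utilization < self.activate_at:
            # Critical workloads are quiet: let anyone use spare bandwidth.
            return float("inf")
        # Contention detected: fall back to the configured cap.
        return workload_cap

qos = ThresholdQos(activate_at=0.7)
print(qos.allowed_iops(workload_cap=5000, system_utilization=0.4))  # inf (uncapped)
print(qos.allowed_iops(workload_cap=5000, system_utilization=0.9))  # 5000 (capped)
```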

 

 

Are many IT buyers aware their QoS implementation could be wasting resources?

[Anahad] No. More than half of all customers I speak with think they have a good QoS strategy until we talk about how bandwidth is used during idle periods. Then they realize that another vendor’s QoS implementation may actually hurt them.

 

 

 

The Other Shoe Drops: Workloads Need Data Reduction Resources Too!

 

Is it fair to say that this is where most customers think the QoS conversation ends?

[Anahad] Absolutely. When I have storage design conversations, this is where customers think QoS functionality ends. I have to pause and let them know there is another aspect of performance management they need to consider: data reduction.

 

Data reduction? Who cares about that (said sarcastically)?

[Anahad] Only IT teams with budgets.

 

That isn’t many.

[Anahad] Not many. Just, you know, all of them.

 

So why is data reduction important to QoS?

[Anahad] Data reduction functions like deduplication and compression require system resources to run. The more IO passing through a controller, the more resources are required to perform data reduction. When you set up QoS, you have to consider that every IO consumes BOTH the total available IO budget and the system resources needed to process that IO.

 

Is it possible that a storage array could run out of resources before QoS limits are reached?

[Anahad] Yes. This is because it is hard to quantify the impact of individual IOs. Read IOs have less overhead than writes. Writing data patterns that have not been seen before requires more processing than writing patterns that have. As a result, even capped workloads can consume an unfair amount of storage resources and slow down critical workloads.
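
A toy cost model makes the point. The weights below are invented for illustration, not measured values, but they show how two workloads under the same IOPS cap can load the controller very differently.

```python
# Illustrative per-IO CPU costs; the weights are assumptions, not measurements.
IO_CPU_COST = {
    ("read", "any"): 1.0,      # reads: least data-reduction overhead
    ("write", "seen"): 2.0,    # writes of known patterns: dedup hit, cheaper
    ("write", "novel"): 4.0,   # writes of new patterns: fingerprint + compress
}

def controller_load(io_mix):
    """Sum the weighted CPU cost of a mix of (kind, pattern, count) tuples."""
    return sum(IO_CPU_COST[(kind, pattern)] * count
               for kind, pattern, count in io_mix)

# Two workloads, both inside a 5,000 IOPS cap:
mostly_reads = [("read", "any", 4500), ("write", "seen", 500)]
novel_writes = [("read", "any", 1000), ("write", "novel", 4000)]
print(controller_load(mostly_reads))   # 5500.0 cost units
print(controller_load(novel_writes))   # 17000.0 cost units: same IOPS, ~3x the CPU
```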

 

So what now? Turn off QoS? Throw it away?

[Anahad] For 90% of vendors the answer is probably yes. If the array can only do data reduction on the controller, QoS may not meet your expectations. If you can offload data reduction from the controller to other devices, then you have another choice.

 

How?

[Anahad] Take the HDS VSP F series. On VSP F series, SVOS can perform deduplication and compression, but it can also offload compression from the controller to Hitachi custom flash modules (FMDs). If you configure workloads to have dedicated FMDs, the workload gets its own engines for performing data reduction. That lowers latency and delivers more consistent performance. Plus it helps the entire system scale further so you don't have to buy more arrays. And since each FMD has its own quad-core processor for data management and inline compression, a few FMDs add a lot of power. No other major array I know of can do that.

 

 

But to be fair, that does mean dedicating FMDs to workloads right?

[Anahad] Yes. But you don't have to do it for every workload, just the critical ones. The secondary workloads can share a common storage pool. And even if you don't want to dedicate resources, the offload strategy alone adds huge value in making sure you get real quality of service.
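
A rough sketch of that placement trade-off. The workload names, pool names, and engine counts are hypothetical; this is not SVOS configuration syntax, just the dedicated-versus-shared idea in code.

```python
from dataclasses import dataclass, field

@dataclass
class ReductionPool:
    """A pool of offload engines (FMD-style) serving one or more workloads."""
    name: str
    engines: int
    workloads: list = field(default_factory=list)

# The critical workload gets dedicated engines; secondary workloads share.
dedicated = ReductionPool("oltp-dedicated", engines=16, workloads=["oltp-db"])
shared = ReductionPool("shared", engines=32,
                       workloads=["backup", "analytics", "file-serving"])

def engines_per_workload(pool):
    # Dedicated pools keep latency predictable; shared pools maximize utilization.
    return pool.engines / max(1, len(pool.workloads))

print(engines_per_workload(dedicated))  # 16.0 for the critical workload
print(engines_per_workload(shared))     # ~10.7 each, subject to contention
```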

 

How does having a big pool of FMDs help in general?

[Anahad] SVOS will distribute data across FMDs and offload compression to them. That minimizes the work the storage controller has to do and enables consistent, low latency that systems with controller-level data reduction will struggle to achieve, especially as data levels grow. And since an F series can scale to 576 FMDs, that gives you up to 2304 cores for handling data reduction. How many vendor implementations do you know of with over 2304 cores of processing power?

 

Uhm. None.

[Anahad] Exactly. None.

 

Closing Thoughts

 

The net here for IT leaders is that delivering consistent application performance isn’t as simple as throttling workloads or even adding a threshold value to make sure you aren’t artificially limiting workloads during quiet periods. You have to consider the impact of functions like data reduction too.

 

If your array can't either allocate controller resources for individual workloads' data reduction or offload data reduction to other devices, QoS may not give you the quality you expect. Something to think about for sure.
