
Does network speed really matter?

Blog Post created by Angela MaGill on Apr 12, 2017

If you're like me, you've been pondering the true business value of 32Gb networking speed. Is the investment worth it? Will it make a material difference for my business? With yesterday’s announcement of a 32Gb Fibre Channel module for Cisco’s MDS 9700 Directors, I decided to have a chat with Christian Dornacher, Hitachi Data Systems Director of Storage & Analytics Solutions in EMEA, to get a better sense of how network speed impacts some of the key use cases we see with our customers today.

 

AM: We see many customers undergoing consolidation efforts in their data centers. In these new DC designs, you have to consider the implications for the network when connecting larger numbers of servers and storage, so that the new design scales appropriately and performs properly. How have you seen this play out?

 

CD: One of our customers experienced issues with servers not being able to access the storage array after the completion of their storage consolidation effort. The array was being blocked by aborted tasks, which means that data frames were timing out. The storage array was receiving some of the data, but the data was corrupted or incomplete. In this scenario, the array sends a message to the host asking for more data or asking the host to re-send the data. If that request isn't satisfied within a certain timeframe, the array marks the data as invalid and aborts the transfer. The same scenario can occur on the server side if the data being requested or written arrives incomplete or corrupt. Because this can happen on both ends, the ISLs can become so overloaded that the abort messages don't reach the other end, which causes both sides to re-send the abort messages and can crash the entire environment.
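
To make that feedback loop concrete, here is a minimal, deliberately crude Python sketch of the cascade Christian describes; the ISL capacity, offered load, and timeout probability are illustrative assumptions, not measurements:

```python
# Crude model of a timeout/abort cascade on congested ISLs.
# All numbers are illustrative assumptions.
import random

def one_pass(isl_capacity_frames, offered_frames, timeout_chance):
    """Frames beyond ISL capacity are delayed; delayed frames may time
    out and trigger an abort message, which must cross the same
    congested ISLs, prompting re-sends that add to the offered load."""
    delayed = max(0, offered_frames - isl_capacity_frames)
    aborts = sum(1 for _ in range(delayed) if random.random() < timeout_chance)
    return offered_frames + aborts  # abort traffic feeds back into the load

load = 1000
for step in range(5):
    load = one_pass(isl_capacity_frames=800, offered_frames=load,
                    timeout_chance=0.5)
    print(f"pass {step}: offered load = {load} frames")
```

Each pass, the abort traffic generated by congestion adds to the very load that caused the congestion, which is why such an environment can spiral into a crash rather than degrade gracefully.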

 

AM: Ouch! What can be done to prevent this from happening?

 

CD: The situation was caused by the network design. The backend, or backbone, was not designed to keep up with the number of servers now connecting to the storage array. Most storage systems and servers are connected with 16Gb networking components today. To keep up with the servers and to avoid oversubscription, customers need a network design that utilizes 32Gb speed on the backend. Having 32Gb speed on the backend provides the headroom (or buffer capacity) needed to ensure that there is no blocking or congestion in the network. Fewer storage systems mean each system takes more load from the servers, which results in more traffic in the SAN.
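
As a rough illustration of the oversubscription math (the port counts and speeds here are hypothetical, chosen only to show the shape of the calculation):

```python
# Back-of-the-envelope SAN oversubscription check.
# Port counts and speeds below are illustrative assumptions.

def oversubscription(host_ports, host_gbps, isl_ports, isl_gbps):
    """Ratio of aggregate host-edge bandwidth to ISL (backbone) bandwidth."""
    return (host_ports * host_gbps) / (isl_ports * isl_gbps)

# 48 hosts at 16Gb funneling into 8 ISLs:
print(oversubscription(48, 16, 8, 16))  # 6.0:1 with 16Gb ISLs
print(oversubscription(48, 16, 8, 32))  # 3.0:1 with 32Gb ISLs
```

Doubling the ISL speed halves the fan-in ratio without consuming additional director ports, which is where the headroom Christian mentions comes from.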

 

AM: What lesson can we take away from this?

 

CD: As a best practice, users need to consider the network when putting together a storage consolidation design and understand the repercussions of not addressing the networking requirements. It's important to ensure that the network can handle the increased workload. Latency is very low with flash storage, which makes the speed and design of the network critical considerations for the consolidation architecture.

 

AM: Speaking of latency... applications are no longer about TB per second. They are all about user experience, and user experience is all about response time. Let’s talk about the potential impact of latency in a business continuity scenario.

 

CD: Flash storage solves this problem on the storage side, and memory and cache solve it on the server side. If the network between the two is not fast enough, the application performance will not meet the needs of the users.

 

High speeds keep latency down when transferring data over distance, which improves application response times. Having headroom, or buffer capacity, is critical to ensure that there is no blocking or congestion in the network.
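
A quick back-of-the-envelope, using a hypothetical 1MB transfer over 50km of fiber, shows where the link speed helps (the figures are illustrative assumptions):

```python
# Rough latency arithmetic for a transfer over distance.
# Transfer size and distance are illustrative assumptions.

FIBER_US_PER_KM = 5.0  # ~5 microseconds per km, one way, in fiber

def transfer_time_us(size_bytes, link_gbps, distance_km):
    serialization = size_bytes * 8 / (link_gbps * 1000)  # microseconds
    propagation = distance_km * FIBER_US_PER_KM
    return serialization + propagation

# A 1MB write shipped 50 km:
print(round(transfer_time_us(1_048_576, 16, 50)))  # ~774 us at 16Gb
print(round(transfer_time_us(1_048_576, 32, 50)))  # ~512 us at 32Gb
```

The propagation component is fixed by the speed of light in fiber, but the serialization component halves at 32Gb, and under real load the reduced queuing matters even more.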

 

Hitachi Global-Active Device (or GAD) provides business continuity and clustered storage functionality, and it leans heavily on network connectivity on the backend, or network backbone. The question is: what is the value of your data? There may be ‘less costly’ methods to protect your data, but if your data has a high value to your business, then access to the data in the event of an outage at the primary site, along with good performance and full transparency for the user applications, is key to the business. The critical elements, therefore, are transparent and non-disruptive failover, and online availability of the environment.

 

Because GAD is active/active, it allows users to access data from both sides of the cluster at the same time. Not every user will use GAD for this, but those that do will need that data to always be in sync, and 32Gb network speeds help guarantee that.
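
For a sense of what "always in sync" costs in round trips, here is a hypothetical calculation of the minimum overhead a synchronously mirrored write picks up from distance alone (the distances and write size are assumptions):

```python
# Minimum added service time for a synchronously mirrored write.
# Distances and write size are illustrative assumptions.

FIBER_US_PER_KM = 5.0  # one-way propagation in fiber, roughly

def sync_write_overhead_us(distance_km, write_bytes, link_gbps):
    round_trip = 2 * distance_km * FIBER_US_PER_KM  # ack must come back
    serialization = write_bytes * 8 / (link_gbps * 1000)
    return round_trip + serialization

for km in (10, 50, 100):
    print(km, "km:", round(sync_write_overhead_us(km, 64 * 1024, 32)), "us")
```

Distance delay is physics and no link speed removes it; what the faster link buys is lower serialization time and, more importantly, the headroom to keep replication traffic from queuing behind host I/O.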

 

AM: Another scenario that many of us are familiar with is the cloud hosting or managed service provider. As a hosting provider, you need to be able to run multiple users/customers on the same infrastructure to ensure that the business model is cost effective. What design considerations are important for this scenario?

 

CD: Again, the most critical component comes back to speed. If you have 100 users accessing your hosting environment, it must be designed to accommodate many users coming in simultaneously, which means response time is critical.

 

For example, hosted environments for VDI require that the speed and performance of the infrastructure be able to handle the morning workload, with many users logging in between 8 and 9am and requiring immediate access to their virtual machines, Outlook and MS Office files. This scenario may only play out for a few hours each day, but high speed connectivity is required to manage the workload.
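
To put a hypothetical number on such a login storm (the user count, per-login data volume, and peak factor are all assumptions):

```python
# Rough sizing for a VDI login-storm window.
# User counts and per-login data volumes are illustrative assumptions.

def peak_gbps(users, mb_per_login, window_minutes, peak_factor=3.0):
    """Average throughput over the window, scaled by a peak factor
    because logins cluster rather than spread evenly."""
    total_bits = users * mb_per_login * 8_000_000
    avg_gbps = total_bits / (window_minutes * 60) / 1e9
    return avg_gbps * peak_factor

# 2,000 VDI users each pulling ~300 MB (profile, Outlook cache, Office)
# between 8 and 9 am:
print(round(peak_gbps(2000, 300, 60), 1), "Gb/s at peak")
```

And that is a single tenant; a hosting provider stacks many such tenants onto the same ISLs, which is where the backbone headroom gets consumed.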

 

In many managed service environments, the only SLA that matters is response time, which means that response time defines the design. If the response time SLA is violated, it affects the customer experience and the availability of critical business applications. Failures or slow response times can cause production problems or, in the worst case, complete outages. As a result, contracts define the penalties that must be paid in the event of outages; with penalties typically measured in hundreds of thousands of USD per hour for a production environment outage, the cost of upgrading the network can be minuscule in comparison to the penalties that might otherwise be paid.
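
The arithmetic behind that comparison is simple; the figures below are purely illustrative assumptions:

```python
# Outage penalty vs. network upgrade cost.
# All figures are illustrative assumptions.

penalty_per_hour = 250_000  # USD, contractual penalty for an outage
outage_hours = 4            # a single bad incident
upgrade_cost = 150_000      # USD, one-time 32Gb backbone upgrade

print("one outage:", penalty_per_hour * outage_hours)  # 1,000,000 USD
print("upgrade:   ", upgrade_cost)                     # 150,000 USD
```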

 

AM: We now know we need to consider the network when developing plans and designs for consolidation, business continuity and hosting deployments.  The repercussions are real and they are potentially very significant for businesses across the spectrum.

 

For more information on Cisco's announcement, register to attend this webinar on April 19th:  https://events-cisco.webex.com/events-cisco/lsr.php?RCID=b6f7967aaa7641c6a12460c3e3572da5
