David Merrill

VM TCO Observations

Blog Post created by David Merrill on Sep 19, 2016

As I mentioned in my last blog, HDS has worked with many clients over the last few years to help them identify and reduce the cost of Virtual Machines (VMs). Enough assessments have now been completed, and enough time has passed, that we can report on trends, macro-observations and cost ratios.

 

First, let's look at some aggregate rates of how VM costs tend to stack up:

 

Untitled3.png

 

Now a few caveats about the pie chart above:

  • There are 24 cost elements in our VM TCO methodology. The chart above aggregates the most common cost areas
  • The size of a VM impacts its TCO, so what you are seeing is the average cost distribution of the average VM in a client environment. Costs do differ by size (larger VMs have more memory and storage, so the HW depreciation expense will tend to be higher)
  • The geographic location of VMs also matters, so locations where power or labor rates are higher will skew 'average' results
  • VM costs are also related to the workload. An Oracle VM will have a different cost profile than a VDI or test/dev VM
  • The age of the hardware has a big impact on total costs. Older systems, with a shrinking book value, will see inflated rates for maintenance, power and cooling (on a per-VM basis)

 

With the caveats out of the way, let's talk about the cost distribution a little:

  1. The number one cost (in terms of % of TCO) tends to be labor. VMs are still fairly labor-intensive in areas such as
    • Troubleshooting
    • Standardization, catalogs, custom builds
    • Patch management
    • Performance management, backup, restore
    • Configuration management
    • Workload migration
  2. The second highest cost is DR and data protection (backup) of VMs. This includes managing the cluster, snaps, replication and backup schemes
  3. Next are the software costs, which include the hypervisor, OS, management tools etc. Some software can be depreciated; other software is licensed annually or on a usage (utility) basis.
  4. Hardware depreciation (separating storage, software and servers) is next if you count each separately. All combined, depreciation expense tends to be about 25-30% of VM TCO. As the assets age and book values approach zero, the depreciation costs will shift to maintenance costs
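
To make the ranking above concrete, here is a minimal Python sketch that turns annual per-VM costs into percentage shares of TCO and sums the depreciation lines. All of the cost figures and category names are hypothetical placeholders chosen only so the output resembles the pattern described above; they are not assessment data, and a real baseline would use all 24 cost elements.

```python
from typing import Dict


def cost_shares(annual_costs: Dict[str, float]) -> Dict[str, float]:
    """Return each cost area's share of total TCO in percent, largest first."""
    total = sum(annual_costs.values())
    if total <= 0:
        raise ValueError("total annual cost must be positive")
    ranked = sorted(annual_costs.items(), key=lambda item: item[1], reverse=True)
    return {name: 100.0 * cost / total for name, cost in ranked}


# Hypothetical annual cost per VM (made-up figures for illustration only).
EXAMPLE_COSTS = {
    "labor": 900.0,
    "dr_and_data_protection": 700.0,
    "software": 600.0,
    "server_depreciation": 450.0,
    "storage_depreciation": 400.0,
    "software_depreciation": 250.0,
    "maintenance": 300.0,
    "environmentals": 200.0,
    "network": 200.0,
    "other": 150.0,
}

shares = cost_shares(EXAMPLE_COSTS)

# Combine the separate depreciation lines to see their total weight in the TCO.
depreciation_share = sum(
    pct for name, pct in shares.items() if name.endswith("_depreciation")
)

for name, pct in shares.items():
    print(f"{name:<24}{pct:5.1f}%")
print(f"{'combined depreciation':<24}{depreciation_share:5.1f}%")
```

Counting depreciation line by line keeps it below labor, DR and software in the ranking, while the combined figure shows how much of the total it really represents.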

 

Setting aside the ranked order of the top 4-5 cost areas, here are my observations on the rest:

  • Maintenance costs (hardware and software). Most vendors provide 3 years of HW maintenance with blades and storage. After year 3, the HW maintenance cost tends to increase quietly and quickly. Software maintenance or license fees tend to emerge in the 2nd year of ownership.
  • Provisioning time - this is the time a project has to wait for a VM to be presented. This may include purchase, engineering, asset allocation, config etc.
  • Environmentals - data center floor space, power and cooling for servers, storage and network equipment
  • Engineering time to certify and test (non-converged) VM hardware stacks
  • Network - top-of-rack equipment, as well as WAN, SAN and IP network for local and remote systems
  • Risk - usually related to scheduled or unscheduled outages

 

Honorable mentions, even though they are not included in the graph:

  • Cost of waste - when we can run a tool to see the CPU and memory utilization, it is not uncommon to find that some 10-15% of all VMs are dead (not active). These are wasting hardware, licensing and maintenance dollars
  • Cost of performance - if the VMs are under-performing, they can be the cause of slow systems, lost revenue, reduced customer satisfaction etc.
  • Cost of growth - how much reserve needs to be kept on hand for unforecast growth, or the time and effort to procure more assets. VM sprawl puts a lot of pressure on the cost of growth and the cost of waste.
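
As a rough illustration of the cost of waste, the sketch below estimates the annual dollars tied up in dead VMs for the 10-15% range mentioned above. The VM count and the recoverable cost per VM (the hardware, licensing and maintenance portion) are hypothetical inputs, and the function name is my own; it is a back-of-the-envelope estimate, not part of the TCO methodology.

```python
def annual_waste(total_vms: int, dead_fraction: float,
                 recoverable_cost_per_vm: float) -> float:
    """Estimate annual spend on inactive VMs.

    recoverable_cost_per_vm is assumed to cover only the hardware, licensing
    and maintenance dollars that a dead VM keeps consuming.
    """
    if total_vms < 0 or recoverable_cost_per_vm < 0:
        raise ValueError("VM count and cost must be non-negative")
    if not 0.0 <= dead_fraction <= 1.0:
        raise ValueError("dead_fraction must be between 0 and 1")
    return total_vms * dead_fraction * recoverable_cost_per_vm


# Hypothetical environment: 1,000 VMs, 2,000 per VM per year in recoverable cost.
TOTAL_VMS = 1000
RECOVERABLE_COST_PER_VM = 2000.0

low = annual_waste(TOTAL_VMS, 0.10, RECOVERABLE_COST_PER_VM)
high = annual_waste(TOTAL_VMS, 0.15, RECOVERABLE_COST_PER_VM)
print(f"Estimated annual waste: {low:,.0f} to {high:,.0f}")
```

Even with modest per-VM figures, the estimate scales linearly with VM count, which is why sprawl makes waste worth measuring.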

 

Another key observation is the relative TCO result by VM generation. When we work with a customer to see how they have deployed systems, we know what to expect in terms of overall costs. DIY VM systems usually have the highest overall rate. 1st generation CI systems (FlexPod, VCE) are next, followed by 2nd generation CI (UCP) with the next lower rate. Advanced orchestration and provisioning tools (on top of the converged platforms) tend to provide the best (lowest) overall TCO.

 

generation1.png

In my next entry, I will talk about the process that we use to create an arm's-length VM TCO baseline for a customer environment. With a good baseline defined, IT architects and operations staff are then able to set tactical and strategic plans to reduce the unit costs of Virtual Machines. Every IT shop is different in its cost sensitivities, VM sizes and quantities, and historical deployments. We do baselines not to compare one shop to others, but to support an individual cost improvement (continuous improvement) program.
