Hyper-V optimization tips (Part 3): Storage queue depth

by Mitch Tulloch [Published on 9 Aug. 2016 / Last Updated on 9 Aug. 2016]

This article explores the topic of storage queue depth and its potential impact on the performance of virtualized workloads running on Hyper-V hosts.

In the previous articles in this series we began by examining disk caching settings and how they should be configured on both Hyper-V hosts and the virtual machines running on those hosts. We then examined how Hyper-V performance depends on the underlying storage subsystem of clustered Hyper-V hosts, and how to optimize and troubleshoot performance on those systems through judicious choice of storage hardware. We also began discussing how you might tune the storage subsystem on clustered hosts, a topic that naturally depends heavily on your choice of storage hardware vendor. In this article we continue that discussion by examining storage queue depth and its potential impact on the performance of Hyper-V hosts and the virtual machines running on them, which of course translates into the performance of the workloads and applications running in those virtual machines.

Reviewing the scenario

We will continue basing our discussion on the scenario we laid out in the previous article of this series, namely, a four-node Windows Server 2012 R2 Hyper-V host cluster with CSV storage that is hosting a dozen virtual machines running as front-end web servers for a large-scale web application. Our virtual machines are also using the Virtual Fibre Channel feature of Windows Server 2012 R2 Hyper-V, which lets you connect to Fibre Channel SAN storage from within a virtual machine via Fibre Channel host bus adapters (HBAs) on the host cluster nodes. In the previous article we discussed using disk answer time, that is, the average response time of the CSV volumes of the host cluster, as a way of measuring the performance of the web applications being hosted on our Hyper-V host cluster. We then suggested that if the measured response time rose above a certain threshold (we decided on 20 msec as a reasonable cutoff level), the administrator of the host cluster needed to take some kind of remedial action to deal with the storage system bottleneck and try to boost application performance. Five examples of possible recommendations that we identified were:
• Follow your storage vendor's best practices
• Use faster disks in your storage array
• Use RAID 10 instead of RAID 5
• Make sure the firmware of your storage controller has been updated
• Enable write caching on your storage controller (but see Part 1 in this series for some considerations in this regard)

Returning to the first recommendation above, one of the issues frequently discussed in documentation for storage arrays is adjusting the queue depth to try to optimize performance. Let's begin examining this issue in more detail.

Understanding queue depth

Storage arrays such as Fibre Channel SANs (storage area networks) are often used nowadays as the storage system for Hyper-V host clusters. In a typical example the storage array is connected to one or more SAN switches to form the SAN fabric. The SAN switches expose SAN ports (also called switch ports or fabric ports) where hosts (i.e. servers) can connect to the SAN fabric through host bus adapters (HBAs) installed on the servers. For a detailed explanation of how SAN technologies work, see Brien M. Posey's series of articles titled A Crash Course in Storage Area Networking here on WindowsNetworking.com.
One of the tunable settings of a host bus adapter (also more generally called a storage controller) is its queue depth. This setting specifies the number of I/O requests that the storage controller is able to queue up for processing. Queue depth on HBAs is defined on a per-LUN (logical unit number) basis. If the queue on the storage controller becomes full, the controller will reject and discard any further storage requests (reads or writes). It is then up to the host (i.e. the application running on the host) to try issuing the I/O request again a short time later.
When you are configuring queue depth on the host (i.e. on the HBAs in your Hyper-V hosts) you should try to follow these general guidelines:

• Configure the queue depth setting the same way on all hosts in your cluster to ensure hosts have equal access to the storage pool on your SAN.
• The HBA queue depth should be configured to be greater than or equal to the number of spindles that the host is connecting to (for a HDD-based storage array).
• In general, the larger the host cluster (and the more virtual machines running on it) the higher you should set the queue depth.
• However, there is usually a level beyond which increasing queue depth leads to no further I/O performance benefits and in fact can begin to be counterproductive, particularly for certain workloads like SQL Server.
• Don't allow the queue depth on the hosts (i.e. on the HBAs) to exceed the queue depth configured on the SAN ports to which the HBAs are connected.
These are only basic recommendations, however; in the real world, where you may have multiple virtual machines running on each host and multiple HBA cards in each host, things can get quite a bit more complicated.
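
Three of the guidelines above (equal settings across hosts, queue depth at least the spindle count, and not exceeding the SAN port queue depth) are mechanical enough to check in a script. The following Python sketch is only an illustration: the HostHba record, its field names, and the sample values are hypothetical, and you would have to collect the real figures yourself from your HBA and SAN management tools.

```python
from dataclasses import dataclass


@dataclass
class HostHba:
    """Queue depth facts for one HBA on one cluster node (hypothetical record)."""
    host: str
    queue_depth: int           # per-LUN queue depth configured on the HBA
    spindles: int              # HDD spindles the host connects to
    san_port_queue_depth: int  # queue depth configured on the connected SAN port


def check_guidelines(hbas):
    """Return a list of warnings; an empty list means no checked guideline is violated."""
    warnings = []

    # Guideline: configure the same queue depth on every host in the cluster.
    depths = {h.queue_depth for h in hbas}
    if len(depths) > 1:
        warnings.append(f"queue depth differs across hosts: {sorted(depths)}")

    for h in hbas:
        # Guideline: queue depth >= number of spindles (HDD-based arrays).
        if h.queue_depth < h.spindles:
            warnings.append(
                f"{h.host}: queue depth {h.queue_depth} is below spindle count {h.spindles}"
            )
        # Guideline: HBA queue depth must not exceed the SAN port queue depth.
        if h.queue_depth > h.san_port_queue_depth:
            warnings.append(
                f"{h.host}: queue depth {h.queue_depth} exceeds "
                f"SAN port queue depth {h.san_port_queue_depth}"
            )
    return warnings


# Hypothetical sample values for two nodes of the cluster.
nodes = [
    HostHba("node1", queue_depth=64, spindles=48, san_port_queue_depth=2048),
    HostHba("node2", queue_depth=32, spindles=48, san_port_queue_depth=2048),
]
for warning in check_guidelines(nodes):
    print(warning)
```

With these sample values the script reports two problems: the hosts are configured differently, and node2 is set below its spindle count. The guidelines about cluster size and diminishing returns are judgment calls and are deliberately left out of the check.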

Determining optimal queue depth

In general, choosing the optimal queue depth for HBAs on clustered Hyper-V hosts in a SAN environment is best decided by consulting the documentation from your SAN vendor since your HBA cards are usually provided by the same vendor. As an example (though taken from the VMware world instead of Microsoft) the following formula is recommended by one expert for determining queue depth in a scenario where you have multiple ESXi hosts using SAN storage:
Port-QD = ESXi Host 1 (P * QD * L) + ESXi Host 2 (P * QD * L) + ... + ESXi Host n (P * QD * L)
Here Port-QD represents the target port queue depth, P represents the number of host paths connected to the array target port, L represents the number of LUNs presented to the host via the array target port, and QD equals the LUN queue depth on the host. You can read the full article here.
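
As a quick illustration of the formula above, here is a minimal Python sketch that sums P * QD * L over every host sharing a target port. The function name and the sample figures (four hosts, two paths each, a LUN queue depth of 32, and eight LUNs) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def target_port_queue_depth(hosts):
    """Sum P * QD * L over all hosts that share one array target port.

    hosts is an iterable of (paths, lun_queue_depth, luns) tuples, one per host.
    """
    return sum(paths * lun_queue_depth * luns
               for paths, lun_queue_depth, luns in hosts)


# Hypothetical example: four identical hosts, each with 2 paths,
# a LUN queue depth of 32, and 8 LUNs presented via the target port.
hosts = [(2, 32, 8)] * 4
print(target_port_queue_depth(hosts))  # prints 2048
```

Each host contributes 2 * 32 * 8 = 512 outstanding requests in the worst case, so four hosts can present up to 2048 to the target port. If that total exceeds what the port can queue, the LUN queue depth on the hosts is the value to reduce.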

Does Microsoft have a similar formula for calculating queue depth in Hyper-V host cluster environments? Unfortunately, no. They basically just leave this to the storage (SAN) vendor to provide for you. As an example of this, Hitachi has a PDF document titled "Best Practices for Microsoft SQL Server on Hitachi Universal Storage Platform VM" which says the following concerning HBA queue depth settings:

"Setting queue depth too low can artificially restrict an application’s performance, while setting it too high might cause a slight reduction in I/O. Setting queue depth correctly allows the controllers on the Hitachi storage system to optimize multiple I/Os to the physical disk. This can provide significant I/O improvement and reduce response time."
They then provide the following formula for calculating queue depth:
2048 ÷ total number of LUs presented through the front-end port = HBA queue depth per host
However, a colleague who actually tested this in a clustered Hyper-V host environment running SQL Server found that while the formula recommended a queue depth of 128, the tested performance was actually better with a queue depth of 64. In other words, you should view formulas like these as starting points for tuning queue depth and not as hard-and-fast rules.
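
The Hitachi formula quoted above is simple enough to express in a few lines of Python. In this sketch the function name is hypothetical, rounding down to a whole number is my assumption rather than something the quoted formula specifies, and the figure of 16 LUs is only an illustrative value that happens to produce the recommendation of 128 mentioned above.

```python
def hba_queue_depth_per_host(total_lus, port_limit=2048):
    """Apply the vendor formula: port_limit / total LUs on the front-end port.

    Rounds down to a whole number (an assumption; the formula does not say).
    """
    if total_lus <= 0:
        raise ValueError("total_lus must be a positive integer")
    return port_limit // total_lus


# Illustrative value: 16 LUs presented through the front-end port.
print(hba_queue_depth_per_host(16))  # prints 128
```

Treat the result as the first value to test, not the final setting. In the test described above, halving the calculated value to 64 gave better measured performance.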

Digging deeper

Let's dig a little deeper into my earlier statement that there exists a level beyond which increasing HBA queue depth leads to no further I/O performance benefits and can actually be detrimental to the performance of the applications running on your host cluster. The fact is, if the storage devices in your array are slow then increasing queue depth really only makes your storage I/O pipe longer, not faster. In other words, if your application is responding slowly or timing out due to a storage bottleneck, the problem is very likely not the queue depth setting but slow storage devices (or a suboptimal RAID configuration). In addition, increasing the queue depth on the host side (the cluster) without taking into consideration the configuration of the storage side (the SAN) might simply end up overwhelming the storage array to the point that performance begins to degrade.
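
One way to see why a deeper queue makes the pipe longer rather than faster is Little's Law from queueing theory. It is not part of the vendor guidance discussed here, but it holds for any queue in steady state: the average number of outstanding requests equals throughput multiplied by average response time. The Python sketch below assumes an array that is already saturated at a fixed 2,000 IOPS (a hypothetical figure) and that the queue is kept full, and shows what happens to the average response time as more I/Os are allowed to wait.

```python
def avg_latency_ms(outstanding_ios, array_iops):
    """Little's Law rearranged: response time = outstanding requests / throughput."""
    return outstanding_ios * 1000.0 / array_iops


# Hypothetical array that cannot deliver more than 2000 IOPS.
ARRAY_IOPS = 2000
for queue_depth in (32, 64, 128):
    latency = avg_latency_ms(queue_depth, ARRAY_IOPS)
    print(f"{queue_depth} outstanding I/Os: {latency:.1f} ms average response time")
```

Under these assumptions the average response time is 16 ms with 32 outstanding I/Os, 32 ms with 64, and 64 ms with 128. Throughput stays at 2,000 IOPS throughout, so each doubling of the queue only doubles the wait.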

So if you're monitoring the queue length on your HBAs and you see the queues filling up, you might be tempted to increase the queue depth on the SAN switches to accommodate more I/O requests from the HBAs on the hosts. But while doing this might result in less buildup in your HBA queues, you might actually see application performance decrease because the I/O may now be hanging in the storage controllers.

The bottom line

The bottom line here is that whatever you do in terms of tuning queue depth, either on the HBA side or the SAN side, you need to test the effect each time you make a change. This also means that each time you add another host, another HBA to a host, or faster disks to the array, you need to go back, start over, and test the performance of your environment again. So the best approach, besides following the general guidelines listed above and trying to get your head around your storage vendor's recommendations, is probably to start off with the default queue depth settings and make changes one at a time to see whether your application performance improves or degrades. And really, you only need to worry about tuning queue depth if you're seeing alarming disk response times of more than a couple of seconds. Otherwise, your application performance is probably good enough to satisfy most customers and clients, so why waste your time and energy trying to squeeze another second out of the response time? It's better to deal with more pressing issues such as ensuring the security of your customers' credentials and financial records, dealing with national privacy compliance issues for different customer segments, and so on.


We'll examine some other tips for improving storage performance in Hyper-V environments in future articles in this series. 

Got questions about Hyper-V?
If you have any questions about Microsoft's Hyper-V virtualization platform, the best place to ask them is the Hyper-V forum on Microsoft TechNet. If you don't get help that you need from there, you can try sending your question to us at wsn@mtit.com so we can publish it in the Ask Our Readers section of our weekly newsletter WServerNews and we'll see whether any of the almost 100,000 IT pro subscribers of our newsletter may have any suggestions for you.

The Author — Mitch Tulloch

Mitch Tulloch is a well-known expert on Windows Server administration and cloud computing technologies. He has published over a thousand articles on information technology topics and has written, contributed to or been series editor for over 50 books.
