If you would like to read the other parts in this article series please go to:
- Hyper-V Optimization Tips (Part 1)
- Hyper-V Optimization Tips (Part 2)
- Hyper-V optimization tips (Part 3): Storage queue depth
- Hyper-V Optimization Tips (Part 5)
In earlier articles in this series we have looked into several aspects of Hyper-V optimization including how disk caching settings should be configured on both Hyper-V hosts and on the virtual machines running on these hosts; how Hyper-V performance can depend upon the underlying storage subsystem of clustered Hyper-V hosts and how to optimize and troubleshoot performance on those systems through judicious choice of storage hardware; and how storage queue depth can potentially impact on the performance of virtualized workloads running on Hyper-V hosts. When we looked at that last topic we mentioned that there is usually a level beyond which increasing queue depth leads to no further I/O performance benefits and in fact can begin to be counterproductive, particularly for certain workloads like Microsoft SQL Server.
In this present article we're going to examine further how one can optimize Hyper-V performance when hosting clustered SQL Server workloads. This is an important topic because SQL Server differs in some ways from other kinds of server workloads, and this can make virtualizing SQL Server a non-trivial task compared to, say, virtualizing a domain controller, file server or web server role, particularly when the SQL Server workload is clustered.
Storage options for virtualizing SQL Server
Let's start off by considering the different storage options available when virtualizing SQL Server for high availability on Hyper-V. Basically, you have two ways to approach this problem:
- Using shared storage with Failover Clustering
- Using SQL Server Availability Groups
Let's briefly examine these two different approaches.
Using Failover Clustering
Failover Clustering is the standard Microsoft solution for providing high availability and scalability for most Windows Server workloads. Windows Server 2012 R2 introduced a number of significant improvements to Failover Clustering, and the procedure for installing a SQL Server failover cluster is well-documented on Microsoft TechNet and there is also an excellent walkthrough created by the Egyptian IT pro Mohamed El Manakhly that can be used to quickly build a lab environment for testing and development as outlined in this post on his blog The Admins Guide (TAG).
When installing SQL Server on failover cluster instances in Windows Server 2012 R2, you can also choose between two different sub-approaches, namely Shared VHDX or Virtual Fiber Channel. For example, you could install your failover cluster instances with shared virtual hard disks (Shared VHDX) on Windows Server Scale Out File Servers (SOFS) using Remote Direct Memory Access (RDMA) cards. Shared VHDX is a feature that was previously introduced in Windows Server 2012 and lets you use a virtual hard disk (VHDX) file as shared storage for a guest cluster, that is, a failover cluster made from virtual machines. Shard VHDX is designed to help protect application services like SQL Server that are running within virtual machines and it makes deploying failover clusters a lot simpler since you can use Server Message Block (SMB) shares as your shared storage instead of needing to use iSCSI or FiberChannel storage arrays. For a good explanation of how Shared VHDX works on Windows Server 2012 see the explanation by Matthew Walker on the Ask Premier Field Engineering (AskPFE) blog on TechNet.
The other way of course you can install SQL Server on failover cluster instances would be to use Virtual Fiber Channel and a SAN. Virtual Fibre Channel lets you directly connect applications like SQL Server running within virtual machines to Fibre Channel storage on your network. This allows you to leverage your existing investment in SAN technologies to support virtualizing workloads with Hyper-V. The Hyper-V Virtual Fiber Channel Design Guide which is available from TechNet is the best source of information for how you can get started using Virtual Fiber Channel. The blog Hyper-V Notes from the Field also has a helpful walkthrough on setting up Virtual Fiber Channel.
Using Availability Groups
The alternative approach besides Failover Clustering to providing high availability and scalability for SQL Server is to use Always On Availability Groups (or simply Availability Groups). This feature was first introduced in SQL Server 2012 and they are basically an enterprise-level alternative to mirroring your databases to ensure their availability at all times. MSDN is the best source of information if you want a good overview of how Availability Groups work, and this YouTube Video by Edwin Sarmiento from a Microsoft Canada TechDays event has a helpful walkthrough of how you can convert database mirrors into Availability Groups.
Possibly the best approach
My own mind on this topic is that I find the Shared VHDX approach to be the most attractive one in many respects when you want to provide high availability for SQL Server workloads running on Hyper-V. My reasons for this are as follows:
- When you use Shared VHDX as shared storage for your virtualized failover cluster instances, you will have zero storage administration work as far as the guest is concerned.
- Using Shared VHDX also means that you will not need to perform any per-virtual machine storage configuration on your Hyper-V host systems.
But the most attractive approach feature-wise is not always the best approach performance-wise. For example, while Shared VHDX provides a number of advantages for SQL Server failover clustering virtual instances, it can result in redirected I/O in certain scenarios which can affect performance. For example, one SQL Server consultant has informed me that he has seen Shared VHDX implemented on a cluster shared volume (CSV) result in some significant I/O-related performance problems when I/O is redirected, and he therefore advises caution when following this approach. The kinds of scenarios where such performance issues arise tend to be those where you use Shared VHDX with local block storage, for example with a Fibre Channel or iSCSI storage array, which results in I/O redirection.
There are workarounds that can be implemented however to mitigate these types of performance problems. For example, making sure that your intra-cluster communications has sufficient bandwidth is one key consideration for ensuring I/O redirection doesn't adversely impact the performance of your virtualized SQL Server cluster. This means for example using 10 GB NICs in your servers instead of those cheaper 1 GB NICS you have lying around your shop. Or better yet, get some RDMA-enabled NICS and everything should be fine as far as performance is concerned. The previously referenced blog post by Matthew Walker deals in detail with some of these I/O redirection issues, but note that Matthew's focus is on using Shared VHDX with iSCSI-based SANs and not Fibre Channel SANs.
One needs also to be aware that it's not just the speed of your NICs that can affect the performance of your failover cluster but also the rest of the network infrastructure you use to connect your cluster nodes to each other, to the shared storage, and to the clients using the clustered applications. A poorly designed cluster network can often prevent a failover cluster from performing at optimal levels. Basically if you're using Shared VHDX in this kind of setup, you need to ensure that your failover cluster interconnects are optimized by doing things like:
- Using RDMA-compliant 10 GB NICs.
- Using Receive-Side Scaling (RSS) which means your interconnects should not be part of a virtual switch.
- Configuring RSS queues to their maximum supported values.
- Configure the power plan on your Hyper-V host to High Performance.
- Disabling NetBIOS if it is not required.
You might also want to try tweaking the transmit and receive buffers on your NICs to see if this can improve performance. And you can try using Jumbo Frames though the verdict seems to be still out on whether that can help in this particular scenario.
Some other considerations
But it's not just the network that can be the primary bottleneck for your SQL Server cluster's performance. For you may be using Shared VHDX with local block storage on your SAN and have a solid network design, but then it turns out the SAN itself becomes the bottleneck. That's because SANs are basically legacy technology that have a complex stack whose performance is governed by their disk feed rate, LUN feed rate, storage controller port rate, and switch port rate. Then you have to deal also with the Hyper-V hosts and their HBA port rate, CPU feed rate, and cache read-ahead rate.
One other disadvantage with using Shared VHDX is that in its current implementation in Windows Server 2012 R2 it is not completely compatible with Hyper-V Replica, Hyper-V feature that was introduced Windows Server 2012. Hyper-V Replica provides a built-in replication mechanism for asynchronously replicating virtual machines from a primary site to a secondary site. Hyper-V Replica can thus be a valuable addition to your disaster recovery strategies but it won't be fully compatible with Shared VHDX until Windows Server 2016. Until then however, Microsoft does not support using Hyper-V Replica with any form of guest clustering. But this is a disaster recovery issue and not a performance issue, and we're focusing here on optimizing Hyper-V performance for SQL Server clusters.
Availability Groups on the other hand are a native SQL Server solution for high availability, but they also have their performance-related problems. For example, whenever you implement Availability Groups you can expect to use twice the storage and also generate twice the IOPS compared to not using this feature.
Wrapping it up...
Optimizing Hyper-V performance for a complex scenario like clustered SQL Server is a non-trivial but fascinating subject to explore. There aren't too many helpful resources available yet on this topic, possibly because both the Windows Server Hyper-V and SQL Server platforms continue to evolve towards their 2016 releases. As a result we may revisit this topic again in the near future.
Got questions about Hyper-V?
If you have any questions about Microsoft's Hyper-V virtualization platform, the best place to ask them is the Hyper-V forum on Microsoft TechNet. If you don't get help that you need from there, you can try sending your question to us at firstname.lastname@example.org so we can publish it in the Ask Our Readers section of our weekly newsletter WServerNews and we'll see whether any of the almost 100,000 IT pro subscribers of our newsletter may have any suggestions for you.
If you would like to read the other parts in this article series please go to: