Back to basics (Part 2): Virtualization 101: A Storage Primer

Published on 2 Feb. 2012 / Last Updated on 2 Feb. 2012

In this article we’re going to focus on storage.


Introduction

Virtualization is a science with some art thrown in from time to time. As such, significant time and effort is (or should be) expended on properly planning the architecture of the virtual infrastructure. In this article, we’re going to focus on storage, but let’s start with a discussion about the three critical resources that go into the virtual melting pot:
  • Compute. This is the processing part of the equation and is handled by whatever processors are put into the servers hosting the virtual environment. With dual-, quad-, 6-, 8- and higher core count parts available, today's processors make it easy to choose what to put into the hosting servers. When more compute is needed, it's not difficult to simply add another server to the cluster or replace a lesser-powered server with a newer powerhouse.
  • RAM. I’ve seen many virtual environments brought to their knees due to too little RAM being installed in the hosts. That said, adding RAM to existing hosts or, again, replacing hosts with beefier units isn’t a massive undertaking in most cases.
  • Storage. Even though it's certainly possible to provision too little processing power or too little RAM, I've seen far more environments miss one of two key storage metrics:
    o   Capacity. This one is generally easy. How much capacity do you want to have in your virtual environment? We’ll explore this in more depth in this article.
    o   Performance. This one is a bit harder and is the one that is often overlooked. Many inexperienced administrators focus on the capacity side of the equation and do not always consider performance. After all, throw a few 2 TB SATA drives into the server and there will be plenty of capacity, but those very few SATA disks won’t stand up to much in the way of read/write punishment, particularly once RAID is introduced into the equation.

On the storage front, there is another decision you need to make as well: How do you want the host servers to connect to the storage? Do you plan to use disks that are connected directly to the server (direct-attached storage), or are you going to make use of a network-based storage device/infrastructure – a storage area network (SAN) or network-attached storage (NAS)?

RAID basics

Regardless of the kind of storage you buy for your virtual environment, you need to take steps to protect the data that is present on that storage. Although there are a number of ways that you can protect data, the most commonly used method is to employ some sort of RAID (Redundant Array of Independent Disks). With most RAID levels, you’re adding redundancy to your storage infrastructure at the expense of capacity and performance. For example, when you implement a mirrored RAID level (RAID 1/RAID 10), you “give up” 50% of your raw capacity and you take a hit when it comes to how quickly the system can write data to the disk. Why? Each command to write data to the array results in the need for two write operations to take place – one on each side of the mirror set. We’ll be talking about performance soon.
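The mirror-write cost described above generalizes to the other RAID levels: each logical write turns into some larger number of physical disk operations, which is the "write impact" shown in the table below. A minimal sketch of that arithmetic (the penalty figures match the table; the function name is just for illustration):

```python
# Back-end disk operations generated by a front-end workload, given
# each RAID level's write penalty: 1 logical write -> N physical I/Os.
# Reads pass through unchanged.

WRITE_PENALTY = {"RAID 0": 1, "RAID 1": 2, "RAID 10": 2,
                 "RAID 5": 4, "RAID 6": 6}

def backend_iops(read_iops, write_iops, raid_level):
    """Physical IOPS the disks must absorb for a given logical workload."""
    return read_iops + write_iops * WRITE_PENALTY[raid_level]

# The same 1,000 IOPS workload (70% read / 30% write) costs very
# different amounts of real disk work depending on the RAID level:
print(backend_iops(700, 300, "RAID 1"))   # 1300 physical IOPS
print(backend_iops(700, 300, "RAID 5"))   # 1900 physical IOPS
```

This is why a capacity-only sizing exercise can go wrong: the same workload demands nearly 50% more physical disk work on RAID 5 than on RAID 1.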

Each RAID level has its own set of pros and cons. Here’s a quick look at each common RAID level and its impact:

| RAID level | Protection | Tolerance        | Min. disks | Capacity overhead         | Write impact |
|------------|------------|------------------|------------|---------------------------|--------------|
| RAID 0     | None       | 0 disks          | 2          | None                      | None         |
| RAID 1     | Very good  | 1 disk           | 2          | 50%                       | 2x           |
| RAID 5     | Good       | 1 disk           | 3          | 1/n disks                 | 4x           |
| RAID 6     | Excellent  | 2 disks          | 4          | 2/n disks                 | 6x           |
| RAID 10    | Excellent  | 1/2 of disks*    | 4          | 50%                       | 2x           |
| RAID 50    | Good       | 1 per RAID 5 set | 6          | 1/n disks per RAID 5 set  | 4x           |
| RAID 60    | Excellent  | 2 per RAID 6 set | 8          | 2/n disks per RAID 6 set  | 6x           |

I need to make a couple of clarifying points regarding the table above. First of all, although RAID 0 has “RAID” in its name, there is nothing redundant about it. In RAID 0, you’re simply striping data across all of the disks in the array in order to increase overall performance. If you lose even one disk in that array, you will lose all of the data across the array. Although RAID 0 is leveraged in some other kinds of RAID (e.g., RAID 50), as a standalone solution, it’s very poor.

The tolerance column in the table displays the number of disks that can fail before you experience data loss. With a RAID 5 set, because RAID 5 dedicates one disk’s worth of capacity to parity, a single disk in the array can fail. While that disk is failed, the array is said to be in a “degraded” mode. If you have a hot spare disk configured, the array will automatically rebuild itself onto the spare and return to a protected state.

You’ll note that you can lose “1/2 of disks” in a RAID 10 set. In theory, you could lose all of the disks on one side of the mirrored set and still be operational on the other mirror. That said, the chances that you’d just happen to lose exactly the right disks are pretty low, so don’t think that RAID 10 lets you really lose a ton of disks.
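That intuition can be made concrete. After the first disk fails in a RAID 10 array, only that disk's mirror partner holds the now-unprotected copy of its data, so the chance that a second random failure causes data loss is 1/(n-1). A quick sketch (the function name is my own, for illustration):

```python
# After one disk dies in an n-disk RAID 10 array, only its mirror
# partner is fatal. If a second disk fails at random, the chance it
# happens to be that exact partner is 1 / (n - 1).

def second_failure_loss_chance(n_disks):
    return 1 / (n_disks - 1)

for n in (4, 8, 14):
    print(n, round(second_failure_loss_chance(n), 3))
# 4-disk array: ~33% chance; 14-disk array: under 8%
```

So while you *could* survive losing half the disks, larger RAID 10 arrays mostly benefit because each additional failure is increasingly likely to land on an already-mirrored pair that is still intact.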

The capacity question

When it comes to calculating storage needs, capacity is the easy side of the equation, even when you throw RAID into the mix. It’s all a matter of relatively simple multiplication. For the purposes of the examples here, we’ll assume that all disks in the sample arrays are 1 TB in size. This makes the numbers more clear.

Now, let’s assume that you have an array with a total of 15 disks in it. That’s a raw capacity of 15 TB. Vendors use the term “raw capacity” to describe an array that has just left the factory and has not yet been formatted or RAIDed.

  • RAID 0. If you are using RAID 0, you’ll get close to the raw capacity of the storage array, but bear in mind that you won’t have a bit of data protection. So, you’ll have 15 TB of capacity.
  • RAID 1. With a 15-disk unit, you have an odd number of disks, so, in this configuration, you’d assign one disk as a hot spare that can jump in and take the place of a failed disk. That would leave you with 14 disks and one-half the remaining capacity (two mirrors of seven disks each) for around 7 TB of usable space.
  • RAID 5. With a RAID 5 set, again, you generally want to have a hot spare. So, again, you have 14 disks to work with. Here, you could create one large disk set or you could create smaller, separate RAID sets. If you created one large RAID 5 set, you would have 13 disks worth of capacity, so 13 TB of usable space. You could alternatively create two 7-disk RAID 5 sets with each one having its own parity disk. Each RAID 5 set would have a usable capacity of 6 TB with the entire array then having 12 TB of usable capacity.
  • RAID 6. Again, let’s just use 14 disks here, since RAID 6 needs two disks of overhead for parity. A single disk set leaves 12 TB of usable space, while two seven-disk sets each require two disks for parity overhead, which means that you end up with total usable space of 10 TB.
  • RAID 50. Two RAID 5 sets of 7 disks each, 1 parity disk per set = 12 TB.
  • RAID 60. Two RAID 6 sets of 7 disks each, 2 parity disks per set = 10 TB.
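The capacity math above can be sketched as a few lines of code. This is a simple calculator for the article's 15 × 1 TB example (the one-disk hot spare is reflected by passing in 14 working disks; the function names are my own):

```python
# Usable-capacity math from the 15 x 1 TB disk example. RAID 1/5/6
# configurations reserve one disk as a hot spare, leaving 14 disks.

DISK_TB = 1

def raid0_tb(n):             # striping only: full capacity, no protection
    return n * DISK_TB

def raid1_tb(n):             # mirroring: half the disks hold copies
    return n // 2 * DISK_TB

def raid5_tb(n, sets=1):     # one parity disk's worth of capacity per set
    per_set = n // sets
    return sets * (per_set - 1) * DISK_TB

def raid6_tb(n, sets=1):     # two parity disks' worth per set
    per_set = n // sets
    return sets * (per_set - 2) * DISK_TB

print(raid0_tb(15))      # 15 TB  (RAID 0, no spare)
print(raid1_tb(14))      # 7 TB   (RAID 1)
print(raid5_tb(14))      # 13 TB  (one big RAID 5 set)
print(raid5_tb(14, 2))   # 12 TB  (two RAID 5 sets, i.e. RAID 50)
print(raid6_tb(14))      # 12 TB  (one big RAID 6 set)
print(raid6_tb(14, 2))   # 10 TB  (two RAID 6 sets, i.e. RAID 60)
```

Note how splitting an array into multiple smaller RAID sets buys extra fault tolerance at the cost of additional parity overhead.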

At first glance, it might seem that RAID 6 is your best choice as it offers the most redundancy and protection without that much in the way of overhead. However, as I mentioned before, capacity is but one variable in the equation. Performance is just as important as capacity and increases in importance depending on the kind of environment you’re deploying.

For example, if you’re deploying a server environment, performance may not be quite as noticeable to the user, but if you’re deploying virtual desktops, performance problems will be immediately noticeable to users.

In the next part in this series, we’ll talk about storage performance and the different ways that you can connect storage to your hypervisor-wielding host servers.


The Author — Scott D. Lowe


Scott has written thousands of articles and blog posts and has authored or coauthored three books, including Microsoft Press’ Exchange Server 2007 Administrators Companion and O’Reilly’s Home Networking: The Missing Manual. In 2012, Scott was also awarded VMware's prestigious vExpert designation for his contributions to the virtualization community.
