
IO/s vs. Throughput, by Pure Storage

Why they’re important metrics for measuring storage performance, and what they mean for FlashArray performance.

Written by: Pure Storage, Inc. team

Both IO/s and throughput are important metrics for measuring storage performance, whether for HDDs, SSDs, or SANs. However, they each measure different things, and there are various factors to take into account for each. Here, we are going to closely examine IO/s vs. throughput.


If you want to skip right to how Pure customers are using FlashArray in terms of reads and writes, based on IO data from across the entire FlashArray fleet deployed at thousands of customers, click here.

If you feel you could use a refresher on IO/s and throughput and how they’re calculated, read on.

What is IO/s and how do you calculate it?
IO/s, pronounced ‘eye ops,’ is the measurement of the number of I/O operations a storage device can complete within a single second. It’s a standard performance benchmark for SSDs, HDDs, flash drives, and NAS devices. 

An ‘input’ is any information sent into a computing system via an external device, such as a keyboard or a mouse. An ‘output’ is the response to or result of processing the data that came from the input. 

An input is considered a ‘read’ operation because the computer is reading the data and putting it into its memory, and an output is considered a ‘write’ operation because the computer is transferring data by writing it somewhere else.

Calculating IO/s can be tricky because so many factors go into determining throughput and performance. The type of drive is a factor, so an SSD will have a different calculation compared to an HDD of the same size. The type of RAID used in an array will also be a factor because some RAID levels carry a performance penalty. For example, RAID-6 has a much higher write penalty than RAID-0 because every write must also update two parity blocks spread across the disks.
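As an illustration, a common rule of thumb for parity RAID is: back-end IOPS = reads + (writes × write penalty), with typical penalty values of 1 for RAID-0, 2 for RAID-1/10, 4 for RAID-5, and 6 for RAID-6. Here is a minimal Python sketch of that rule (the penalty table holds the usual rule-of-thumb values, not figures from this article):

    # Rule-of-thumb RAID write penalties: back-end operations per front-end write.
    RAID_WRITE_PENALTY = {"RAID-0": 1, "RAID-1": 2, "RAID-10": 2, "RAID-5": 4, "RAID-6": 6}

    def backend_iops(read_iops: float, write_iops: float, raid_level: str) -> float:
        """Back-end IOPS the disks must sustain for a given front-end mix."""
        return read_iops + write_iops * RAID_WRITE_PENALTY[raid_level]

    # Example: a 70/30 read/write mix totaling 10,000 front-end IO/s.
    print(backend_iops(7_000, 3_000, "RAID-6"))  # 25000 back-end IO/s
    print(backend_iops(7_000, 3_000, "RAID-0"))  # 10000 back-end IO/s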

Generally, the basic calculation for IO/s is: (total read operations + total write operations) / time (in seconds)
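A minimal Python sketch of that formula (the counts and time window are made-up example values):

    def io_per_second(total_reads: int, total_writes: int, seconds: float) -> float:
        """IO/s = (read operations + write operations) / elapsed time in seconds."""
        return (total_reads + total_writes) / seconds

    # Example: 1,200,000 reads and 400,000 writes completed in a 60-second window.
    print(io_per_second(1_200_000, 400_000, 60.0))  # ~26,667 IO/s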

What is throughput and how do you calculate it?
Throughput is a measure of the number of units of information a system can process in a given amount of time. Throughput is most commonly measured in bits per second, but may also be measured in bytes per second or data packets per second.
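To make the units concrete, here is a small sketch converting an operation rate and an average IO size into throughput in the usual units (a hypothetical 20,000 IO/s at 8KB; 1KB = 1,024 bytes):

    def throughput_bytes_per_sec(io_per_sec: float, io_size_bytes: float) -> float:
        """Throughput = operations per second x bytes moved per operation."""
        return io_per_sec * io_size_bytes

    bps = throughput_bytes_per_sec(20_000, 8 * 1024)
    print(bps)               # 163,840,000 bytes/s
    print(bps * 8)           # 1,310,720,000 bit/s
    print(bps / 1024 ** 2)   # 156.25 MB/s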

IO/s vs. throughput: Which should you use to calculate performance?
Throughput and IO/s are interrelated but there is a subtle difference between them. Throughput is a measurement of bits or bytes per second that can be processed by a storage device. IO/s refers to the number of read/write operations per second. Both IO/s and throughput can be used together to describe performance. 

In fact, there are 3 factors that must be combined to tell the full story of storage performance: bandwidth, latency, and IO/s. Most storage vendors tend to focus on IO/s to brag about how fast their storage system is. But measuring storage system performance by IO/s alone only has value if the workloads using that storage system are IO/s-intensive.

IT professionals often use IO/s to evaluate the performance of storage systems such as AFAs. However, looking at IO/s is only half the equation. Equally important is to look at throughput (units of data per second) – how data is actually delivered to the arrays in support of real-world application performance. So in short, you should use both.

When speed matters, the amount of data that can be written to and read from a storage device is an important factor in performance. For individual devices, the difference in IO/s might be negligible, but in a data center or enterprise environment, the number of bytes that can be read from and written to storage will affect the performance of your customer applications.

Throughput alone isn’t the whole picture, though. Latency also plays a part in storage performance: it refers to the time it takes to send a request and receive a response. The device mechanics, controllers, memory, and CPUs can also have an effect on latency. All these factors go into the performance of a system. Related reading: Thick-provisioned IOs in the Public Cloud

IO/s on HDDs vs. SSDs: A look at FlashArray devices
SSDs and HDDs are built very differently. An HDD stores data on spinning platters, while an SSD is built from circuits with no moving parts. Because physical limitations restrict the speed at which HDD platters can spin, HDDs do not have the high-speed throughput of an SSD. Read and write actions on an HDD depend on the spinning speed of the platters, so HDDs are often used for storage where reads and writes are infrequent.

For example, HDDs are affordable for backup storage. For active applications, SSDs are commonly used since they offer faster IO/s. Circuits can transfer data faster than a physical spinning device, so an SSD will always be the preferred storage solution for enterprise applications with numerous reads and writes.

IO reads vs. writes: How Pure customers are using FlashArray
Let’s start with the simplest question first: Are customers using the FlashArray device more for reads or writes? Is it 50/50? Some FlashArray devices are 100% reads and some are 100% writes, though such extremes are rare. The vast majority of FlashArray devices run a mix that favors reads over writes.

The median distribution is ⅔ reads and ⅓ writes, with 78% of customer arrays running workloads that have more reads than writes. Figure 1 below shows the read/write distribution across all customers’ Pure Storage FlashArray devices.

Figure 1. The distribution of reads and writes across all customer FlashArrays. The y-axis shows the percentage of each; the x-axis represents all the customer arrays, sorted from those that do the most reads to those that do the most writes.


Why most customers have more reads than writes: 78%
A read is when a user requests data to be retrieved from a storage device, and a write is the action of writing ones and zeros to disk for future retrieval. Writes happen when the server sends data to the storage device, but reads are much more common on enterprise servers, where applications retrieve data and cache it in memory for fast access. Backup devices and database storage might have more writes than a standard application server, so make sure you know the server’s purpose before determining how important IO/s are.

Why customers have more writes than reads: 21%
Writes are common when users store data more often than they retrieve it. A database server might handle many writes per second as users store data to it. Backup servers are mainly used for writes: users store their data but rarely need to retrieve it. Backup server storage devices are optimized for writes to speed up performance, which is especially necessary when terabytes of data must be backed up every day.

Why some customers have all writes: 1%
In some mirrored environments, you might have the mirrored storage for failover only. In these instances, you would have all writes with no reads if the primary disk never fails. You may eventually have reads if the primary disk fails, so you should still ensure that read performance is sufficient for temporary usage while the primary disk is repaired.

Why IO read vs. write distribution matters
Why does it matter whether customers are using reads vs. writes?

In most critical enterprise applications, read/write patterns are discovered by analyzing the software. Analysts might find that an application spends a far larger share of its time writing data to storage than reading it. In that case, the storage system must be able to prioritize writes over reads. This doesn’t mean read performance can be completely ignored. Enterprise systems need storage devices that have fast reads and writes, but the device installed on a critical server should be optimized for either reads or writes to keep performance at its best.

IO size vs. throughput: An analysis of FlashArray devices
We learned that Pure customers tend to do more reads than writes. What’s the most common size at which they do these reads and writes?

Let’s start with a very direct approach by looking at the distribution of IO/s for every customer FlashArray across different size buckets. Then let’s average these distributions over all the customers. This way, we learn which IO sizes are, on average, the most common across all customers. This is shown in Figure 2 below.

Figure 2. Average of all customer arrays’ IO size distributions


We see that the 2 most popular size buckets for reads are 8KB to 16KB and 4KB to 8KB, respectively. For writes, the most popular size is 4KB to 8KB.
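To make the averaging step behind Figure 2 concrete, here is a minimal sketch (bucket labels and per-array counts are invented for illustration): each array’s IO count histogram is first normalized to percentages, then the percentages are averaged so that a small array counts as much as a very busy one:

    BUCKETS = ["<4KB", "4-8KB", "8-16KB", "16-32KB", "32-64KB", ">=64KB"]

    def normalize(counts):
        """Turn one array's per-bucket IO counts into fractions that sum to 1."""
        total = sum(counts)
        return [c / total for c in counts]

    def fleet_average(per_array_counts):
        """Average the normalized distributions so each array weighs equally."""
        dists = [normalize(c) for c in per_array_counts]
        return [sum(d[i] for d in dists) / len(dists) for i in range(len(BUCKETS))]

    # Two hypothetical arrays with very different total IO volumes.
    arrays = [
        [100, 900, 2_500, 400, 80, 20],
        [5_000, 40_000, 120_000, 30_000, 9_000, 6_000],
    ]
    for bucket, frac in zip(BUCKETS, fleet_average(arrays)):
        print(f"{bucket}: {frac:.1%}")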

By looking at the distribution of IO sizes on an individual array basis, we see that the majority of arrays are dominated by IO/s which are small in size:

          Majority of IO/s < 32KB    Majority of IO/s < 16KB
Reads     73% of arrays              56% of arrays
Writes    93% of arrays              88% of arrays

However, looking at IO/s is only half the equation. Equally important is to look at throughput (bytes per second) – how data is actually delivered to the arrays in support of real-world application performance.

We can look at IO size as a weight attached to the IO. An IO of size 64KB will have a weight 8x higher than an IO of size 8KB since it will move 8x as many bytes. We can then take a weighted average of the IO size of each customer FlashArray device. This is shown in Figure 3 below.
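Before the figure, here is a minimal sketch of what that weighting computes, on a hypothetical histogram (each bucket reduced to a single representative size in KB): the plain average divides total bytes by total IOs, while the byte-weighted average lets every IO count in proportion to the bytes it moves:

    def mean_io_size(hist):
        """Plain average IO size: total bytes moved / total IO count."""
        return sum(s * c for s, c in hist.items()) / sum(hist.values())

    def byte_weighted_io_size(hist):
        """Each IO weighted by the bytes it moves, so a 64KB IO
        counts 8x as much as an 8KB IO."""
        total_bytes = sum(s * c for s, c in hist.items())
        return sum(s * s * c for s, c in hist.items()) / total_bytes

    # Hypothetical array: IO counts keyed by IO size in KB.
    hist = {8: 50_000, 16: 10_000, 64: 9_000}
    print(mean_io_size(hist))           # ~16.5KB: small IOs dominate the count
    print(byte_weighted_io_size(hist))  # ~37.5KB: large IOs dominate the bytes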

Figure 3. Weighted-average IO size of all customer arrays


Looking at the weighted average tells a very different story than looking at raw IO/s. We can see that the most popular read size is between 32KB and 64KB and the most popular write size is between 16KB and 32KB. In fact, for reads, the majority of our customers’ arrays have weighted IO sizes above 32KB rather than below 32KB.

          Majority of IO/s ≥ 32KB    Majority of IO/s ≥ 16KB
Reads     71% of arrays              93% of arrays
Writes    30% of arrays              79% of arrays

So how do we reconcile these 2 different views of the world? IO/s tells us that most IO/s are small. But weighted IO/s (throughput) disagrees.

To better understand what is really going on, let’s plot the IO/s and the throughput on the same graph. Figures 4 and 5 show the distribution of IO/s by IO size and the distribution of throughput by IO size across all our customers’ FlashArray devices globally. Figure 4 is writes and Figure 5 is reads.

Figure 4. Distribution of write IO sizes and write throughput sizes across all customer arrays


Figure 5. Distribution of read IO sizes and read throughput sizes across all customer arrays


What we observe is that, for both reads and writes, the IO counts are dominated by small IOs, while the majority of the actual payload is read and written with large IOs.

On average, 79% of all writes are less than 16KB in size, but 74% of all data is written with writes that are greater than 64KB in size. This hopefully sheds some light on how IO size distributions are different when talking about IO/s vs. throughput.
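To see how both statements can hold at once, here is a small sketch with an invented write mix: converting the count-based (IO/s) distribution into a byte-based (throughput) distribution flips which buckets dominate:

    # Hypothetical write mix: IO size in KB -> share of all write operations.
    io_share = {4: 0.45, 8: 0.25, 16: 0.12, 64: 0.08, 256: 0.10}

    # Each bucket's share of the bytes = IO size x its share of operations.
    bytes_moved = {s: s * f for s, f in io_share.items()}
    total = sum(bytes_moved.values())
    byte_share = {s: b / total for s, b in bytes_moved.items()}

    for size in sorted(io_share):
        print(f"{size:>4}KB  {io_share[size]:>4.0%} of IO/s  {byte_share[size]:>4.0%} of bytes")
    # In this mix, 70% of the writes are under 16KB, yet IOs of 64KB
    # and larger carry roughly 84% of the bytes written.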

IO size modalities: 4 most common IO distributions
So far, we’ve been looking at IO sizes averaged across all FlashArray devices. Now, let’s get away from this aggregation and see if we can spot any patterns in the kinds of workloads that customers are running.

It turns out there are four typical IO size distributions or profiles that are most common, as shown in Figure 6 below.

They are:

  • Unimodal, where 1 bucket dominates all others
  • Bimodal, with IO/s falling into 2 buckets
  • Trimodal, with IO/s falling into 3 buckets
  • Multimodal, where IO/s spread fairly evenly across most buckets
Figure 6. Examples of 4 most common IO size distributions that we observe in customer FlashArrays


Unimodal is the most prevalent distribution followed by multimodal then bimodal and trimodal. Keep in mind that even a unimodal distribution can represent more than one distinct application running on the array that happens to mostly use the same block size.

Bottom line on IO/s vs. throughput: Why both IO size and throughput matter
We presented a lot of different data above. We saw that typically, reads dominate writes. We saw that typical IO sizes depend on how you look at them. From a strictly IO/s perspective, smaller sizes dominate, but from a throughput perspective, larger sizes do. We also saw that more than half of customer arrays are running a workload with more than one dominant IO size. So how do we pull all this together and make some practical sense of it?

First, we need to look at FlashArray in light of the fact that the world is moving toward consolidation. It’s an uncommon case that a customer purchases an array to run just a single application on it. This isn’t just conjecture. As part of our Pure1 analytics, we apply a predictive algorithm to try to identify which workload is running on each volume of a FlashArray. We’ve extensively validated the accuracy of this model by cross-checking the predictions with customers. We’ve found that 69% of all customer arrays run at least 2 distinct applications (think VDI and SQL, for example). Just over 25% of customer FlashArray devices run at least 3 distinct workloads (think VDI, VSI, and SQL, for example). The mixture of different applications on a single FlashArray, as well as variability within the applications themselves, creates the complex IO size picture that we see above.

It’s nice to see this validated in practice, but this was the core assumption we made more than seven years ago when we started building FlashArray’s Purity operating environment. Unlike most flash vendors, who focused on single-app acceleration, we focused on building affordable flash for application virtualization and consolidation from day 1, and thus one of the main design considerations of Purity was a variable block size and metadata architecture. Unlike competitors who break every IO into fixed chunks for analysis (for example, XtremIO at 8K) or who require their dedupe to be tuned to a fixed block size or turned off per volume to save performance (for example, Nimble), Pure’s data services are designed to work seamlessly, without tuning, for all block sizes, on the assumption that every array is constantly doing mixed IO (and it turns out that even single-app arrays do tons of mixed-size IO too!). If your vendor is asking you to input, tune, or even think about your application’s IO size, take that as a warning – in the cloud model of IT, you don’t get to control or even ask what the workloads on top of your storage service look like.

So, the company’s FlashArray devices are deployed in consolidation environments leading to a complex IO size distribution, but what does that mean from a practical benchmarking perspective? The best way to simulate such complexity accurately is to try running the real workloads that you intend to run in production when evaluating a new storage array. Don’t pick 1 block size, or 2, or 3, etc. – any of these narrow approaches miss the mark for real-world applications with multiple IO size modalities, especially in ever-more consolidated environments. Instead, actually try out your real-life workloads – this is the only way to truly capture what’s important in your environment. Testing with real-world IO size mixes has been something that we’ve been passionate about for years. See prior posts on this topic at the end of this article.

All that said, if that kind of testing is not possible or practical, then try to understand the IO size distributions of your workload and test those IO size modalities. And if a single IO size is a requirement, say for quick rule-of-thumb comparison purposes, then we believe 32KB is a pretty reasonable number to use, as it is a logical convergence of the weighted IO size distribution of all customer arrays. As an added bonus, it gets everyone out of the ‘4K benchmark’ thinking that the storage industry has historically propagated for marketing purposes.
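As a quick sanity check of what that 32KB rule of thumb implies, here is a trivial conversion sketch (all numbers illustrative):

    def iops_to_mbps(iops: float, io_kb: float = 32) -> float:
        """Implied throughput for a given IO/s figure at a fixed IO size (1MB = 1,024KB)."""
        return iops * io_kb / 1024

    # A claimed 100,000 IO/s means very different things at different IO sizes.
    print(iops_to_mbps(100_000))     # 3125.0 MB/s at the 32KB rule of thumb
    print(iops_to_mbps(100_000, 4))  # ~390.6 MB/s at the classic '4K benchmark' size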

Hopefully, this blog sheds some light on how to think about performance in a data-reducing all-flash array, as well as how we’re using big data analytics in Pure1 to not only understand customer environments but also to better design our products. If you’d like to learn more about Pure1, please check out the product page.

Resources:
Previous blog posts on the topic of benchmarking:
  • What Is SQL Server’s IO Block Size?
  • Modeling IO Size Mixes with vdbench
  • Why IO/s Don’t Matter
