2Q17 and Lifetime HDD Failure Rates at Backblaze

Annualized rates of 1.97% on 83,296 3.5-inch drives from 2013 to 2017
This is a Press Release edited by on 2017.09.13


This report was written on August 29, 2017 by Andy Klein from Backblaze, Inc.



HDD Stats for Q2 2017

In this update, we'll review the 2Q17 and lifetime HDD failure rates for all our current drive models. We'll also look at how our drive migration strategy is changing the drives we use, and we'll check in on our enterprise drives to see how they are doing. Along the way, we'll share our observations and insights and, as always, we welcome your comments and critiques.

Since our last report for 1Q17, we have added 635 additional HDDs to bring us to the 83,151 drives we'll focus on. In 1Q we added over 10,000 new drives to the mix, so adding just 635 in 2Q seems 'odd.' In fact, we added 4,921 new drives and retired 4,286 old drives as we migrated from lower density drives to higher density drives. We cover more about migrations later on, but first let's look at the 2Q quarterly stats.

HDD Stats for 2Q17
We'll begin our review by looking at the statistics for the period of April 1, 2017 through June 30, 2017 (2Q17). This table includes 17 different 3.5" drive models that were operational during the indicated period, ranging in size from 3 to 8TB.

Quarterly HDD failure rates for 2Q17
Observation period: April 1, 2017 - June 30, 2017

When looking at the quarterly numbers, remember to look for those drives with at least 50,000 drive days for the quarter. That works out to about 550 drives running the entire quarter. That's a good sample size. If the sample size is below that, the failure rates can be skewed based on a small change in the number of drive failures.
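To make the sample-size point concrete, here is a minimal sketch. It assumes the usual definition of annualized failure rate (failures per drive-year of service, expressed as a percentage). The function name and the failure counts are hypothetical, chosen only to illustrate the sensitivity; they are not figures from the table.

```python
def annualized_failure_rate(failures: int, drive_days: int) -> float:
    """Return AFR in percent: failures per drive-year of service, times 100."""
    if drive_days <= 0:
        raise ValueError("drive_days must be positive")
    drive_years = drive_days / 365
    return 100.0 * failures / drive_years


DAYS_IN_2Q17 = 91  # April 1 through June 30, 2017

# Drives needed to accumulate 50,000 drive days if each runs the whole quarter.
drives_needed = 50_000 / DAYS_IN_2Q17  # roughly 549, i.e. "about 550"

# Hypothetical failure counts: one extra failure in a small vs. a large sample.
small_sample = [annualized_failure_rate(f, 5_000) for f in (1, 2)]
large_sample = [annualized_failure_rate(f, 50_000) for f in (10, 11)]

print(f"drives needed: {drives_needed:.0f}")
print("5,000 drive days: ", [f"{r:.2f}%" for r in small_sample])
print("50,000 drive days:", [f"{r:.2f}%" for r in large_sample])
```

With only 5,000 drive days, a single additional failure doubles the computed rate in this example, while at 50,000 drive days the same extra failure moves it by less than one percentage point.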

As noted previously, we use the quarterly numbers to look for trends. So this time we've included a trend indicator in the table. The "Q2Q Trend" column is short for quarter-to-quarter trend, i.e. last quarter to this quarter. We can add, change, or delete trend columns depending on community interest. Let us know what you think in the comments.

Good Migrations
In 2Q we continued with our data migration program. For us, a drive migration means we intentionally remove a good drive from service and replace it with another drive. Drives that are removed via migrations are not counted as failed. Once they are removed they stop accumulating drive days and other stats in our system.

There are three primary drivers for our migration program.

1 Increase Storage Density - For example, in 2Q we replaced 3TB drives with 8TB drives, more than doubling the amount of storage in a given Storage Pod for the same footprint. The cost of electricity was nominally more with the 8TB drives, but the increase in density more than offset the additional cost.
2 Backblaze Vaults - Our Vault architecture has proven to be more cost effective over the past two years than using stand-alone Storage Pods. A major goal of the migration program is to have the entire Backblaze cloud deployed on the efficient and resilient Backblaze Vault architecture.
3 Balancing the Load - With our Phoenix data center online and accepting data we have migrated some systems to the Phoenix DC. Don't worry, we didn't put your data on a truck and drive it to Phoenix. We simply built new systems there and transferred the data from our Northern California DC. In the process, we are gaining valuable insights as we move towards being able to replicate data between the two data centers.

During 2Q we migrated the data on 155 systems, giving nearly 30PB of data a new, more durable, place to call home. There are still 644 individual Storage Pods (Storage Pod Classics, as we call them) left to migrate to the Backblaze Vault architecture.

A Backblaze Vault is a logical collection of 20 beefy Storage Pods (not Classics). Using our own Reed-Solomon erasure coding library, data is spread out across the 20 Pods into 17 data shards and 3 parity shards. The data and parity shards of each arriving data blob can be stored on different Storage Pods in a given Backblaze Vault.
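The 17 + 3 layout can be illustrated with a little arithmetic. The sketch below does not use the Backblaze library; it only encodes the general property of a Reed-Solomon code with 17 data and 3 parity shards, namely that the original data can be rebuilt from any 17 of the 20 shards. It assumes one shard per Storage Pod, and the names are hypothetical.

```python
DATA_SHARDS = 17
PARITY_SHARDS = 3
TOTAL_SHARDS = DATA_SHARDS + PARITY_SHARDS  # assumed: one shard per Pod in a 20-Pod Vault


def is_recoverable(shards_lost: int) -> bool:
    """A 17+3 Reed-Solomon code can rebuild the data if at most 3 shards are missing."""
    return 0 <= shards_lost <= PARITY_SHARDS


# Raw bytes written per byte of user data under this layout.
storage_overhead = TOTAL_SHARDS / DATA_SHARDS

for lost in range(5):
    status = "recoverable" if is_recoverable(lost) else "data loss"
    print(f"{lost} shard(s) unavailable: {status}")
print(f"storage overhead: {storage_overhead:.3f}x")
```

Under these assumptions, a Vault tolerates up to three unavailable shards for any given piece of data, at a cost of roughly 18% extra raw storage.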

Lifetime HDD Failure Rates for Current Drives
The table below shows the failure rates for the HDD models we had in service as of June 30, 2017. This is over the period beginning in April 2013 and ending June 30, 2017. If you are interested in the HDD failure rates for all the HDDs we've used over the years, refer to our 2016 HDD review.

Cumulative HDD failure rates
Observation period: April 2013 - June 2017

Enterprise vs. Consumer Drives
We added 3,595 enterprise 8TB drives in 2Q, bringing our total to 6,054 drives. You may be tempted to compare the failure rates of the 8TB enterprise drive (model: ST8000NM005) to the consumer 8TB drive (model: ST8000DM002), and conclude the enterprise drives fail at a higher rate. Let's not jump to that conclusion yet, as the average operational age of the enterprise drives is only 2.11 months.

There are some insights we can gain from the current data. The enterprise drives have 363,282 drive days and an annualized failure rate of 1.61%. If we look back at our data, we find that as of 3Q16, the 8TB consumer drives had 422,263 drive days with an annualized failure rate of 1.60%. That means that when both drive models had a similar number of drive days, they had nearly the same annualized failure rate. There are no conclusions to be made here, but the observation is worth considering as we gather data for our comparison.
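One way to see why the two rates are too close to separate is to ask how much a single failure moves each figure. The sketch below again assumes the annualized failure rate is failures per drive-year times 100; the helper name is hypothetical.

```python
def afr_points_per_failure(drive_days: int) -> float:
    """Percentage points one additional failure adds to the AFR for this many drive days."""
    drive_years = drive_days / 365
    return 100.0 / drive_years


enterprise_step = afr_points_per_failure(363_282)  # 8TB enterprise, as of 2Q17
consumer_step = afr_points_per_failure(422_263)    # 8TB consumer, as of 3Q16

reported_gap = 1.61 - 1.60  # difference between the two reported rates

print(f"enterprise: one failure = {enterprise_step:.3f} points")
print(f"consumer:   one failure = {consumer_step:.3f} points")
print(f"reported gap:             {reported_gap:.3f} points")
```

At these drive-day totals one failure shifts either rate by roughly a tenth of a percentage point, about ten times the 0.01-point gap between the reported figures.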

Next quarter, we should have enough data to compare the 8TB drives, but by then the 8TB drives could be 'antiques.' In the next week or so, we'll be installing 12TB HDDs in a Backblaze Vault. Each 60-drive Storage Pod in the Vault would have 720TB of storage available and a 20-pod Backblaze Vault would have 14.4PB of raw storage.
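The capacity figures follow directly from the drive counts, as this short sketch shows. The usable-capacity line is an assumption rather than a figure from the report: it applies the 17-of-20 data shard ratio described for Backblaze Vaults and ignores file system and other overhead.

```python
DRIVE_TB = 12        # capacity of each new drive
DRIVES_PER_POD = 60
PODS_PER_VAULT = 20

pod_raw_tb = DRIVE_TB * DRIVES_PER_POD          # raw TB per Storage Pod
vault_raw_tb = pod_raw_tb * PODS_PER_VAULT      # raw TB per Vault
vault_raw_pb = vault_raw_tb / 1000              # decimal petabytes

# Assumption: 17 of every 20 shards hold data, so about 85% of raw space is usable.
vault_usable_pb = vault_raw_tb * 17 / 20 / 1000

print(f"per Pod:   {pod_raw_tb} TB raw")
print(f"per Vault: {vault_raw_pb:.1f} PB raw, about {vault_usable_pb:.2f} PB after parity")
```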