Tiering in RAID Storage Environments
By Kimberly Robinson, performance engineer, LSI
This is a Press Release edited by StorageNewsletter.com on May 31, 2012 at 2:55 pm
This article has been written by Kimberly Robinson. She works as a performance engineer for LSI Corporation’s Storage Division. She has been working on optimising enterprise storage solutions for major OEMs for over ten years.
With the advent of SSD technology, shrinking costs of high speed
volatile memory, the low cost of SATA, and the reliability of SAS,
optimal organisation and integration of new storage technologies has
become more difficult. Opportunities exist to place frequently accessed
‘hot’ data on lower latency, faster media, while leaving rarely accessed
data on higher latency, lower cost media.
With all these choices available to IT storage professionals, there is
an enormous opportunity to innovatively use cost, performance and
capacity metrics to determine the ideal location for user data. This
article will discuss how storage tiering can significantly improve
performance and reliability in mixed storage environments, complementing
and enhancing the host operating system cache, while optimising costs.
Today’s servers provide many different functions, and while we can make
generalisations, each application produces its own unique workload
characteristics. In addition, performance needs depend on the current
load and any QoS requirements. While storage environments have more
options than ever before, allowing for better customisation, at the same
time they add more complexity and make storage performance capacity
planning more difficult. Performance capacity planning requires
knowledge of application I/O traits, capacity and performance growth
requirements, disk and storage performance characteristics, data
protection needs, and company budget.
More Options = More Control
Modern storage controllers have a plethora of options: new and compound
RAID types, premium features, advanced cache options, and variants of
hardware offloading to suit every budget. Today’s advanced embedded
processors have made intelligent storage controllers even more capable,
allowing them to extend their capabilities and transform as new
technologies emerge.
Disk technologies are no exception to the boom of new options. SAS was
designed to integrate both SATA and SAS so both interfaces can be
combined to create a custom storage backend based on individual cost and
performance needs. The popularity of SATA lies primarily in its
excellent cost per capacity; from a performance standpoint, however,
it provides the lowest overall performance. SAS provides
significantly higher performance and improved reliability, but at a
higher cost. Another option is SSD technology, which is available with
both SATA and SAS interfaces. SSDs provide far higher random performance
than rotating media, but at a higher price point.
Figure: Cost per GB per disk technology (cost taken for value capacity)
Adding to the complexity are the performance results for different RAID
types. Optimising for your workload requires understanding your specific
I/O characteristics and knowing how to map that into the ideal RAID
type based on your availability needs.
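As a rough illustration of how a workload's read/write mix maps onto a RAID type, the sketch below estimates the random host IOPS a disk group can sustain. The write penalties are the commonly quoted textbook values for small random writes, not figures from this article, and the function name, disk count and per-disk IOPS are hypothetical placeholders.

```python
# Back-end disk I/Os generated by one small random host write.
# Commonly quoted textbook values (an assumption, not taken from this article).
WRITE_PENALTY = {"RAID-0": 1, "RAID-10": 2, "RAID-5": 4, "RAID-6": 6}


def host_iops(raid_type: str, disks: int, iops_per_disk: float,
              read_fraction: float) -> float:
    """Estimate the random host IOPS a disk group can sustain."""
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    backend_iops = disks * iops_per_disk
    write_fraction = 1.0 - read_fraction
    # Each host read costs one back-end I/O; each host write costs
    # WRITE_PENALTY[raid_type] back-end I/Os.
    backend_per_host_io = read_fraction + write_fraction * WRITE_PENALTY[raid_type]
    return backend_iops / backend_per_host_io


# Hypothetical group: 8 disks at 200 IOPS each, 2:1 read-to-write mix.
for raid in WRITE_PENALTY:
    print(f"{raid}: {host_iops(raid, 8, 200.0, 2 / 3):.0f} host IOPS")
```

The same disks deliver very different host performance depending on the RAID type, which is why the read/write ratio has to be known before a RAID level is chosen.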
If we focus on RAID-10, it is clear that some disk types are better
suited to certain applications than others. SSDs cost on average
about 6.5x more than 15K RPM 6Gb SAS drives, yet not all applications
see a 6.5x increase in real-world performance.
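The trade-off can be expressed as performance gained per unit of cost. The sketch below reads the 6.5x figure as a cost ratio of 6.5 relative to 15K RPM SAS; the per-workload speed-ups are hypothetical placeholders chosen only to show both outcomes, not measurements from this article.

```python
# SSD cost relative to 15K RPM SAS, reading the article's "about 6.5x" as a ratio.
SSD_RELATIVE_COST = 6.5


def performance_per_cost(relative_performance: float,
                         relative_cost: float = SSD_RELATIVE_COST) -> float:
    """Performance per unit of cost, both relative to 15K RPM SAS.

    A result above 1.0 means the faster media more than pays for itself.
    """
    if relative_cost <= 0:
        raise ValueError("relative_cost must be positive")
    return relative_performance / relative_cost


# Hypothetical speed-ups over 15K RPM SAS (placeholders, not measured data).
workloads = {"small random reads": 20.0, "large sequential reads": 1.5}
for name, speedup in workloads.items():
    ratio = performance_per_cost(speedup)
    verdict = "pays off" if ratio > 1.0 else "does not pay off"
    print(f"{name}: {ratio:.2f} performance per cost, SSD {verdict}")
```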
Figure: RAID 10 workloads, relative performance comparison (relative to 15K rpm SAS)
Every IT professional probably has at one point or another asked the question, "how many rotating disks does it take to deliver the same performance as a high performing SSD?"
It’s a useful question, but in the real world it is likely
that only a portion of the actual storage capacity will be accessed at
any given time. Cache architectures have been successfully designed
based on this assumption for decades. What if you could build your
storage out of different storage media with different cost and
performance characteristics?
It’s Getting Too Hard
Storage vendors recognise that, with the prevalence of non-uniform media
architectures, tiering provides the best of all worlds. Storage
tiering is a simple concept: place the most frequently used data on the
fastest available media, while leaving cold data on slower media.
Tiering is different from caching in that the capacity of all
participating logical disks can be used for user data storage. While
this is not a new concept, it has traditionally not been part of the
storage intelligence, but the disruptive technology of SSDs has brought
about new opportunities.
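The placement idea can be sketched in a few lines. This is a minimal illustration, not how any particular controller implements tiering: the extent granularity, the access log and the `place_extents` helper are all hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple


def place_extents(
    access_log: Iterable[Hashable],
    extents: Sequence[Hashable],
    tiers: List[Tuple[str, int]],
) -> Dict[Hashable, str]:
    """Assign every extent to exactly one tier, hottest extents first.

    tiers is ordered fastest to slowest; each entry is
    (tier name, capacity in extents).
    """
    if len(extents) > sum(capacity for _, capacity in tiers):
        raise ValueError("tiers are too small to hold every extent")
    counts = Counter(access_log)
    # Most frequently accessed extents come first; never-accessed ones count as 0.
    ranked = sorted(extents, key=lambda extent: counts[extent], reverse=True)
    placement: Dict[Hashable, str] = {}
    start = 0
    for tier_name, capacity in tiers:
        for extent in ranked[start:start + capacity]:
            placement[extent] = tier_name
        start += capacity
    return placement


# Hypothetical example: six extents, three tiers ordered fastest to slowest.
log = [0, 0, 0, 3, 3, 5]
layout = place_extents(log, range(6), [("SSD", 1), ("SAS", 2), ("SATA", 3)])
print(layout)
```

Each extent lives on exactly one tier, so the capacity of every tier holds user data. A cache, by contrast, would keep a second copy of the hot extents on the fast media.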
Let’s look at an example of how storage tiering could help in a database
environment. Company X is designing a new SQL Server, and based on
past experience they know the following information:
- 4TB of storage
- 3% is hot (125GB) and accessed 65% of the time.
- 6% (250GB) is accessed intermittently, 25% of the time.
- The remainder is cold data accessed 10% of the time.
- The database is accessed in 8KB sizes, with a read to write ratio of about 2:1.
- Eight slots are available for disks.
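To see how concentrated this workload is, the figures in the list can be turned into an access density per GB. This is a sketch under two assumptions: 4TB is taken as 4000GB, and the percentages are treated as shares of all accesses.

```python
TOTAL_GB = 4000  # 4TB, taken as 4000GB for simplicity (an assumption)

# tier name -> (capacity in GB, share of all accesses), from the list above
tiers = {
    "hot": (125, 0.65),
    "intermittent": (250, 0.25),
}
# Whatever capacity is left over is cold data.
cold_gb = TOTAL_GB - sum(gb for gb, _ in tiers.values())
tiers["cold"] = (cold_gb, 0.10)


def access_density(capacity_gb: float, access_share: float) -> float:
    """Share of all accesses that lands on each GB of a tier."""
    return access_share / capacity_gb


density = {name: access_density(gb, share) for name, (gb, share) in tiers.items()}
for name, (gb, share) in tiers.items():
    relative = density[name] / density["cold"]
    print(f"{name}: {gb}GB, {share:.0%} of accesses, "
          f"{relative:.1f}x the access density of cold data")
```

Under these assumptions each GB of hot data is accessed far more often than a GB of cold data, which is the case for putting only that small hot region on the fastest media.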
An ideal cost-sensitive solution that gets us to 4TB would create a
logical device that provides the required performance for each
tier of data, in terms of both I/Os per second and response times.
Let’s consider the homogeneous disk alternatives.
Below is a summary comparison of the three storage infrastructure
options. Clearly, the tiered option not only provides a lower cost per
database transaction, but also delivers over six times the IOPS
capability of a SATA-only solution, and more than three times that of
the pure SAS solution, with more capacity than the other proposals.
Many solutions can be built based on performance, cost, capacity, or
real-estate limitations. This is only one example that clearly
highlights the cost benefits of a tiered solution. Of course this could
be done manually, presuming:
- You already know exactly which files will be highly utilised
- You have the ability to separate them physically onto different media, and
- The hot data is not transient or dynamic.
Pros and Cons of Storage Tiering
Storage tiering offers the best of all worlds. By leveraging multiple
types of media, costs and performance can be optimised, and precious
server real estate can be saved. Consider this: achieving the same
number of database IOPS as the tiering example would require
over fifty SATA drives, significantly increasing the power and rack
space requirements. Intelligent tiering can allow for a dynamic
environment where frequently accessed data is continuously and
automatically placed on the fastest media. Possibilities may even exist
where the most critical data can be placed on high availability volumes,
or data accessed by geographically distant sites can be copied to local
storage facilities.
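The drive-count comparison in the paragraph above comes down to a simple division. The sketch below shows the arithmetic only: the target IOPS and the per-drive IOPS are hypothetical placeholders, not the figures behind the article's estimate, and RAID overhead is ignored.

```python
import math


def drives_needed(target_iops: float, iops_per_drive: float) -> int:
    """Smallest number of identical drives whose combined IOPS meets the target."""
    if target_iops < 0 or iops_per_drive <= 0:
        raise ValueError("target_iops must be >= 0 and iops_per_drive > 0")
    return math.ceil(target_iops / iops_per_drive)


# Hypothetical placeholders: a 5000 IOPS target served by 90 IOPS SATA drives.
count = drives_needed(5000, 90)
print(f"{count} drives needed")
```

Every extra drive also needs a slot, power and cooling, which is why matching performance with slow media alone becomes expensive in rack space.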
Despite the benefits of tiering, there are a few drawbacks to consider.
Although the job of identifying and properly storing the hot data is
done automatically, a storage subsystem that meets your current and
growing requirements should still be custom designed by a storage
professional. Another potential disadvantage is that under a storage
tiering model, although the logical volume appears as a single disk,
it may be spread across multiple physical disk groups, so a failure in
any one group can affect the whole volume. Using hardware RAID
protection reduces the possibility of data loss.
We are in a ‘perfect storm’ of technology: dramatically increasing
storage capacity needs, more disk options than ever before, higher
performance demands from the rising popularity of digital business
transactions, increasing processing density, and the need for greater
protection of our most valuable asset, data. Tiering allows you to take
advantage of the low cost of SATA storage, the security and
reliability of enterprise SAS, and the high performance of SSDs, all in
one bundle.