Storage Acceleration: Solving the I/O Performance Gap Problem
Questions by QLogic and answers from Jeff Janukowicz and Matt Eastwood, IDC
By Jean Jacques Maleval | September 11, 2012 at 2:56 pm
Here is an IDC Corp report, Storage Acceleration: Solving the I/O Performance Gap Problem, by Jeff Janukowicz, research director, Solid State Storage, and Matt Eastwood, group VP and GM, Enterprise Platform Group.
Advancements in server/CPU performance and increased use of virtual machines and compute-intensive business applications are exposing a critical performance gap between enterprise servers and storage systems. Data requirements and CPU processing power have grown rapidly over recent years, but the performance of storage has not kept pace. Storage I/O is the bottleneck. A new class of server-based storage acceleration is needed to address this bottleneck and bridge this important performance gap.
The following questions were posed by QLogic Corp. to Janukowicz and Eastwood on behalf of QLogic’s customers.
What applications, datacenter trends, and infrastructure initiatives are driving the I/O performance gap problem?
The growing number of diverse applications and the digitization of information are creating increased demands on the infrastructure. With an explosion of devices and users, organizations are struggling to maximize performance at the system level. At the same time, they are trying to lower operational costs and maximize investment protection.
Most organizations have a complex legacy installed base, and they’re faced with a shift in how they deploy applications and consume resources. So they’re virtualizing and consolidating, which is putting more and more pressure on the datacenter and legacy infrastructure. They’re also looking at new classes of applications that can create a competitive advantage by capitalizing on some of the macrotrends in the market, such as mobile, big data, and analytics. They’re trying to make a more impactful shift in their business in terms of how they deploy technology, including clustered server architectures.
The compute power of CPUs has been growing in line with Moore’s law over the past decade, but the storage subsystem’s performance has not kept up with the performance needs on the compute side of the equation. As a result, we’re seeing a performance gap between what the processor can compute and what the storage I/O subsystem can deliver. This performance gap is further exacerbated by the rapid growth in data volumes most organizations are experiencing today. Virtual machine density is growing, as are I/O demands and workloads. In fact, IDC projects that data volumes in the digital universe will grow by a factor of 44 over the next 10 years.
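As a back-of-the-envelope check (our arithmetic, not IDC’s), a 44x increase over 10 years implies a compound annual growth rate of roughly 46%:

```python
# Implied compound annual growth rate (CAGR) behind IDC's projection
# of 44x data growth over 10 years (illustrative arithmetic only).
growth_factor = 44
years = 10

cagr = growth_factor ** (1 / years) - 1
print(f"Implied CAGR: {cagr:.1%}")  # ~46.0% per year
```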
What are the business requirements driving applications to process more transactions, lower latency, and improve response times?
Businesses are being pressured to move faster than ever, and IT has historically struggled to keep up with this shift. So you can look at this question from the perspective of both the datacenter and the systems that live in the datacenter. If you look at the datacenter itself, a lot of the focus is on rationalization and consolidation of facilities and on virtualizing resources in the datacenter. Organizations are trying to implement best management practices in their datacenters and embrace new types of automation tools, which affect the layout of the datacenter.
But, really, it boils down to the workload. A number of workloads in the industry continue to move into these virtualized environments, which require shared and managed storage. The environments, in turn, are becoming much more dynamic as workloads move from one host to another. This puts a lot of pressure on the storage ecosystem and the network. So there’s a need to start thinking more holistically about your IT resources and the various silos in your datacenter spanning compute, storage, and networking. There’s a huge focus on trying to get better utilization out of resources as well as staff. On top of this, the world is heading toward a future where new types of applications, such as Hadoop and big data analytics, are being developed that exploit a virtually unlimited set of hardware resources available via the cloud. This explosion in new workloads must also integrate with existing on-premises IT applications.
Initially, the focus tends to be on performance, but there are real economic benefits to being more balanced in your thinking around traditional compute, storage, and networking infrastructure silos. If you can start to move these silos more in parallel, then that’s a huge competitive advantage. For example, in financial services and other markets where technology is thought of as a strategic differentiator – even as a critical element of the business itself – time to decision is crucial. Firms will always look at anything they can do to move faster than the other guy because that’s an advantage to their core business. More and more customers consider time to value as part of their IT decision making, and time is typically measured in terms of speed and flexibility.
How are companies addressing the I/O performance gap problem, and what are the advantages/limitations of each method?
Companies are looking at a number of different approaches to solve the I/O performance gap problem. Which approach they choose depends on what issues they are trying to solve and where in their infrastructure they want to implement the solution. For example, one approach is tiering or caching with solid state technology in the storage array, which has some advantages in terms of increasing performance. We’re also seeing the emergence of all-flash arrays, as well as some caching appliances that help mitigate the performance gap. Yet these solutions do not necessarily bring the most active data any closer to the CPU or application.
More recently, there’s been a movement toward a server-based caching solution. The idea with this approach is to take advantage of the intrinsic benefits of solid state technology and move the most active data onto higher-performance solid state media, closer to the processor. This narrows the performance gap considerably, improving throughput and reducing latency.
This approach typically adds some complexity and has limitations – deciding what needs to be cached and what doesn’t, managing multiple device drivers in a virtualized environment (i.e., a driver in every guest OS to manage), and the lack of coordinated caching across multiple servers in an application cluster. However, the technology is evolving and the hardware solutions continue to mature. The reality is that flash is the ideal technology in the memory/storage hierarchy to solve the I/O gap in a way that is nondisruptive to both the application and the underlying infrastructure. It fits very nicely from both a performance perspective and a cost perspective.
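To make the server-based caching idea concrete, here is a minimal sketch in Python of a write-through block cache, with LRU eviction standing in for the “what to cache” decision. The `backend` interface, `ServerSideCache` class, and block-address scheme are hypothetical placeholders for illustration, not any particular vendor’s API:

```python
from collections import OrderedDict

class ServerSideCache:
    """Minimal write-through block cache: hot blocks live on local
    flash (modeled here as an in-memory dict), cold reads fall
    through to the shared storage array."""

    def __init__(self, backend, capacity_blocks=1024):
        self.backend = backend          # slower shared storage (hypothetical interface)
        self.capacity = capacity_blocks
        self.flash = OrderedDict()      # stands in for the local SSD tier

    def read(self, lba):
        if lba in self.flash:           # cache hit: served at flash latency
            self.flash.move_to_end(lba) # refresh LRU position
            return self.flash[lba]
        data = self.backend.read(lba)   # cache miss: pay the array's latency
        self._admit(lba, data)
        return data

    def write(self, lba, data):
        # Write-through keeps the array authoritative, so the cache
        # can be lost or flushed without losing data.
        self.backend.write(lba, data)
        self._admit(lba, data)

    def _admit(self, lba, data):
        # The "what to cache" policy: here, simply LRU; real products
        # use heat maps, admission filters, and per-VM policies.
        self.flash[lba] = data
        self.flash.move_to_end(lba)
        if len(self.flash) > self.capacity:
            self.flash.popitem(last=False)  # evict least recently used
```

Because the cache is write-through, it can be discarded at any time without data loss, which is what makes this style of acceleration nondisruptive to the application and the underlying infrastructure.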
For architectures such as a clustered server configuration, what are the ways to move beyond the limitations of current solutions while keeping current implementations in operation?
Clusters or grids are a fascinating area where typically the objective is to execute compute functions as fast as possible. In many ways, they’re a lot like what happens in the high-performance computing or technical computing space. It’s all about throughput. So anything that can be done to help the existing systems do their work faster is welcomed.
There are hybrid architectures that add acceleration, such as offload engines that move data through the I/O path faster. They free up the core processors to focus on compute functions rather than becoming the bottleneck for other functions such as the I/O path.
Organizations are constantly looking for ways to create more elasticity in their clusters so that the cluster can grow fairly uniformly with the workload. In other words, as the workload grows, the organization can add resources and see the benefit on the back side. Elasticity and modularity are areas of focus.
Organizations are also focused on investment protection. They are always looking for ways to maximize what they get out of the environment over its life cycle – and they’re willing to invest in that infrastructure to add capabilities and technologies that will actually help these systems run faster and extend the useful life of their investment.
How do you plan and implement this type of latency-improving architecture, and what are the benefits?
In the past, to improve I/O performance and reduce latency, datacenter managers looked to adding more DRAM, short-stroking their hard drives, or striping across multiple HDDs. However, thanks to advancements in solid state technology, flash-based SSDs have become widely available and, more important, cost effective. Organizations are now looking at solid state storage as the foundation of a latency-improving architecture.
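A simple weighted-average model shows why a flash tier narrows the gap. Assuming illustrative figures of roughly 0.1 ms for a flash read and 5 ms for an HDD-backed array read (our assumptions, for scale only), even a 90% hit rate cuts average read latency by nearly an order of magnitude:

```python
# Effective read latency under a cache-hit-ratio model (assumed figures).
flash_latency_ms = 0.1   # flash read, order of magnitude only
array_latency_ms = 5.0   # HDD-backed array read, order of magnitude only

def effective_latency(hit_ratio):
    return hit_ratio * flash_latency_ms + (1 - hit_ratio) * array_latency_ms

for h in (0.0, 0.5, 0.9, 0.99):
    print(f"hit ratio {h:.0%}: {effective_latency(h):.2f} ms")
# 0%: 5.00 ms, 50%: 2.55 ms, 90%: 0.59 ms, 99%: 0.15 ms
```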
A server-based caching approach with a shared cache pool across multiple servers provides a level of high availability in the form of a data cache. This approach also provides flexibility to the hardware platform, which is key. Organizations want to deploy enhancements to their infrastructure that don’t require touching the application or the data itself. They’re looking for things that can be added to the core server, or added to the core storage array, without requiring any rewrite of the application. Clearly, anything that can be done in that area will be seen as a huge benefit.
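One way a cache pool could be coordinated across servers (a sketch under assumptions, not how any specific product works) is to hash each block address to an owning node, so every server in the cluster agrees on where a cached block lives without a central broker. The `cache_owner` function and node names below are hypothetical:

```python
import hashlib

def cache_owner(lba, nodes):
    """Map a block address to the server whose flash should cache it,
    so all cluster members agree without central coordination.
    (Illustrative only; real shared-cache products also handle
    node failures, rebalancing, and data locality.)"""
    digest = hashlib.md5(str(lba).encode()).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

nodes = ["server-a", "server-b", "server-c"]  # hypothetical cluster
for lba in (17, 4096, 123456):
    print(lba, "->", cache_owner(lba, nodes))
```

A production design would likely use consistent hashing rather than simple modulo placement, so that adding or removing a node does not remap most of the pool; the point here is only that a shared mapping lets servers coordinate their flash as one cache.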
Ultimately it comes down to four key benefits: agility, elasticity, flexibility, and simplicity. It’s about taking an organization’s existing investments and allowing a very seamless upgrade path, which results in accelerating overall system performance. These benefits extend to applications that historically have been viewed as scalable workloads, such as databases or business process management, as well as to what might be happening on a Web site or in the analytics space – both scenarios can benefit from server-side caching.
Moreover, when you consider the trends of consolidation, virtualization, and automation, each area has the challenge of trying to get the compute, the storage, and the network to behave better together. Virtual machines, for example, are growing at an incredibly rapid rate, and that growth is putting new kinds of pressure on the datacenter. Organizations are always looking for new ways to accelerate performance while making that migration path simpler to absorb.