2010-2020 Storage Decade
24 vendors commenting on technology impact last 10 years
By Philippe Nicolas | January 13, 2021 at 1:56 pm
The last decade was an amazing period for the storage industry, with several major technology developments and confirmations but also the creation of some key companies, M&As and IPOs.
We all remember the boom of the cloud, flash becoming mainstream, S3 as a de-facto standard, the launch of rapid growth companies such as Cohesity, Pure Storage, Qumulo, Rubrik or Vast Data, some famous M&As like Cleversafe acquired by IBM, EMC by Dell, Red Hat by IBM, Mellanox by Nvidia or Veeam by Insight Partners, and also a few spectacular IPOs such as Pure Storage, Nimble Storage and even Dell coming back to the public market a few years after its departure.
We collected diverse comments from 24 vendors: Arcserve, CTERA Networks, DataCore, FujiFilm, HYCU, Infinidat, Komprise, Lightbits Labs, MinIO, Nasuni, Panasas, Pavilion Data Systems, Pure Storage, Quantum, Qumulo, Seagate, SIOS Technology, SoftIron, Spectra Logic, StorCentric, StorONE, StorPool, VAST Data and VMware.
Sam Roguine, backup, DR and ransomware prevention evangelist
Attacks affecting critical infrastructure made ransomware lethal
The past decade has seen cyberattacks emerge as a major threat to businesses and their data. For companies in critical sectors like oil and gas, among others, it has become clear that the impact of a cyberattack could be significantly more far-reaching than just data loss. Cybercriminals have targeted systems most essential to a company’s bottom line and solicited a payout in return for a decryption key. To maximize their chances, these criminals have targeted critical systems and focused on industries essential to keeping society up and running. Companies have therefore had to expand their data protection and security protocols.
In critical industries, one of the main concerns is keeping downtime to a minimum to prevent widespread impact. As a first step, companies should define recovery point and time objectives for each system and application in their network. Categorizing applications by risk and determining what would have the biggest negative impact if they are not recovered quickly enables IT teams to chart a hierarchy.
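As a minimal sketch of that first step, the snippet below ranks a few hypothetical applications into a recovery hierarchy. The application names, the objectives in minutes and the 1-to-5 impact scores are illustrative assumptions, not figures from the article or from any product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class App:
    name: str
    rto_minutes: int  # recovery time objective: tolerable downtime
    rpo_minutes: int  # recovery point objective: tolerable data-loss window
    impact: int       # assumed business-impact score, 1 (low) to 5 (critical)


def recovery_hierarchy(apps):
    """Order apps so the highest-impact, tightest-objective systems come first."""
    return sorted(apps, key=lambda a: (-a.impact, a.rto_minutes, a.rpo_minutes))


# Hypothetical inventory for illustration only.
apps = [
    App("reporting", rto_minutes=1440, rpo_minutes=1440, impact=2),
    App("scada-control", rto_minutes=15, rpo_minutes=5, impact=5),
    App("billing", rto_minutes=240, rpo_minutes=60, impact=4),
    App("email", rto_minutes=240, rpo_minutes=240, impact=3),
]

for rank, app in enumerate(recovery_hierarchy(apps), start=1):
    print(rank, app.name, f"RTO={app.rto_minutes}m", f"RPO={app.rpo_minutes}m")
```

Sorting on impact first and objectives second is only one possible policy; a real plan would also weigh dependencies between systems.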
While losing data wasn’t always deemed as serious as an attack that disrupts critical operations like power and gas, the importance of safeguarding data has become clear. It could contain information vital to the recovery process or be relevant for regulatory compliance – another major disruption of recent years. Many industries reverted to storing backups separately from the main network to help ensure they remained clean.
By integrating cybersecurity with data protection, companies were able to make the recovery process smoother and streamline IT efforts. A two-pronged approach also reduced the time between detection of an attack or breach and kickstarting backup and recovery protocols, making attacks less damaging.
Critical infrastructure workers became critical
Over the past decade, cyberattacks haven’t just been an IT issue. They’ve also had a significant business impact and members of every department have become involved in the planning process. Clear communication is particularly important, as attacks in these sectors could potentially be life-threatening.
In 2020, MITRE shared an update to its ATT&CK knowledge framework that specifically addresses the tactics cybercriminals use when attacking ICS – and tips for how to defend against them. Many IT and security professionals working in critical infrastructure now refer to this framework when building attack response plans and securing key systems. Businesses have also recognized the importance of regularly updating and testing plans so employees know how to respond to a real crisis.
As digital transformation continues to accelerate and attack surfaces expand, the upcoming decade, just like the previous, is sure to bring a variety of new cyber threats. Cyberattacks affecting critical infrastructure will only increase as cybercriminals realize just how damaging they can be, meaning IT pros working in these critical industries will need to be prepared.
Aron Brand, CTO
I’m amazed when I recall how in 2008, when we founded CTERA, we struggled to find enterprise companies that used any cloud at all. Since then we have experienced amazing growth of the enterprise adoption of cloud; in fact, enterprise cloud usage spending grew by a factor of 40 in a single decade.
In the early 2010s, enterprises spent most of their efforts on lifting and shifting their legacy workloads to the cloud. By the end of the decade cloud had become much more mature, and enterprises had entered their second wave of cloud adoption: the “Cloud Native era.” This era is characterized by a shift to workloads and apps designed from the ground up for the cloud and by enterprises embracing the DevOps culture.
In recent years most security departments – except perhaps those dealing with homeland security – have lost virtually all their initial cloud aversion. CISOs have actually started endorsing cloud transformation, understanding that modern public clouds offer superior security and better compliance than their legacy datacenters and branch offices.
In the 2020s, the pendulum may be swinging back given the SolarWinds hack, and CISOs likely will be much more careful about selection of cloud vendors. This scrutiny will manifest itself in the form of enterprises requiring control of their own data encryption keys and demanding evidence from their software providers for stringent supply chain security using recognized certifications such as Open Trusted Technology Provider Standard (O-TTPS).
Nonetheless, the phasing out of the traditional datacenter will continue to accelerate in the new decade. Gartner predicts that by 2025 workloads will be nearly evenly dispersed across edge, on-premises, and public cloud, underscoring what we view as the need for edge to cloud technologies that bridge disparate infrastructure.
Already we are seeing significant momentum in the enterprise with global file systems that enable IT to address the changing requirements for remote data storage and access in the post-Covid world. Enterprises will continue to invest in these and other hybrid cloud technologies that might have seemed unimaginable only 10 short years ago.
Gerardo A. Dada, CMO
Over the past decade, tape backups, large monolithic storage arrays, HDDs and more gave way to new technologies including storage virtualization, flash, the cloud, NVMe and more. These developments have set the table stakes for storage entering 2021.
The cloud has not been the panacea early vendors touted, but adoption will continue to rise.
- A large part of the technology discussion in the last decade revolved around the cloud. It was originally publicized as a less expensive, more convenient storage option, but when IT started using more cloud resources, the associated costs showed that it was not always the most cost-effective option. As a result, many enterprises moved a portion of infrastructure data back on-prem for more control and better economics.
- Over the years, the IT industry has become smarter about what belongs in the cloud. This decision often involves either moving to the cloud or keeping data on-premises, as it has been very difficult to build truly hybrid systems.
- Modern data management tools and software-defined storage (SDS) that spans multiple public clouds, private clouds, and on-premises infrastructure will help the industry reach a level of maturity in 2021 and beyond, made possible by smarter software that understands the profile of data, can access data anywhere, and automate its movement based on business priorities.
Hardware companies increasingly became software companies as the industry moved away from solutions that create hardware dependencies.
- In recent years organizations have been taking a deeper look at how their IT infrastructure solutions are deployed and operated, for example whether they are limited by data services, restricted by support for diverse storage equipment, and more.
- Demand for the flexibility to adapt and modernize the data center without being locked into a particular hardware vendor or technology will continue to increase.
- True SDS solutions will continue to grow and evolve to support the demand for a unified platform that can deliver the scalability, agility, and reliability required by increasingly hybrid storage infrastructures.
SDS has increasingly become essential to achieve flexibility and cost efficiencies within IT infrastructure.
- Over the last decade, the storage industry was slow to adopt virtualization as compared to networking, compute, etc. Instead, a large portion of the industry existed (and does still to this day) in a hardware-centric mindset, locked into specific vendors, technologies, and architectures.
- SDS has increasingly helped make storage smarter, more effective, and easier to manage through a uniform control plane capable of tapping the full potential of diverse equipment.
- In 2021, enterprises will modernize and automate legacy storage systems by proactively using advanced storage technologies to drive infrastructure-led innovation. SDS will become necessary in implementing this as it will make the best use of existing resources while future-proofing the IT environment for years to come.
In 2021, it is time for IT to realize the promise of SDS: a unified platform to simplify and optimize primary, secondary, and archive storage tiers, managed under a unified predictive analytics dashboard.
Rich Gadomski, head of tape evangelism
The past 10 years have been marked by explosive data growth, and the tape industry has experienced a renaissance thanks to significant advancements in capacity, reliability, performance, and functionality that have led to new applications and key industry adoption.
- In terms of capacity, the decade started for LTO with LTO-5 at 1.5TB native capacity and culminated most recently with LTO-8 at 12TB and LTO-9 soon to be delivered at 18TB.
- Enterprise tape formats started the decade at 1TB native and are currently at 20 TB native.
- Barium Ferrite magnetic particles became a key enabler for multi-terabyte tapes and were demonstrated by IBM and FujiFilm in 2015 to have the potential to achieve 220TB on a single tape cartridge. This signaled that tape technology had no fundamental areal density limitations for the foreseeable future.
- By the end of the decade, IBM and FujiFilm demonstrated the ability to achieve a record areal density of 317Gb per square inch using the next generation of magnetic particles, Strontium Ferrite, with potential cartridge capacity of 580TB.
Reliability and Performance
- During the decade, tape achieved the highest reliability rating as measured by bit error rate at 1×10^19, even better than enterprise HDD at 1×10^16.
- Data transfer rates for tape also improved from 140MB/s in 2010 to an impressive 400 MB/s.
- LTFS provided an open tape file system with media partitions for faster “disk-like” access and ease of interchangeability, making LTO a de facto standard in the M&E industry.
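To put the bit error rates above in perspective, the arithmetic below converts a rate of one error per 10^N bits into the average volume of data read per error. It assumes the quoted figures mean one bit error per 10^19 bits for tape and per 10^16 bits for enterprise HDD, and it uses decimal units; the helper name is illustrative.

```python
def bytes_per_error(exponent: int) -> int:
    """Average bytes read per bit error, at a rate of 1 error per 10**exponent bits."""
    return 10**exponent // 8


PB = 10**15  # decimal petabyte

tape = bytes_per_error(19)  # tape figure quoted in the list above
hdd = bytes_per_error(16)   # enterprise HDD figure quoted in the list above

print(f"tape: one error per {tape / PB:,.0f} PB read")
print(f"HDD:  one error per {hdd / PB:.2f} PB read")
print(f"ratio: {tape // hdd}x")
```

Under these assumptions the three-orders-of-magnitude gap in the exponent translates directly into a thousandfold difference in data read per error.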
New Applications and Key Industry Adoption
- Storing objects on tape became a reality with object archive software solutions offering S3 compatibility; objects can now move to and from tape libraries in their native object format.
- The concept of active archiving grew in popularity with tape as a key component complementing flash, HDD and cloud for cost-effectively maintaining online archives.
- Tape was recognized for its ease of removability and portability, providing air gap protection in the escalating war against cybercrime.
- Major U.S. hyperscalers began to rely on tape during the decade for both backup and deep archive applications. In one well-publicized example, Google restored data lost in a February 2011 Gmail outage from its tape backups. Microsoft adopted tape for Azure later in the decade. Tape became firmly established as a competitive advantage for these and other hyperscalers based on its scalability, long archival life, lowest TCO, low energy consumption, and air gap security.
- With this steady technological advancement over the last decade, tape has been recognized for its complementary value to flash, HDD and cloud in tiered storage strategies for managing data in the zettabyte age.
Subbiah Sundaram, VP products and marketing
One of the biggest disruptions in storage over the past decade has been the rapid pace at which public clouds are taking over the Secondary Storage market by delivering extremely price competitive storage for secondary storage workloads. Now, companies like Google are able to deliver archive storage that can match the cost of tape – considered a benchmark for the cheapest storage you can find for your data.
Storage solutions like those made popular by Data Domain were considered the benchmark for cost-effective storage of large amounts of data with their deduplication technology. About 5 years ago, scale out secondary storage started taking off as a great replacement for Data Domain. Now, even those solutions providers realize that they cannot stop customers from adopting public clouds and thus they are providing an option for customers to offload data to Public Clouds right from their backup targets.
Customers now have a choice when they want to use public clouds. They can front end their storage with a scale out secondary storage on-premises or use intelligent software that can dynamically choose what to keep locally and what to keep on the cloud to keep the cost low. In other words, why pay for a virtualization layer if your software can do this at no extra cost?
This secondary storage trend carries forward even as customers migrate their production loads to public clouds. They really need to think hard if they want to run a software version of their on-prem de-dupe appliances on the cloud or use the native storage. While the software de-dupe appliances might optimize the storage, they do end up using a significant amount of compute and memory which completely outweighs the cost savings.
The second trend that has emerged is true backup as a service. People have imagined this for more than a decade. For a long time, to satisfy the hype, vendors took their classic on-prem software infrastructure, virtualized it and provided images for customers to run on the cloud and called it “as a service”. While that was a good “marketecture”, it was not a true SaaS offering. Finally starting in late 2018, we have started seeing true SaaS solutions emerge in the secondary storage space.
If people are wondering how to identify a true SaaS offering, here are 5 things they should check:
1. It should be something you can turn on and off in an instant.
2. It should be something you don’t have to size, because it should be able to scale from 1 VM to hundreds of thousands.
3. It should leverage the services of the cloud you choose.
4. It should be something you don’t have to upgrade, maintain or monitor.
5. It should not require training.
Ken Steinhardt, field CTO
Storage technologies, solutions and best practices have certainly evolved between 2010 and 2020 and the market has grown, but what may be most captivating is identifying the inflection points that have defined the past decade and set the stage for today and beyond. The focus of this article is primarily on the inflection points that impact the enterprise market, which requires large capacity, demanding performance and the highest availability possible.
In comparison, the lower end of the market (entry level) has also evolved but remains oriented toward cheap, easy and small-scale, which is why it has predominantly gone to the cloud, while the mid-range market settles for fewer features in carefully-managed trade-offs. But the enterprise market is different.
The inflection points that will be explored in this article paint a picture of the world changing from conventional storage to tiering to all-flash arrays to the newest advancement that capped off the 2010-2020 decade – intelligent enterprise storage.
Inflection Point of 2010 – Tiering
In 2008 the world’s first enterprise flash drive technology was introduced into storage, beginning a phase that would go on to redefine the storage industry. The technology was revolutionary in how much lower its latency was, but flash was extremely expensive. A couple of years later, the emergence of automated tiering marked an inflection point to kick off 2010 in the storage industry.
Tiering recognized that organizations could not afford to buy all flash at the start of the decade but could use software to look at what data was “hot” or “cold” or in the middle within a pre-defined period of time (e.g. every day or every hour, or even about every 10 minutes). The software would look at the data usage, and then the system would automatically move the data to an appropriate storage tier based upon its usage.
Typically about 1-5% of data would be moved to flash, while 5-20% was moved to less expensive 15,000 or 10,000rpm conventional disks, and as much as 80% or more was moved to the least expensive, largest capacity conventional hard disks. If the software was good and the applications were reasonably predictable and fairly consistent, then an organization using a mix of a small amount of flash, coupled with most of the capacity on the lowest-cost HDDs, could achieve higher performance at a lower cost compared to conventional all-HDD systems. A radically faster system was possible and justifiable; therefore, tiering became popular and allowed flash drives to get a foothold in the market.
However, tiering was not for everyone. For some application environments, it did not fit. For example, if there was a super-hot performance spike in data that lasted five minutes, then it would come and be gone before even the best systems could figure out what happened and take any action. Worse yet, it might take action based upon that activity, despite it no longer being relevant (or practical) thus creating a problem where one no longer existed. Automated tiering was not a catch-all for all applications or environments.
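The tiering logic described in this section can be sketched roughly as follows. This is a simplified illustration, not any vendor's implementation: the function name, the extent identifiers and the 5% / 15% / 80% split (picked from within the ranges quoted above) are all assumptions.

```python
def assign_tiers(access_counts, flash_pct=5, fast_hdd_pct=15):
    """Rank extents by access count and place the hottest on the fastest tier.

    access_counts maps an extent id to the accesses seen in the last window.
    Returns a dict mapping each extent id to a tier name.
    """
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    flash_n = len(ranked) * flash_pct // 100
    fast_n = len(ranked) * fast_hdd_pct // 100
    placement = {}
    for i, extent in enumerate(ranked):
        if i < flash_n:
            placement[extent] = "flash"
        elif i < flash_n + fast_n:
            placement[extent] = "fast_hdd"      # 15,000 or 10,000rpm disks
        else:
            placement[extent] = "capacity_hdd"  # largest, cheapest disks
    return placement


# 20 hypothetical extents; extent k was accessed (20 - k) * 10 times.
window = {f"extent-{k:02d}": (20 - k) * 10 for k in range(20)}
placement = assign_tiers(window)
print(placement["extent-00"], placement["extent-03"], placement["extent-19"])
```

Because placement depends only on counts from the previous window, a spike shorter than the window is either missed or acted on after it has passed, which is the limitation noted above.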
Inflection Point of 2012 – AFAs
The next major inflection point of the last decade was the shift to AFAs. With an all-flash implementation, every I/O was usually faster. Solutions started to hit the market that were easier to use and featured “self-managing” capabilities, while adding compression and de-dupe. There was a huge increase in effective storage capacity, sometimes as much as 5X or greater than the raw storage capacity.
With good data reduction, AFAs could narrow the cost gap between themselves and conventional storage systems. But just like the legacy storage architectures that AFAs sought to replace, their architectures were similarly based upon the concept of merely protecting against any single point of failure, using either dual-controllers or aggregations of dual controllers. Their creators failed to see the emerging need for more highly-available architectures to serve more mission-critical availability requirements.
Inflection Point of 2020 – Intelligent Enterprise Storage
The inflection point that has transpired as one decade ended and the new decade began was the shift to intelligent enterprise storage, which has three major, defining characteristics: (1) the extensive use of AI, ML, and, in particular, deep learning; (2) the advancement of a triple-redundant architecture; and (3) simple, non-disruptive “everything” to do with storage.
One of the most basic applications of AI/ML for storage is the use of predictive analytics, often delivered through cloud-based, software intelligence. It replaces human monitoring with AI-powered monitoring and analysis to predict future capacity and performance requirements, or at its most basic level, the health of the system itself.
Deep learning can be used to intelligently deliver faster-than-flash performance at a radically lower cost by continuously analyzing all I/Os in real-time, and promoting data from ultra-low-cost persistent storage media to ultra-fast DRAM cache. When done well, this caching of data occurs before that data has even been requested from a server. The DL software places multiple “bets” about what might be requested next, based upon previous activity, and then learns from its own actions to continuously self-optimize in real-time.
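The "bet, then learn" loop can be illustrated with something far simpler than deep learning: a frequency table of which block tends to follow which. The sketch below is that stand-in technique, not the approach any vendor actually uses, and the class and attribute names are hypothetical.

```python
from collections import Counter, defaultdict


class NextBlockPrefetcher:
    """Toy predictor that remembers which blocks most often followed each block."""

    def __init__(self, bets=2):
        self.bets = bets                       # number of candidates to prefetch
        self.followers = defaultdict(Counter)  # block -> counts of the next block
        self.cache = set()                     # blocks currently prefetched
        self.last = None
        self.hits = 0
        self.misses = 0

    def read(self, block):
        # Score the previous bets against what was actually requested.
        if block in self.cache:
            self.hits += 1
        else:
            self.misses += 1
        # Learn from the observed transition.
        if self.last is not None:
            self.followers[self.last][block] += 1
        self.last = block
        # Place new bets: the most frequent followers of the current block.
        self.cache = {b for b, _ in self.followers[block].most_common(self.bets)}


prefetcher = NextBlockPrefetcher()
for block in [1, 2, 3, 1, 2, 3, 1, 2, 3]:
    prefetcher.read(block)
print(prefetcher.hits, prefetcher.misses)
```

On this repeating pattern the first pass misses while the table is learned, and later reads are served from the prefetched set. A production system would bound the cache and weigh recency, which this sketch ignores.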
The change from existing storage systems that are based on dual redundant architectures to a new, triple redundant architecture has a benefit that cannot be ignored. It provides not only higher availability, but a much larger performance cushion in the case of a component or even full-controller failure. With conventional dual-controller based architectures, although technically they are still initially “alive” after any single failure, often the surviving controller is incapable of handling its own performance load plus the workload of the failed controller that it just inherited, resulting in a total failure.
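A back-of-the-envelope calculation shows why the extra controller adds headroom. The sketch assumes all controllers are active, evenly loaded, and that a failed controller's work is spread evenly across the survivors; these are simplifying assumptions, and the function names are illustrative.

```python
def survivor_load(controllers: int, utilization: float) -> float:
    """Per-controller utilization after one controller fails."""
    return utilization * controllers / (controllers - 1)


def max_safe_utilization(controllers: int) -> float:
    """Highest per-controller utilization that still survives a single failure."""
    return (controllers - 1) / controllers


for n in (2, 3):
    after = survivor_load(n, 0.60)
    print(f"{n} controllers at 60%: survivors at {after:.0%}, "
          f"safe ceiling {max_safe_utilization(n):.0%}")
```

At 60% utilization per controller, the lone survivor of a dual-controller pair would need 120% of its capacity, while each survivor in a three-controller design would run at 90%.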
And last, but not least, the drive to greater simplicity is following a path of thinking that says the users should be able to turn on capacity themselves without an on-site visit from the vendor. Automating the infrastructure eliminates the primary source of outages with storage systems – problems caused by human intervention. Automation will increasingly define the years ahead, ushering in an era of radical simplicity in enterprise storage.
This inflection point has also unearthed a keen revelation: wise companies are agnostic to any specific technology or media type. Media does not matter to intelligent software. This realization changes the game in enterprise storage.
Thanks to the advancements over the past 10 years and the development of intelligent enterprise storage, an enterprise company can now get faster performance at petabyte scale with 100% availability and at a lower cost. The next decade is sure to be just as exciting as the last decade.
Krishna Subramanian, COO and president
Data Management-as-a-Service gains prominence
Enterprise IT teams are already straining to do more with less, and the complexity of what they have to manage is exploding as they now have to work across multiple clouds and storage vendors. In 2021, simpler, automated Data Management-as-a-Service (DMaaS) will gain prominence for its flexibility. Enterprises don’t need to babysit their cloud storage costs and environments; DMaaS solutions will work across multi-vendor storage and cloud environments and elastically adjust their capacity to migrate large workloads or move data across storage classes and tiers and optimize costs, all through policy-based automation. AI and ML techniques will be leveraged more in DMaaS solutions to grow their analytics-driven data management capabilities and become more adaptive.
Kam Eshghi, chief strategy officer
From 2010 to 2020 storage was a key technology to watch, as an abundance of data infiltrated consumers’ daily lives. During the decade, worldwide data consumption jumped from 2ZB to 59ZB as IoT, AI and ML saw tremendous growth in applications.
Other observations included:
1. Rapid data center growth – with the staggering growth of worldwide data consumption, data center capacity grew very quickly. According to Research and Markets, third-party data center capacity grew in each region (including EMEA, the Americas and AsiaPac) from 2010 to 2019, with total WW data center space and power capacity doubling over the period.
2. The growth of NVMe-oF technologies – while the first protocol was developed in 2008, the ratification of NVMe/TCP in 2018 brought NVMe-oF technologies into the mainstream and delivered the promise that companies could use existing standard infrastructure for their storage needs, reducing latency, overall cost and providing extensive ROI.
3. From 3G to 5G – The shift in mobile technology and explosion of IoT devices drove the need for edge data centers and edge storage solutions.
4. Social media boon – While social media platforms took root in 2004, these platforms grew tremendously throughout the 2010s. With more than half the world’s population (3.5 billion) online, the rise of hyperscalers accelerated the race to deliver affordable, scalable, low-latency storage solutions, like NVMe-oF.
5. Cloud native applications arrive – the growth of social media data created demand for cloud-native applications such as Kafka and Cassandra on Kubernetes.
AB Periasamy, CEO
Over the next decade the storage market will look fundamentally different from today. While the changes are underway – the metamorphosis will complete by 2030. First, machine generated data is going to remake most parts of the industry due to its extraordinary scale and performance requirements. The implications will impact storage architectures where object storage replaces NAS as the primary approach. Additionally, solid state will replace HDD as the primary medium on which data is stored due to its increasing superiority on the price/performance curve. Finally, the evolving nature of data architectures will render storage appliance vendors obsolete. The appliance brands that survive will become software-defined storage companies.
Russ Kennedy, chief product officer
During the 1950s, the world was introduced to the concept of data storage as the HDD and magnetic tape became the media used by the blossoming computer industry. Since then, Moore’s Law has driven many advancements, giving us plenty of options with previously unthinkable capabilities, but we are nevertheless still fighting many of the same challenges around cost, protection/security, performance, capacity and recovery.
The biggest advancement, no question, is the maturation of the cloud. 2001-2010 saw the advent of the cloud, but over the last decade, 2011-2020, the technology matured and truly changed how we live and work. Services like Uber were just not possible in 2010, but in the past decade, the cloud has evolved to enable incredible innovations for both businesses and consumers, driving industry growth.
Despite this growth, advances in storage have not been able to match those of compute or networking, and fundamental issues like reliability, performance and security still persist. Networks used to be notorious bottlenecks, but now they act as an enabler for all different storage platforms.
The last decade saw storage become faster and more capable. As the cloud matured, it opened up a world of new business opportunities for organizations. However, storage will remain behind the eight ball as issues like data protection, speed and security continue to be challenges.
Curtis Anderson, software architect
Flash has not killed the HDD – Predictions of new technologies devastating the market for existing ones have never come true in the timeframes predicted, and for good reason. For example, tape storage is declining as a percentage of total storage capacity shipped but more tape drives and cartridges are shipped today than in years past.
Flash storage has huge advantages over HDDs, except for price. Over the last decade, in spite of constant predictions that the price will come down and kill HDDs, flash has not yet fallen to a price point where it can replace hard disk drives for large scale data storage. That won’t change until the flash vendors decide they can no longer charge a premium for it, and HDDs will remain viable until then. The headline of “new tech kills old tech” is just a repeating part of the hype cycle.
Scale-out has come to mainstream storage – In the 2000s, networked storage from the likes of NetApp, very fast single machines that served files to other machines, went mainstream. In the 2010s, single machines ran out of gas and the industry started toward clustered file servers such as C-Mode from NetApp and Isilon’s products. Those architectures are running out of gas and the industry will need to move to the much more scalable architectures pioneered by HPC storage systems such as Panasas’ PanFS.
Pavilion Data Systems
VR Satish, CTO
The evolution and adoption of hardware and software trends over time can be expressed as opposing sine waves. At the beginning of the last decade we saw a rise in software defined solutions, which culminated in the use of hyperconverged solutions that leveraged commodity hardware. Inversely proportional to the growing adoption of software solutions, there was a corresponding decline in specialized hardware. During this period of software dominance, hardware went from highly specialized solutions at the beginning of the decade, such as custom ASICs, to being highly commoditized. Toward the end of the decade, as software vendors began to struggle to differentiate themselves, specialized hardware became prominent again and there was an inflection point where we once again saw the importance of specialized hardware. This can be seen with the advent of technologies such as SCM, hyperparallel systems, and DPUs.
With each inflection point over the last decade there has been a leading company that changed the paradigm for what was possible. Some of these companies remained niche players while others, like Pure Storage, broke through and found a broader market. The Pavilion HyperParallel Data Platform, which offers unprecedented customer choice and control across block, file, and object, and which is also the most performant, dense, scalable, and flexible storage platform available, is an example of highly differentiated hardware on the current hardware sine wave that is changing the paradigm for what is possible. This is why Pavilion is on a trajectory to cross the chasm due to the combination of its HyperOS software and HyperParallel Flash Array.
In addition, we saw a move to object storage from legacy protocols. This is a trend that will likely continue as organizations move to take advantage of object storage, whether on-prem or in the public cloud.
Shawn Rosemarin, VP of global systems engineering
Rise of the “data economy”
Over the last 10 years data has evolved from a business output and liability to a catalyst for innovation as startups like Uber, Amazon and Netflix attacked mature players across industries, leveraging their data as a powerful differentiator. Ultimately this change in the paradigm of data has accelerated the need for storage and analysis, and changed the dynamics of the industry forever.
All-flash storage is mainstream
Spinning magnetic disk was the de facto standard for decades. As the internet expanded, business systems digitized and new applications emerged, bringing magnetic disk to its knees in reliability, performance and in some cases, raw economics. Pure Storage led the all-flash revolution by embracing the performance, simplicity, reliability and scale of NAND-based flash storage, forcing the industry to adapt. This mainstream adoption has also been validated by storage M&A activity and traditional vendors retrofitting legacy systems.
Rise of high performance “unstructured” file and object storage
Traditionally, file storage equated to “file shares” and object storage/content addressable storage equated to “archive.” In the last 10 years, file storage has emerged as a critical platform for both log and streaming analytics. Object storage, with a new powerful protocol (S3) developed and mainstreamed by AWS, has been embraced by developers worldwide for its simplicity and ability to handle unstructured and semi-structured content with ease.
Enterprises recognize the importance of hybrid and multi-cloud in the IT ecosystem
In the last decade, the importance of hybrid and multi-cloud has been cemented. Most enterprises have unlocked new hybrid use cases where applications can migrate seamlessly between private and public clouds. In addition, application developers needed to develop a single storage API to manage their data, allowing them to develop applications in their private or public cloud, and run them anywhere.
Widespread adoption of containers and cloud-native services
Organizations increased their appetite and desire to leverage data as a catalyst for growth and, as a result, application development has accelerated. Simply put, more applications generated more interactions, which in turn generated more valuable insights. To support this boom, agile development frameworks replaced traditional waterfall approaches, leading to the rise of microservices and DevOps. With this paradigm change came a new lightweight “container” abstraction layer, led by Docker and built off of the decade-old concept of LXC in Linux.
“As-a-service” became a de facto standard for all infrastructure
As public cloud adoption grew, it delivered agility and flexible consumption that have been applauded by the marketplace. Enterprises are now looking for all of their IT infrastructure and services to be delivered as a utility and “as-a-service”, in many cases supplanting complex leasing structures.
Eric Bassier, senior director, products
The last decade has delivered tremendous storage advancements and innovations. Flash has become widely adopted and commercialized, and NVMe development now allows the industry to unlock the performance of flash storage. Infrastructures have evolved from direct-attached to network-attached and now hyperconverged. We saw the development and broad adoption of public cloud infrastructure-as-a-service and software-as-a-service bring new elasticity levels to compute and storage resources. And data continued to grow at exponential rates, with 90% of all the world’s data being generated in recent decades.
Unstructured data, in particular video, digital images, and other ‘rich’ data generated by machines and devices, is growing at rates of 30-60% per year, and will soon represent 80% of all the planet’s data.
It’s this unstructured data that’s driving a revolution in how businesses are thinking about their storage infrastructure. Not only is it growing exponentially, it lives everywhere. It’s generated ‘at the edge’ – in a lab, on a city street, on the manufacturing floor, in space – and then needs to be processed, analyzed, and consumed. This data in most industries is central to an organization’s business or mission.
If the last decade was about storage innovation and management, this coming decade is about data innovation and management. Organizations will use AI analytics, index data and add context to it, and layer on metadata to enrich all of these valuable files and objects. We’ll see tremendous advancements and adoption of AI.
Many organizations will start to treat data as the valuable asset it is. It will be curated, well organized, and tagged in a way that makes sense to business users. Much of this data will need to be kept indefinitely – stored, protected, and accessible to users and applications. Data must remain at our fingertips.
The biggest emerging challenges of the next decade are: how to store and process data, discern what’s valuable, and unlock the business value to drive the next discovery, innovation, or business breakthrough.
Ben Gitenstein, VP of product
Over the past decade, the major trends in data storage fall within scale, flash and the cloud.
In terms of scale, the amount of data that is generated today was largely unforeseen ten years ago. Data is growing more rapidly every year and any storage system built a decade ago was architected for a completely different scale both in terms of total capacity and number of files in a system. In the coming decade, solutions with the ability to simply scale to petabytes (even exabytes) and billions of files, will become the de facto standard for most organizations.
In 2010, only a few of us thought all-flash would be a dominant storage solution for enterprises. However, as mobile devices grew in popularity, flash supply boomed. Today, its use cases have expanded to general enterprise workloads and mission-critical applications with the evolution of high-performance NVMe interfaces on SSDs, enabling businesses to begin to take advantage of its performance and low-latency benefits. Soon we will see that everything that can move to NVMe, will.
As for the cloud, I believe that if it’s not bolted down, it will end up in the cloud in the next ten years. Over the past decade, companies have had to justify moving their data into the cloud but today, that justification is exactly the opposite. More and more companies require the cloud to get their work done because they do not have the compute capabilities or applications in their data centers that are needed to do mission-critical business work.
With all of this new technology coming to rise in the past decade, it’s also important to acknowledge the necessity of data visibility. Without embracing data visibility, users would have difficulty understanding what data they actually have in their possession as a result of these technologies. That’s why, along with the rise of these data storage capabilities, we’ve also seen the introduction of tools that provide organizations with the ability to access and utilize data. While scale, NVMe and cloud will continue to pave the way for better data storage and analytics in the next ten years, data visibility will be the key for overall company and customer success in making smart decisions, understanding their systems and having a grasp on the security of their data.
John Morris, SVP and CTO
Against the backdrop of massive growth in unstructured data, object storage has become the primary method for managing data storage. Led by AWS S3, innovations in this area created the evolution of the transport layer for unstructured data. Object storage provides massive scalability and significant storage cost reduction. Metadata associated with objects is a key enabler for analytics and search.
A broader adoption of SSDs
HDDs still dominate nearline storage and amount to about 90% of the total available exabyte market. But many mission-critical workloads transitioned from HDDs to SSDs, as 3D NAND reduced cost and provided significant improvement in performance. The transition has been a major step in boosting the performance of mission-critical and tiered data storage systems.
The rise of the edge
The last decade saw the beginning of a shift from enterprise data centers to IoT and the edge – the latter needed to keep up with the explosion of devices. The connection of more and more devices and sensors to storage and compute facilitates user-friendly approaches. As, in some use cases, data’s center of gravity migrated toward the edge of the network, storage became more distributed. The edge enabled lower-latency access, which boosted the ascendancy of content delivery networks (streaming and localized, high-performance access to data). Thanks to the edge, for the first time in history, it became possible to stream data from local devices without the usual delay. IoT and the edge continue to be key drivers of exabyte growth.
Managing deployments of hardware across multiple geographical locations and ecosystems
As storage and compute migrated to central cloud locations, hardware could be managed using methods common to enterprise data systems. But as customers increasingly value flexibility, scalability, security, and latency, hardware now occupies hyperscale data centers, colos, plus market and central office locations for the same hybrid cloud deployment. This requires much more extensive capabilities to remotely manage hardware. The dispersal of data has three main reasons: being closer to users, ensuring geo-replication in case of disasters, and moving closer to power resources.
Composability allows for the disaggregation of devices outside the server in pools of cohesive resources. It is a trend that surfaces in two categories: hardware and software. 1) Hardware involves dense, low-cost enclosures optimized for mass-capacity storage interfaces, connected with a high-throughput transport layer, for example NVMe over Fabrics. The latter began to provide the transport layer (while disaggregated storage fabrics are not new, they had never before been based on open standard protocols and drivers). 2) Virtualization software orchestrates, provisions, and manages hardware and applications through the data plane and control plane. Multitenancy – a massive enabler for public cloud providers – is achieved thanks to hypervisors and/or containerization.
Impacts of fast networks on storage architectures
The major enablers to datacenter composability and disaggregation that came on the scene in the last decade are ultra low-latency and high-bandwidth Ethernet fabrics that are quickly scaling beyond 400GE, yet maintaining less than 10μs average latency.
Separation of compute and storage in hyper cloud
This trend is related to the rise of object storage mentioned above. Data began to be computed and only stored transiently in local storage, with the persistent, bulk storage being remote. Companies were now able to optimize and upgrade the compute and storage architectures independently. This is the efficiency and scalability cloud created, with better price points as a result. It helped many startups and smaller companies grow, achieving scalability and granularity of service that they could not attain on their own. The trend became more aggressive as storage moved from hyperconverged to composable.
Deployment of AI + GPU compute for training
In 2012, GPUs became mainstream compute devices for training deep learning models. This game-changer both benefited from the proliferation of data and boosted data growth. It enabled the resurgence of AI as a viable technology, pushing organizations to store more and more data for training bigger and bigger models.
New localization and data privacy laws
The passage of laws like GDPR in the European Union, requiring that data belonging to EU citizens remain within the boundaries of the EU – with other jurisdictions following suit – had consequences for data storage. The requirement for more “local” storage of data caused an expansion of cloud data centers in places where data has to be stored. The regulations also contributed to the rise of the edge because they concerned data trust and data privacy on prem – the terrain in which security, control, and provenance overlap.
With the growing need for hard drive capacity, increasing the number of disks per hard drive has been a significant trend of the decade. The transition to a sealed helium environment enabled this capability by reducing internal disturbance and lowering power consumption, an important factor for deployment at scale.
Margaret Hoagland, director, product marketing
Local Storage Replaces SAN Storage as Critical Workloads Move to Cloud
Enterprise IT infrastructures will continue to evolve into increasingly complex combinations of on-premises data centers, SaaS solutions, and private, public, hybrid, and multi-cloud environments. More mission-critical applications, ERPs and databases, such as SQL Server, SAP S/4HANA, and Oracle, will be deployed in or migrated into cloud and hybrid cloud environments.
This trend will drive demand for cloud-friendly local storage needed to provide the high availability failover clustering and disaster recovery protection required for these mission critical workloads – where stringent availability SLAs and fast RTO and RPO are expected.
From a storage perspective, SAN storage will continue to be a mainstay of traditional on-premises data centers, particularly in large enterprises. However, in the cloud and hybrid cloud, and other environments where SANs and other shared storage are impossible or impractical, efficient local storage will become the norm for important applications.
To meet RTO and RPO requirements, mission critical workloads in the cloud will be deployed in “SANless” failover clustering environments. In these configurations, cluster nodes are each configured with their own local storage. Efficient block-level replication is used to synchronize local storage, making it appear to clustering software like a traditional shared storage cluster configuration.
Synchronizing local storage has the advantage of enabling failover clustering in cloud environments using well-known clustering software, such as Windows Server Failover Clustering. Unlike other cloud clustering options, it also allows IT to locate cluster nodes in different cloud regions and/or availability zones for protection against site-wide and regional disasters. In on-premises high availability clustering environments where application performance is a priority, companies will replace SANs with synchronized high performance SSD/PCIe local storage.
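The “SANless” idea described above can be illustrated with a toy model. This is a minimal sketch, not any vendor's implementation: the class names `LocalVolume` and `MirroredVolume` are hypothetical, the "disks" are in-memory lists, and the replication step is a plain method call where a real product would send the block over the network and wait for an acknowledgement.

```python
class LocalVolume:
    """One cluster node's private storage, modelled as fixed-size blocks in memory."""

    def __init__(self, num_blocks: int, block_size: int = 4096) -> None:
        self.block_size = block_size
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]

    def write_block(self, index: int, data: bytes) -> None:
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        self.blocks[index] = bytes(data)

    def read_block(self, index: int) -> bytes:
        return self.blocks[index]


class MirroredVolume:
    """Presents two local volumes as one logical volume (hypothetical sketch).

    Every write is applied to the active node and to the standby node before
    it returns, so both copies stay identical at block level.
    """

    def __init__(self, active: LocalVolume, standby: LocalVolume) -> None:
        if (active.block_size != standby.block_size
                or len(active.blocks) != len(standby.blocks)):
            raise ValueError("volumes must have the same geometry")
        self.active = active
        self.standby = standby

    def write_block(self, index: int, data: bytes) -> None:
        self.active.write_block(index, data)
        # Assumption: synchronous replication. A real system would ship this
        # block to the other node over the network before acknowledging.
        self.standby.write_block(index, data)

    def read_block(self, index: int) -> bytes:
        return self.active.read_block(index)

    def fail_over(self) -> None:
        """Promote the standby copy; the old active node becomes the standby."""
        self.active, self.standby = self.standby, self.active


node_a = LocalVolume(num_blocks=8)
node_b = LocalVolume(num_blocks=8)
volume = MirroredVolume(node_a, node_b)

payload = b"x" * 4096
volume.write_block(3, payload)
volume.fail_over()                      # node_b now serves reads and writes
data_after_failover = volume.read_block(3)
```

Because the write reached both nodes before it completed, the block is still readable after `fail_over()`, which is what lets clustering software treat the pair like a single shared disk.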
As data centers become more complex, there will be a growing need for storage options that provide the configuration flexibility needed for high availability clustering and disaster recovery protection.
Craig Chadwell, VP product
We’re on the brink of a very exciting decade in the evolution of data storage: never before has so much information and technology been so accessible. And, as we work ever closer together with peers, customers and even competitors to share ideas and experiences, we’re only going to get better at solving tomorrow’s data challenges.
With that in mind, the following is what I think we’ll be observing more of in the years to come:
- The ongoing abstraction of storage physical media and controllers from the user, or application, interaction. Closely related to this will be the increasing application of service, governance, and/or security policies through storage virtualization and software-defined storage.
- The growth of automated inference (AI) based on the increased adoption of task-specific hardware that is specifically suited to training on large data sets
- The emergence, or ‘coming of age’ of credible open-source storage platforms (e.g. Ceph, Gluster, Lustre) – and their wide scale adoption within the enterprise
- More niche players entering and significantly disrupting the market following decades of major missteps by the big vendors (e.g. too slow to respond to virtualization) that allowed the market to fragment as much as it has.
- Following on closely from the previous point, this decade will be punctuated by many, many mergers and acquisitions, which ultimately make it harder for customers to trust in a long-term strategy. In a somewhat vicious cycle, this attitude will continue to drive the increased demands and expectations from software-defined or open source technologies.
- I don’t doubt that this will be the decade in which SSD really comes to the fore to become a credible data center technology, pushing out high-performance HDDs, and driving major shifts in enterprise storage purchasing behaviors.
Matt Starr, CTO
The Rise of Ransomware
From the early days of computer hacking, threat actors have always attempted to steal data from an organization or business – which is one of the primary purposes of computer viruses. Over the last 10 years, the goal has changed from stealing data to kidnapping the data. Today, cybercriminals use ransomware to hold an organization’s data hostage with the threat of destroying the data or selling it on the dark web if their monetary demands are not met in a timely fashion.
The rise of ransomware puts a premium on data and data protection. Over the last decade, as threat actors have become more sophisticated, they have realized that money can be made by electronically infiltrating an organization’s IT department and encrypting data that might include trade secrets, IP, customer lists, etc. while threatening to sell or destroy that critical data unless a ransom is paid – usually in bitcoin to prevent tracking by authorities. The larger the company the more valuable the haul in many cases. Because an organization’s data is most important to that entity, they will likely pay more to get it back than anyone on the black market.
IT workflows and systems have become faster and more resilient – enabling organizations to improve their IT infrastructures and better serve their internal and external customers. These complex networks, however, also leave organizations more vulnerable to the encryption of ransomware as, once the systems on the network are attacked, they often are encrypted at the speed of disk. Only those devices that store data out of the network stream, such as tape, are safe from ransomware’s impact.
The Old Mantra that ‘Tape is Dead’ is Dead
For the last decade, just as in decades before that, the industry has been pronouncing the death of tape storage. However, today, tape is back in growth mode according to most analysts assessing the market. Tape, with recent R&D innovations and deployments, now sits poised for another 10-plus years of success. Unlike the nearest competitor, the spinning archive disk, tape’s R&D hurdles are well understood, the physics are known and don’t need to be invented. Even most of the cloud storage providers know that tape has to be a part of the ecosystem from a cost and reliability standpoint. As cloud vendors develop cold tiers of storage at a fraction of a cent per month per gigabyte, each forcing the other to a lower and lower price, tape, with its low total cost of ownership, high reliability, scalable capacity and low power and cooling costs, is the logical solution. And tape’s traditional usage for archive in industries such as oil and gas, M&E, government, scientific research, and more, is seeing increased deployment. With the advent of ransomware, tape is one of the only storage platforms that provides an air gap from that ransomware, meaning that it sits outside the network stream and so cannot be infected. This means that, as data on disk are being encrypted, the data backed up and archived to tape remains safe and ready to allow a recovery of an organization’s IT platform.
Cloud Storage Comes into its Own
Over the last decade, as cloud has gone from a discussion to a reality, more and more companies look to the cloud for their “IT solution”, whatever that solution may be. In some cases, it is moving from a Capex to an Opex model, and in other cases, it is the idea that cloud must be cheaper than on-premise storage due to scale. Of the customers who have jumped headlong into cloud storage, most have had to swallow the bitter pill that charges and other hidden fees drive the cost of cloud storage way up beyond their expectations and higher than their original budgets. This is not to say that storing data in the cloud is wrong or bad; it is that so many companies have jumped on the cloud bandwagon without doing the same due diligence required for any onsite architectural changes. Like getting caught up in a wave of excitement without much thought, some users ran to the cloud with their data in hand, only to now learn the true cost of that move. Customers who have used cloud storage, whether they still have it or have left it, now understand the nuances of a cloud contract and cloud pricing model (including the inevitable and undesirable ‘cloud lock-in scenario’) vs. a capital purchase. Those users will be more cautious and careful over the next decade as they enhance their infrastructures, perhaps with a hybrid cloud approach that offers the best of both worlds – the scalability and accessibility advantages of cloud with the control, affordability and flexibility of on-premise storage.
Mihir Shah, CEO
Organizations Will Continue to Rapidly Apply AI (Artificial Intelligence) to Business Processes Where Data is Critical
The key to rapidly adopting AI technologies to replace manual processes and maximize operational efficiency lies in extracting insights from large data sets. Dynamic process implementation for data security, data access, and data management based on business data will become the norm. Encryption, data protection, remote access, role-based access profiles, data movement, etc. will all be implemented as dynamic responses to on-going business operations based on deep data insights.
Ransomware Breaches Will Continue to Highlight Cybersecurity Failures
Ransomware will continue to motivate bad actors, exacerbated by recent high-profile attacks. Consequently, organizations will have to analyze their basic security controls with unbreakable backup solutions to ensure the ability to recover and maintain uninterrupted operations.
The Emergence of Remote Workforce Trends has Created New Challenges
Covid-19 forced organizations to quickly shift to remote work environments, and this shift will result in long-term remote and hybrid workforces in the upcoming years. However, this trend presents new challenges with the need to quickly and securely access data via cloud applications. Cloud data management has become, and will continue to be, a central IT concern in order to mitigate security breaches and data leakage. Organizations will need the ability to seamlessly move data to eliminate silos and improve employee productivity, while simultaneously protecting data.
Gal Naor, CEO
Over the past decade, there has been a revolution in storage hardware.
Drive performance and density improved by 1,000x to a few 100,000 IO/s and 16TB per drive. Newer software has not enabled better drive utilization and has not taken advantage of modern storage hardware capabilities. The overall cost of storage remains high because of this very low hardware utilization – using less than 20% of the performance and less than 65% of the net capacity.
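To make the utilization point concrete, here is a minimal sketch of how low utilization inflates the effective cost of hardware. It treats the 20% and 65% figures quoted above as exact values purely for illustration (the text gives them as upper bounds, so the real multipliers would be at least this large), and the function name is hypothetical.

```python
def effective_cost_multiplier(utilization: float) -> float:
    """Return how many times more is paid per unit actually used.

    `utilization` is the fraction (0 < u <= 1) of the purchased resource
    that the software stack manages to use.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    return 1.0 / utilization


# Assumed inputs: the utilization ceilings quoted in the text.
performance_multiplier = effective_cost_multiplier(0.20)  # per IO/s actually delivered
capacity_multiplier = effective_cost_multiplier(0.65)     # per TB actually usable

print(f"Cost per used IO/s: {performance_multiplier:.2f}x the raw hardware cost")
print(f"Cost per used TB:   {capacity_multiplier:.2f}x the raw hardware cost")
```

Under these assumptions, each IO/s that is actually used costs about five times what the drive's rated figure suggests, and each usable terabyte costs roughly one and a half times its raw price.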
Many single product companies have emerged over the past decade.
The costs associated with the sales and marketing of single product solutions are extraordinary. Nutanix, for example, spends 89% of their total revenue on sales and marketing alone! R&D costs of single-product solutions are also enormous – quickly reaching 35% of revenue. With expenses exceeding revenue, it is no surprise that the valuation of single product storage plummets.
Many companies moved their storage from on-prem to the cloud.
The big challenge has been the cost. Most companies transferred to a cloud-based solution to save money but quickly realized that the total cost is prohibitive, as is the price of returning to an on-prem solution. First wave cloud vendors myopically focus on cloud-only solutions, which limit enterprise appeal. The second and much larger wave of cloud users require a seamlessly hybrid solution.
Boyan Ivanov, CEO
The last 10 years taught many people that storage is not an industry that can be penetrated by leveraging multiple rounds of investor cash that is burned through to fuel revenue growth. Many such attempts ended in bankruptcies or acquisitions – Nimble Storage, Nexenta, Tintri, Maxta, Nexsan, Violin, Portworx, Compuverde, Actifio.
The 5-10-year storage replacement cycles, depending on the use case, meant that groundbreaking technologies could not achieve the escape velocities observed in other industries like communications, media, healthcare, or finance. Storage vendors need to integrate with the infrastructure ecosystem and achieving this naturally takes time beyond the 3-5 year runway achieved with 3-4 rounds of venture capital.
Technologies that came with high expectations and didn’t gain as much traction as expected
4. ARM in the DC
5. Public cloud
In 2010, it was common to see claims that almost all business workloads will be in public clouds within 10 years. As of 2020, the public cloud segment currently accounts for about 45% of the annual infrastructure market vs. 55% for private environments (private clouds, traditional IT). The next 3-5 years will show if we’re reaching a tipping point or a natural balance.
Vendors/products that came and stayed: Pure Storage, Nutanix, VMware vSAN, Cloudian. Honorable mentions: StorPool, Weka, Vast, Excelero, Lightbits Labs.
The open source storage ecosystem saw the birth and wide use of a couple of notable projects: Ceph/Inktank/Red Hat and MinIO. OpenZFS lives on, despite the acquisition of Sun Microsystems by Oracle.
Technologies that came and stayed:
- NVMe, NVMe-oF, NVMe/TCP
- Object storage and the S3 API
- Kubernetes – container orchestration
- New categories of storage media:
  - SMR HDDs
  - Capacity flash drives – QLC in particular
  - NAND NVMes, ultra performance NVMe – Intel Optane, Samsung Z-NAND
  - Persistent memory DIMMs
Technologies in their infancy as of 2020:
- Computational storage – offloading some of the compute load to run closer to the data, e.g. on storage nodes or on storage devices themselves.
- Composable infrastructure, composable storage – the capability to disaggregate storage and compute resources and to freely re-combine them dynamically.
Howard Marks, technologist extraordinary and plenipotentiary
When we look back on the 2010s, the biggest trend in storage won’t have been the move from file to object, or even the rise of so-called software-defined storage, but rather the decline of the hard disk. In 2010 HDDs were still the dominant data storage medium they had been for over 50 years, spreading from data centers to laptops.
At the beginning of the decade, by comparison, flash was a very expensive tier-0 solution used by high-frequency traders and national labs. Flash was only considered leading edge, while the vast majority of online capacity, including “high performance” applications, used spinning disks.
As the cost of flash-based storage decreased faster than the cost of HDDs, it went from aspirational storage that users wished they could afford, to the point of replacing hard drives over an increasing group of applications. For many applications, there is a tipping point beyond which hard drive use falls off dramatically. For laptops and other clients, this was when the cost of a reasonably sized SSD (say 500GB today) was the same as the minimum cost of a HDD (which has been stable at $40-50).
Since the average customer was only going to use 400GB or so, a 500GB SSD has a lot more value than a 2TB HDD.
In the data center, HDDs got squeezed between the shrinking price gap between flash and disk and the reduction in storage system performance that comes with cheaper storage via bigger disks. Bigger HDDs cost less on a per-gigabyte basis, but since one 7,200rpm disk can perform about 100 IO/s regardless of its capacity, a 1PB storage system using 20TB drives can only deliver 1/20th the IO/s of the old system that used 1TB HDDs.
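The arithmetic behind that 1/20th figure can be worked through in a few lines. This is a rough sketch under simplifying assumptions: 1PB is taken as 1,000TB, every drive delivers the roughly 100 IO/s cited above, and RAID or erasure-coding overhead, spares and caching are ignored. The function name is hypothetical.

```python
def aggregate_iops(system_capacity_tb: float,
                   drive_capacity_tb: float,
                   iops_per_drive: float = 100.0) -> float:
    """Total random IO/s of a system built from identical HDDs.

    Assumes performance scales only with the number of spindles,
    not with the capacity of each drive.
    """
    drive_count = system_capacity_tb / drive_capacity_tb
    return drive_count * iops_per_drive


SYSTEM_TB = 1000.0  # 1PB, decimal units (assumption)

old_system = aggregate_iops(SYSTEM_TB, drive_capacity_tb=1.0)   # 1,000 drives
new_system = aggregate_iops(SYSTEM_TB, drive_capacity_tb=20.0)  # 50 drives
ratio = new_system / old_system

print(f"1TB drives:  {old_system:,.0f} IO/s ({old_system / SYSTEM_TB:.0f} IO/s per TB)")
print(f"20TB drives: {new_system:,.0f} IO/s ({new_system / SYSTEM_TB:.0f} IO/s per TB)")
print(f"Ratio: {ratio:.2f}")
```

With these inputs the same petabyte drops from 100,000 IO/s on 1,000 spindles to 5,000 IO/s on 50, which is why IO/s per terabyte, rather than price per terabyte alone, decides where large HDDs remain usable.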
This limits hard drives today to cold, if not glacial, data with limited, primarily sequential access, and even these object stores and appliances use flash for their metadata. IO/s limitations become especially severe in applications that use data deduplication and similar reduction techniques that store data in small blocks. Reading, or rehydrating, data from these small blocks requires a lot of back-end IO/s even when the data is logically sequential. AFAs, with millions of aggregate IO/s across all their SSDs, can use these reduction techniques to close, or eliminate, the cost gap between flash and hybrid/disk systems.
All this has led to a 50% reduction in unit sales of HDDs over the decade, and the beginning of a massive transition to affordable, all-flash solutions.
What does this mean for the next decade? It means the elimination of decades of tradeoffs, longstanding infrastructure complexities, and application bottlenecks, progress we’ve already begun to see. It means simplifying how organizations store and access large reserves of data in real-time. Most importantly, it means a decade of new insights and discoveries, from the next big AI development to a major life sciences breakthrough, that were not possible before.
Lee Caswell, VP marketing, CPBU
Trend: Transformation of Traditional DR to RTDR (Real-time Disaster Recovery)
The need to recover from local disasters, including Ransomware, continues to be a major challenge for companies of all sizes. Traditional DR using dedicated data center resources is capital intensive, difficult to test, and hampered by replication of ever-larger data sources. Meanwhile, the number of applications requiring DR is exploding, partly because the number of applications is increasing, but more importantly because new applications must be protected now that they define competitive differentiation, brand loyalty, and customer engagement for today’s digital businesses.
A new real-time DR model will change both DR and backup once hybrid cloud-optimized file systems with built-in security can efficiently replicate local data store deltas to private or public cloud targets so that real-time failover from persistent storage can leverage cloud compute resources provided only on demand. This real-time DR model will transform DR to a real-time recovery process – similar to how VMware vMotion transformed server maintenance from an occasional high-risk, high-cost activity to an automated virtualized runtime.