Here are extended excerpts from a white paper written this month by David Reinsel, John Gantz and John Rydning, analysts at IDC Corp., and sponsored by Seagate Technology LLC. The document is available for free, with no registration required.
Data Age 2025
The Digitization of the World
From Edge to Core
Global Datasphere expansion is never-ending
IDC has defined three primary locations where digitization is happening and where digital content is created:
- core (traditional and cloud datacenters),
- the edge (enterprise-hardened infrastructure like cell towers and branch offices), and
- the endpoints (PCs, smart phones, and IoT devices).
The summation of all this data, whether it is created, captured, or replicated, is called the Global Datasphere, and it is experiencing tremendous growth. IDC predicts that the Global Datasphere will grow from 33ZB in 2018 to 175ZB by 2025.
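As a rough check on what that forecast implies, the two endpoints can be turned into a compound annual growth rate. The sketch below is not taken from the white paper: the function name is hypothetical, and it assumes smooth year-over-year compounding across the seven steps from 2018 to 2025.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate that carries start_value to end_value over `years` steps."""
    return (end_value / start_value) ** (1 / years) - 1


# Endpoints quoted in the text: 33ZB in 2018 and 175ZB in 2025.
datasphere_cagr = implied_cagr(33, 175, 2025 - 2018)
print(f"Implied annual growth of the Global Datasphere: {datasphere_cagr:.1%}")
```

Under that assumption, the implied growth works out to roughly 27% per year.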
To keep up with the storage demands stemming from all this data creation, IDC forecasts that over 22ZB of storage capacity must ship across all media types from 2018 to 2025, with nearly 59% of that capacity supplied from the HDD industry.
An enterprise renaissance is on the horizon
The enterprise is fast becoming the world’s data steward … again. In the recent past, consumers were responsible for much of their own data, but their reliance on and trust of today’s cloud services, especially from connectivity, performance, and convenience perspectives, continues to increase while the need to store and manage data locally continues to decrease. Moreover, businesses are looking to centralize data management and delivery (e.g., online video streaming, data analytics, data security, and privacy) as well as to leverage data to control their businesses and the user experience (e.g., machine-to-machine communication, IoT, persistent personalization profiling). The responsibility to maintain and manage all this consumer and business data supports the growth in cloud provider datacenters. As a result, the enterprise’s role as a data steward continues to grow, and consumers are not just allowing this, but expecting it. Beginning in 2019, more data will be stored in the enterprise core than in all the world’s existing endpoints.
Cloud is the new core
One of the key drivers of growth in the core is the shift to the cloud from traditional datacenters. As companies continue to pursue the cloud (both public and private) for data processing needs, cloud datacenters are becoming the new enterprise data repository. In essence, the cloud is becoming the new core. In 2025 IDC predicts that 49% of the world’s stored data will reside in public cloud environments.
Introducing the world’s first data readiness condition (DATCON) index
Not all industries are prepared for their digitally transformed future. So, to help companies understand their level of data readiness, IDC developed a DATCON (DATa readiness CONdition) index, designed to analyze various industries regarding their own Datasphere, level of data management, usage, leadership, and monetization capabilities.
It examined four industries as part of its DATCON analysis: financial services, manufacturing, healthcare, and media and entertainment.
Manufacturing’s Datasphere is by far the largest given its maturity, investment in IoT, and 24×7 operations. Manufacturing and financial services are the leading industries in terms of maturity, with media and entertainment most in need of a jump start.
China’s Datasphere on pace to become the largest in the world
Every geographic region has its own Datasphere size and trajectories that are impacted by population, digital transformation progress, IT spend and maturity, and many other metrics. For example, China’s Datasphere is expected to grow 30% on average over the next 7 years and will be the largest Datasphere of all regions by 2025 (compared to EMEA, APJxC, U.S., and Rest of World) as its connected population grows and its video surveillance infrastructure proliferates. (APJxC includes AsiaPac countries, including Japan, but not China.)
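To give a sense of scale for 30% average annual growth sustained over 7 years, the compounding can be sketched as follows. This is an illustration only: it assumes the 30% figure applies uniformly each year, and the variable names are hypothetical.

```python
ANNUAL_GROWTH = 0.30  # average yearly growth quoted for China's Datasphere
YEARS = 7             # forecast horizon quoted in the text

# Compounding the same rate each year multiplies the starting size by (1 + rate) ** years.
growth_multiple = (1 + ANNUAL_GROWTH) ** YEARS
print(f"Size after {YEARS} years: about {growth_multiple:.1f}x the starting size")
```

Under that assumption, the region’s Datasphere would end the period at a little over six times its starting size.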
Consumers are addicted to data, and more of it in real time
Today, more than 5 billion consumers interact with data every day – by 2025, that number will be 6 billion, or 75% of the world’s population. In 2025, each connected person will have at least one data interaction every 18s. Many of these interactions are because of the billions of IoT devices connected across the globe, which are expected to create over 90ZB of data in 2025.
Manufacturers have long sensed and actuated on real-time data feeds within controlled manufacturing environments. This has led to better quality products at lower prices. Companies are now looking to sense and actuate on data collected outside the factory walls, while products are being used. Manufacturers can extend product life and reduce product failures by understanding product performance in random environments used by customers with a multitude of behavior profiles. This is now possible with sensors that are embedded and connected in the everyday products that we use.
But the heart of the Datasphere is the core. The core plays a critical role by providing centralized storage and archiving, service delivery, deeper-level analytics, command and control, and regulatory compliance. As a result, data flows in a constant stream from endpoints and the edge to the core and back out to the edge and endpoints, with each location playing an important part in the overall Datasphere. This propagation of data drives data growth in the core and has ramifications for analytics and intelligence throughout the network, powering internal and external processes, as well as intelligent and predictive engagements between businesses and individuals across entire ecosystems (Figure 2). The net effect is the continued importance and growth in enterprise storage.
Data created (Datasphere) is different from data stored
From a data creation perspective (solid lines in Figure 3), endpoints are declining as a percent while the core and edge continue to produce more. From a storage perspective (dotted lines in Figure 3), the amount of data being stored in endpoints will plummet as the core becomes the repository of choice for data of all types. By 2024, IDC expects data stored in the core to be more than double the data stored in the endpoint, completely reversing the dynamic from 2015. Edge storage will also see significant growth as latency-sensitive services and applications proliferate throughout our world.
Cloud is the New Core … and Much of it is Additive
Today, as greater numbers of devices with greater levels of intelligence are connected to various networks, businesses and consumers are finding the cloud to be an increasingly attractive option that enables fast, ubiquitous access to their data. Increasingly, consumers are fine with lower storage capacity on endpoint devices in favor of using the cloud. By 2020, more bytes will be stored in the public cloud than in consumer devices (Figure 4), and by 2021, there will be more data stored in the public cloud than in traditional datacenters (Figure 5).
The enterprise Datasphere and stewardship is vital to our future
The enterprise continues to see its share of Datasphere stewardship grow, with consumers’ share of data generated dropping from 47% in 2017 to 36% by 2025. This shift is largely driven by the increasingly always-on and ‘sensorized’ world that is capturing and analyzing our environments and creating data 24×7.
In the past, consumers were responsible for much of their own data, but as data becomes increasingly centralized across enterprise core and edge infrastructure, the responsibility to maintain and manage it is shifting to enterprise/cloud provider datacenters. The enterprise is already the primary source and steward of data creation and storage, and the trend continues to amplify these responsibilities (Figures 6 and 7).
Figures 6 and 7 – The enterprise Datasphere continues to expand
Real-Time Data Demand is Driving the Edge
IDC forecasts that more than 150 billion devices will be connected across the globe by 2025, most of which will be creating data in real time. For example, automated machines on a manufacturing floor rely on real-time data for process control and improvement. Real-time data represents 15% of the Datasphere in 2017, and nearly 30% by 2025 (Figure 8).
Figure 8 – Real-time data
But it’s not just machines that are driving real-time data. By 2025, every connected person in the world on average will have a digital data engagement over 4,900 times per day – that’s about one digital interaction every 18s (Figure 9).
Figure 9 – Data interactions per connected person per day
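The link between "over 4,900 times per day" and "one digital interaction every 18s" is simple division, sketched below. The function name is hypothetical, and the calculation assumes interactions are spread evenly across all 24 hours.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def seconds_between_interactions(interactions_per_day: float) -> float:
    """Average gap in seconds between interactions spread evenly over a full day."""
    return SECONDS_PER_DAY / interactions_per_day


interval = seconds_between_interactions(4900)
print(f"One interaction roughly every {interval:.1f} seconds")
```

The result is a little under 18 seconds, consistent with the rounded figure in the text.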
The demand for storage remains strong
The amount of data created in the Global Datasphere is, of course, the target for the storage industry. Even with the amount of data created that is discarded, overwritten, or sensed, but never stored longer than milliseconds, there still exists a growing demand for storage capacity across industries, governments, enterprises, and consumers.
To live in a digitized world where artificial intelligence drives business processes, customer engagements, and autonomous infrastructure or where consumers’ lives are hyper-personalized in nearly every aspect of behavior – including what time we’ll be awakened based on the previous day’s activities, overnight sleep patterns, and the next day’s calendar – will require creating and storing more data than ever before. IDC currently calculates that Data Age 2025 storage capacity shipments across all media types (HDD, SSD, NVM-flash/other, tape, and optical) over the next 4 years (2018-2021) will need to exceed the 6.9ZB shipped across all media types over the past 20 years.
IDC forecasts that over 22ZB of storage capacity must ship across all media types from 2018 to 2025 to keep up with storage demands. Around 59% of the capacity will need to come from the HDD industry and 26% from flash technology over that same time frame, with optical storage the only medium to show signs of fatigue as consumers continue to abandon DVDs in favor of streaming video and audio (Figure 10).
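Those percentages translate into approximate capacity per medium as sketched below. The text says "over 22ZB" and "around 59%", so the results are floor estimates rather than IDC figures. The remainder bucket is an assumption obtained by subtraction; the white paper excerpt does not state it.

```python
TOTAL_ZB = 22.0  # "over 22ZB" shipped from 2018 to 2025, used here as a lower bound

# Shares quoted in the text; the remainder (tape, optical, other) is derived, not quoted.
shares = {"HDD": 0.59, "flash": 0.26}
shares["other media (remainder)"] = 1.0 - sum(shares.values())

capacity_by_media = {medium: TOTAL_ZB * share for medium, share in shares.items()}
for medium, zettabytes in capacity_by_media.items():
    print(f"{medium}: about {zettabytes:.1f}ZB")
```

On these numbers, HDDs would account for roughly 13ZB and flash for a little under 6ZB of the capacity shipped over the period.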
The growth in endpoint and edge storage will favor solid state, while the core continues to have a voracious appetite for the economical bytes that HDD drives and tape provide. Enterprises will use a mix of disk drives, SSDs, flash, and tape to satisfy the performance, management, and archive demands being placed on them. By the end of 2025, over 80% of the enterprise bytes shipped into the core and edge will continue to be HDD bytes when compared to SSDs and other NVM technologies (Figure 11).
Figure 11 – Share of worldwide byte shipments into the enterprise core and edge by storage media type
Figure 12 – Size and growth of the global Datasphere by region
Figure 13 – Global Datasphere share by region
Cloud growth explodes outside the United States
As the headquarters region for the leading global cloud providers, the United States has traditionally had the lion’s share of cloud storage, followed by EMEA and APJxC. And while cloud storage in the United States will continue to grow, cloud storage in other regions will grow even faster, fueled both by the desire to reduce latency by locating data closer to the end consumer and by corporate and regulatory mandates requiring data to be housed locally within different regions. The U.S. share of public cloud storage will drop precipitously from 51% in 2017 to 31% by 2025, while China’s share will more than double from 6% to 13% (Figure 14).
Figure 14 – Cloud storage growth and share by region
Figure 18 – Comparing industry Datasphere growth rates
Conclusion: data is changing the world
As consumers, data is helping us build more and deeper connections, and to access products and services more quickly and easily, at the time and place of our choosing. We can now walk into a store and walk out with our purchases, leaving our transaction record (and perhaps facial image) as a digital trail, but never having to pull out a credit card or cash.
As businesses, data is helping us reach new markets, better serve existing customers, streamline operations, and monetize raw and analyzed data. If reported global intangible assets of companies are more than $200 trillion, what must – and will – the value of unreported data assets be? Data is an intangible asset and underpins most other intangible assets like patents and goodwill. Bytes can be made more valuable by surrounding them with security, leveraging them in AI, or using them to cure diseases.
Nevertheless, there is a cost associated with data: purchasing, maintaining, and protecting storage, as well as the cost of losing data or having sensitive data fall into the hands of a competitor or hacker. The real value of data is out there, and companies are just finding out that data has real worth. Those businesses first through the gateway of digital transformation will be the first to find out just how valuable their data is.
The Global Datasphere is large and complex, with key interdependencies between core, edge, and endpoints. While the edge and endpoints will continue to play a critical role as the place where the Datasphere meets the physical world, the core remains the heart of the Datasphere, gathering data from the edge and endpoints, processing and archiving it, and promulgating it back for consumption by end users, including machines and things – and the cloud is a vital part of this core.
Companies looking to be relevant between now and 2025 will need to understand the role data plays in their organization and how the Datasphere will evolve during that period. They will need to embrace their role as data guardians, leverage the cloud, and take a global approach to their data. Different industries have different levels of data maturity, so companies should review the IDC DATCON index reports to learn where they stand relative to their industry index and what they need to do to not just survive – but more importantly to thrive – in their own Datasphere.