AWS re:Invent: AWS Unveils 4 Storage Innovations
Including Amazon EBS io2 Block Express volumes, next-gen Amazon EBS Gp3 volumes, Amazon S3 Intelligent-Tiering adding S3 Glacier Archive and Deep Archive access, and S3 Replication multi-destination
This is a Press Release edited by StorageNewsletter.com on December 7, 2020 at 2:28 pm

At AWS re:Invent, Amazon Web Services, Inc. (AWS), an Amazon.com company, announced 4 storage innovations that deliver added storage performance, resiliency, and value to customers, including:
- Amazon EBS io2 Block Express volumes: Next-gen storage server architecture delivers a SAN built for the cloud, with up to 256,000 IO/s, 4,000MB/s throughput, and 64TB of capacity (a 4x increase across all metrics compared to standard io2 volumes), to meet the performance requirements of the most I/O-intensive, business-critical applications (available in preview).
- Amazon EBS Gp3 volumes: Next-gen general purpose SSD volumes for Amazon EBS give customers the flexibility to provision additional IO/s and throughput without adding storage capacity, while also offering a higher baseline performance of 3,000 IO/s and 125MB/s of throughput, with the ability to provision up to 16,000 IO/s and 1,000MB/s peak throughput (a 4x increase over Gp2 volumes), at a 20% lower price per gigabyte of storage than existing Gp2 volumes (available).
- Amazon S3 Intelligent-Tiering automatic data archiving: Two new tiers (Archive Access and Deep Archive Access) help customers reduce their storage costs by up to 95% for rarely accessed objects by automatically moving unused objects into the archive access tiers (available).
- Amazon S3 Replication (multi-destination): Gives customers the ability to replicate data to multiple S3 buckets in the same or different AWS Regions, in order to better manage content distribution, compliance, and data-sharing needs across Regions (available).
EBS io2 Block Express volumes deliver SAN built for the cloud
Customers choose io2 volumes (the latest-gen provisioned IO/s volumes) to run their critical, performance-intensive applications like SAP HANA, SQL Server, DB2, MySQL, PostgreSQL, and Oracle databases because they provide 99.999% (five 9s) durability and 4x more IO/s than general purpose EBS volumes. Some applications require higher IO/s, throughput, or capacity than a single io2 volume offers. To achieve the performance they need, customers often stripe multiple io2 volumes together. However, the most demanding applications require more io2 volumes to be striped together than customers want to manage. For these demanding applications, many customers have historically used on-premises SANs (a set of disks accessed over the local network). However, SANs have numerous drawbacks: they are expensive due to high upfront acquisition costs, require complex forecasting to ensure sufficient capacity, are complicated and hard to manage, and consume valuable data center space and networking capacity. When customers exceed the capacity of a SAN, they have to buy another one, which is expensive and forces them to pay for unused capacity. Customers told us they wanted the power of a SAN, but in the cloud, which hasn’t existed until now.
EBS Block Express is a new storage architecture that gives customers a SAN built for the cloud. It is designed for the largest, most I/O-intensive, mission-critical deployments of Oracle, SAP HANA, SQL Server, and SAS Analytics that benefit from high IO/s, high throughput, high durability, high storage capacity, and low latency. With io2 volumes running on Block Express, a single io2 volume can now be provisioned with up to 256,000 IO/s, drive up to 4,000MB/s of throughput, and offer 64TB of capacity (a 4x increase over existing io2 volumes across all parameters). Additionally, with io2 Block Express volumes, customers can achieve consistent sub-millisecond latency for their latency-sensitive applications. Customers can also stripe multiple io2 Block Express volumes together to get even better performance than a single volume can provide. Block Express helps io2 volumes achieve this performance by completely reinventing the underlying EBS hardware, software, and networking stacks. By decoupling compute from storage at the hardware layer and rewriting the software to take advantage of this decoupling, EBS Block Express enables new levels of performance and shortens the time to deliver new capabilities. By also rewriting the networking stack to take advantage of the high-performance Scalable Reliable Datagram (SRD) networking protocol, Block Express reduces latency. These improvements are available with no upfront commitments to use io2 Block Express volumes, and customers can provision and scale capacity without the upfront costs of a SAN.
In the coming months, additional SAN features will be added to Block Express volumes. These include multi-attach with I/O fencing, which gives customers the ability to safely attach multiple instances to a single volume at the same time; Fast Snapshot Restore; and Elastic Volumes, which lets customers modify EBS volume size, type, and performance.
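As a concrete illustration of provisioning at these limits, the sketch below builds the parameter set that the EC2 CreateVolume API (boto3’s `ec2.create_volume`) would receive for an io2 Block Express volume. The limits are the figures quoted above; the availability zone, size, and IO/s values are hypothetical, and the actual API call (commented out) requires boto3 and AWS credentials.

```python
# Limits quoted in the announcement for io2 Block Express (assumptions for
# this sketch; check current AWS documentation before relying on them).
IO2_BLOCK_EXPRESS_MAX_IOPS = 256_000        # 4x standard io2
IO2_BLOCK_EXPRESS_MAX_SIZE_GIB = 64 * 1024  # 64TB of capacity

def io2_volume_params(az: str, size_gib: int, iops: int) -> dict:
    """Build a CreateVolume parameter dict, checking Block Express limits."""
    if not 1 <= iops <= IO2_BLOCK_EXPRESS_MAX_IOPS:
        raise ValueError(f"{iops} IO/s exceeds the io2 Block Express limit")
    if not 1 <= size_gib <= IO2_BLOCK_EXPRESS_MAX_SIZE_GIB:
        raise ValueError(f"{size_gib} GiB exceeds the 64TB capacity limit")
    return {
        "AvailabilityZone": az,
        "VolumeType": "io2",
        "Size": size_gib,   # GiB
        "Iops": iops,
    }

# Hypothetical 16TiB volume provisioned at 100,000 IO/s.
params = io2_volume_params("us-east-1a", size_gib=16_384, iops=100_000)
# With credentials configured, this dict would be passed as:
#   boto3.client("ec2").create_volume(**params)
```

Because there is no SAN-style upfront purchase, the same function call with larger values is all that is needed as an application’s footprint grows.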
EBS Gp3 volumes decouple IO/s from storage capacity, deliver more performance, and are priced 20% lower than previous-gen volumes
Customers use EBS volumes to support a range of workloads, such as relational and non-relational databases (e.g. SQL Server and Oracle), enterprise applications, containerized applications, big data analytics engines, distributed file systems, virtual desktops, dev/test environments, and media workflows. Gp2 volumes have made it easy and cost effective for customers to meet their IO/s and throughput requirements for many of these workloads, but some applications require more IO/s than a single Gp2 volume can deliver. With Gp2 volumes, performance scales up with storage capacity, so customers can get higher IO/s and throughput for their applications by provisioning a larger storage volume size. However, some applications require higher performance, but do not need higher storage capacity (e.g. databases like MySQL and Cassandra). These customers can end up paying for more storage than they need to get the required IO/s performance. Customers who are running these workloads want to meet their performance needs without having to provision and pay for a larger storage volume.
Next-gen Gp3 volumes give customers the ability to provision IO/s and throughput independently of storage capacity. For workloads that need more performance, customers can modify Gp3 volumes to provision the IO/s and throughput they need, without having to add more storage capacity. Gp3 volumes deliver sustained baseline performance of 3,000 IO/s and 125MB/s, with the ability to provision up to 16,000 IO/s and 1,000MB/s peak throughput (a 4x increase over Gp2 volumes). In addition to saving customers money by allowing them to scale IO/s independently of storage, Gp3 volumes are also priced 20% lower per GB than existing Gp2 volumes. Customers can migrate Gp2 volumes to Gp3 volumes using Elastic Volumes, an existing feature of EBS that allows customers to modify the volume type, IO/s, storage capacity, and throughput of existing EBS volumes without interrupting their Amazon Elastic Compute Cloud (EC2) instances. Customers can also create new Gp3 volumes and scale performance using the AWS Management Console, the AWS Command Line Interface (CLI), or the AWS SDK.
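The Elastic Volumes migration path described above can be sketched as the parameter set for the EC2 ModifyVolume API (boto3’s `ec2.modify_volume`), which converts a Gp2 volume to Gp3 in place and provisions extra performance. The volume ID and requested figures are hypothetical; the baseline and peak limits come from the announcement.

```python
# Gp3 performance envelope quoted in the announcement (assumptions for
# this sketch).
GP3_BASELINE_IOPS, GP3_MAX_IOPS = 3_000, 16_000
GP3_BASELINE_MBPS, GP3_MAX_MBPS = 125, 1_000

def gp3_modify_params(volume_id: str, iops: int, throughput_mbps: int) -> dict:
    """Build a ModifyVolume parameter dict for a gp2-to-gp3 migration."""
    if not GP3_BASELINE_IOPS <= iops <= GP3_MAX_IOPS:
        raise ValueError("Gp3 IO/s must be between 3,000 and 16,000")
    if not GP3_BASELINE_MBPS <= throughput_mbps <= GP3_MAX_MBPS:
        raise ValueError("Gp3 throughput must be between 125 and 1,000 MB/s")
    return {
        "VolumeId": volume_id,
        "VolumeType": "gp3",
        "Iops": iops,
        "Throughput": throughput_mbps,  # MB/s
    }

# Hypothetical volume bumped to 6,000 IO/s and 500MB/s; note that no Size
# field appears, since performance now scales independently of capacity.
params = gp3_modify_params("vol-0123456789abcdef0", iops=6_000, throughput_mbps=500)
# Elastic Volumes applies the change online, without detaching:
#   boto3.client("ec2").modify_volume(**params)
```

The absence of a capacity parameter is the point of the feature: a database that needs IO/s but not space no longer pays for unused gigabytes.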
Amazon S3 Intelligent-Tiering adds 2 archive tiers that deliver up to 95% storage cost savings
The S3 Intelligent-Tiering storage class automatically optimizes customers’ storage costs for data with unknown or changing access patterns, delivering dynamic pricing based on how individual objects in storage are actually accessed. S3 Intelligent-Tiering has been widely adopted by customers with data sets that have varying access patterns (e.g. data lakes) or unknown access patterns (e.g. newly-launched applications). S3 Intelligent-Tiering charges for storage in 2 pricing tiers: one tier for frequent access (for real-time data querying) and a cost-optimized tier for infrequent access (for batch querying). However, many AWS customers have storage that they very rarely access and use S3 Glacier or S3 Glacier Deep Archive to reduce their storage costs for this archived data. Prior to today, customers needed to build their own applications to monitor and record access to individual objects in order to determine which objects were rarely accessed and should be archived. Then, they needed to move them manually.
With the addition of the Archive Access and Deep Archive Access tiers, S3 Intelligent-Tiering now provides automatic tiering and dynamic pricing across four access tiers (Frequent, Infrequent, Archive, and Deep Archive). Customers using S3 Intelligent-Tiering can save up to 95% on storage for objects that have not been accessed for 180 days or more. Once a customer has activated one or both of the archive access tiers, S3 Intelligent-Tiering automatically moves objects that have not been accessed for 90 days to the Archive Access tier, and after 180 days to the Deep Archive Access tier. S3 Intelligent-Tiering supports features like S3 Inventory to report on the access tier of objects, and S3 Replication to replicate data to any AWS Region. There are no retrieval fees when using S3 Intelligent-Tiering and no additional tiering fees when objects move between access tiers. S3 Intelligent-Tiering with the archive access tiers is available in all AWS Regions.
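Activating the archive tiers is a per-bucket configuration. A minimal sketch, assuming a hypothetical bucket and configuration ID, builds the payload for the S3 PutBucketIntelligentTieringConfiguration API (boto3’s `s3.put_bucket_intelligent_tiering_configuration`) with the 90-day and 180-day thresholds described above:

```python
# Intelligent-Tiering archive configuration for one bucket. Objects not
# accessed for 90 days move to Archive Access; after 180 days they move
# on to Deep Archive Access. Both tiers are opt-in per bucket.
tiering_config = {
    "Id": "archive-old-objects",  # hypothetical configuration name
    "Status": "Enabled",
    "Tierings": [
        {"Days": 90, "AccessTier": "ARCHIVE_ACCESS"},
        {"Days": 180, "AccessTier": "DEEP_ARCHIVE_ACCESS"},
    ],
}

# With boto3 and credentials configured, the configuration is applied as:
#   boto3.client("s3").put_bucket_intelligent_tiering_configuration(
#       Bucket="my-data-lake",  # hypothetical bucket
#       Id=tiering_config["Id"],
#       IntelligentTieringConfiguration=tiering_config,
#   )
```

Activating only the first entry (Archive Access) is also valid; the tiers are independent opt-ins.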
S3 Replication extends ability to replicate data to multiple destinations within same AWS Region or across different AWS Regions
Customers use S3 Replication to create a replica copy of their data within the same AWS Region or across different AWS Regions for compliance requirements, low-latency performance, or data sharing across accounts. Some customers also need to replicate data to multiple destinations (S3 buckets in the same AWS Region or in multiple Regions) to meet data sovereignty requirements, to support collaboration between geographically-distributed teams, or to maintain the same data sets in multiple AWS Regions for resiliency. To accomplish this today, customers must build their own multi-destination replication service by monitoring S3 events to identify newly-created objects. They then fan these events out into multiple queues, invoke AWS Lambda functions to copy objects to each destination S3 bucket, track the status of each API call, and aggregate the results. Customers also need to monitor and maintain these systems, which creates added expense and operational overhead.
With S3 Replication (multi-destination), customers no longer need to develop their own solutions for duplicating data across multiple AWS Regions. Customers can use S3 Replication to replicate data to multiple buckets within the same AWS Region, across multiple AWS Regions, or a combination of both, using the same policy-based, managed solution with events and metrics to monitor their data replication. For example, a customer can now easily replicate data to multiple S3 buckets in different AWS Regions – one for primary storage, one for archiving, and one for disaster recovery (DR). Customers can also distribute data sets and updates to all AWS Regions for low-latency performance. With S3 Replication (multi-destination), customers can also specify different storage classes for different destinations to save on storage costs and meet data compliance requirements (e.g. customers can use the S3 Intelligent-Tiering storage class for data in two AWS Regions and have another copy in S3 Glacier Deep Archive for a low-cost replica). S3 Replication (multi-destination) fully supports existing S3 Replication functionality like Replication Time Control, which provides predictable replication time backed by a Service Level Agreement to meet compliance or business requirements. Customers can also monitor the status of their replication with Amazon CloudWatch metrics, events, and the object-level replication status field. S3 Replication (multi-destination) can be configured using the S3 management console, AWS CloudFormation, the AWS CLI, or the AWS SDK.
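The primary/archive/DR example above maps onto one replication rule per destination bucket. The sketch below builds such a configuration for the S3 PutBucketReplication API (boto3’s `s3.put_bucket_replication`); the bucket ARNs and IAM role are hypothetical, and each rule’s storage class shows the per-destination cost control mentioned above.

```python
def replication_rule(priority: int, dest_bucket_arn: str, storage_class: str) -> dict:
    """One replication rule targeting a single destination bucket."""
    return {
        "ID": f"rule-{priority}",
        "Priority": priority,                # must be unique across rules
        "Status": "Enabled",
        "Filter": {},                        # empty filter: replicate all objects
        "DeleteMarkerReplication": {"Status": "Disabled"},
        "Destination": {
            "Bucket": dest_bucket_arn,
            "StorageClass": storage_class,   # can differ per destination
        },
    }

# Hypothetical three-destination layout: primary, archive, and DR replica.
replication_config = {
    "Role": "arn:aws:iam::123456789012:role/s3-replication",  # hypothetical
    "Rules": [
        replication_rule(1, "arn:aws:s3:::primary-copy", "STANDARD"),
        replication_rule(2, "arn:aws:s3:::archive-copy", "DEEP_ARCHIVE"),
        replication_rule(3, "arn:aws:s3:::dr-copy", "INTELLIGENT_TIERING"),
    ],
}

# With boto3 and credentials configured, applied to the source bucket as:
#   boto3.client("s3").put_bucket_replication(
#       Bucket="source-bucket",  # hypothetical
#       ReplicationConfiguration=replication_config,
#   )
```

Replacing the hand-built queue/Lambda pipeline, the whole fan-out is now expressed declaratively as this single policy document.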
“More data will be created in the next 3 years than was created over the past 30 years,” said Mai-Lan Tomsen Bukovec, VP, storage, AWS. “The cloud is a big part of why developers and companies are generating and retaining so much data, and storage is in need of reinvention. These announcements reinvent storage by building a new SAN for the cloud, automatically tiering customers’ vast troves of data so they can save money on what’s not being accessed often, and making it simple to replicate data and move it around the world as needed, enabling customers to manage this new normal more effectively.”
Teradata Corp. is the cloud data analytics platform company, solving the world’s most complex data challenges at scale.
“When your focus is analyzing the world’s data in real-time, having the right balance of price and performance is critical for our business and our end-customers,” said Dan Spurling, SVP, engineering, Teradata. “With the release of Gp3, Teradata AWS customers will experience improved performance and throughput, allowing them to drive increased analytics at scale. With the significant improvements in Gp3 over Gp2, we expect 4x higher throughput with fewer EBS volumes per instance, enabling our customers to receive increased performance and improved instance-level availability.”
Embark is building self-driving truck technology to make roads safer and transportation more efficient.
“We use S3 Intelligent-Tiering to store logs from our fleet of self-driving trucks. These logs contain petabytes of data from the sensors on our vehicles, such as cameras and LiDARs, as well as the control signals and system logs that we need to perfectly reconstruct everything that happened on and around our vehicles at any point in time. Keeping all of this data is critical to our business,” said Paul Ashbourne, head, infrastructure, Embark. “Our teams frequently access recently-collected data for their analyses, but over time, most of this data gets colder and subsets of data are not accessed for months at a time. It’s important that we continue to save older data in case we need to analyze it again, but doing so can be costly. S3 Intelligent-Tiering is perfect for us because it automatically optimizes our storage costs based on individual object access patterns. With the two new archive access tiers, we save even more when our fleet log data is rarely accessed. All of this happens seamlessly without our engineering team having to build or manage any custom storage monitoring systems. With S3 Intelligent-Tiering, everything just works, and we can focus less time managing our storage and more time on R&D.”
Zalando SE is Europe’s online platform for fashion and lifestyle, with over 35 million active customers.
“We built a 15PB data lake on Amazon S3 which has allowed employees to act on and analyze historical sales and web tracking data that they previously wouldn’t have had access to,” said Max Schultze, lead data engineer, Zalando. “By using S3 Intelligent-Tiering, we were able to save 37% of our yearly storage costs for our data lake, as it automatically moved objects between a frequent access and infrequent access tier. We are looking forward to the new S3 Intelligent-Tiering archive access tiers to save even more on objects that are not accessed for long periods of time.”
SmugMug+Flickr is an influential photographer-centric platform.
“We’re new to S3 Replication and have used Amazon S3 since day one. S3 Replication for multiple destinations delivers awesome options for our global data handling,” said Andrew Shieh, director, engineering/operations, SmugMug+Flickr. “We can now leverage S3 Replication in new ways, planning optimized replication strategies using our existing S3 object tags. S3 Replication support for multiple destinations handles the heavy lifting, so we can spend more of our operational and development time thrilling our customers. Our petabytes of data in S3, fully managed by a few lines of code, are at the heart of our business. These continual improvements to S3 keep SmugMug and Flickr growing together with our partners at AWS.”