Open Source Object Storage Software by Seagate
Named Cortx coupled with reference architectures
This is a Press Release edited by StorageNewsletter.com on September 28, 2020 at 2:17 pm
Seagate Technology plc introduced an open-source object storage software, a reference architecture powered by it, and a corresponding developer community.
All 3 were built to manage the massive surge and sprawl of unstructured enterprise data. This announcement was part of the company’s first annual Datasphere event.
“We live in a data economy,” said CEO Dave Mosley. “The value of enterprise data is too often untapped. Businesses struggle to access their data’s full potential. Seagate tailored its offerings to match the new information-hungry reality. The cost-effective, frictionless, and reliable data management innovations that Seagate unveiled today will help companies get more value out of their data.”
Solutions announced today include the 100% open source-based software CORTX; the collaborative open source CORTX Community; and the open, flexible reference architecture deployed as converged infrastructure Lyve Drive Rack, powered by CORTX.
CORTX Software
CORTX is hardware-agnostic open-source object storage software that gives developers and partners access to mass capacity-optimized data storage architectures. Its use cases include AI, ML, hybrid cloud, the edge and HPC. Given customers’ preference for freedom from vendor lock-in, it is open source-based and developed with the community. Several early adopters began testing the software and participating in the CORTX Community ahead of the launch.
Scientific communities with mass-scale data storage requirements cheered CORTX’s arrival.
An early adopter, the French Alternative Energies and Atomic Energy Commission (CEA), has been testing a development version of CORTX for several years.
The agency concluded that it is “now proving to be very powerful and flexible object storage, which can be used very effectively to implement very large-scale data storage,” in the words of Jacques-Charles Lafoucriere, program manager, CEA. “CORTX can very nicely work with storage tools and many different types of storage interfaces. We have effectively used CORTX to implement a parallel file system interface (pNFS) and hierarchical storage management tools. CORTX architecture is also compatible with artificial intelligence and deep learning (AI/DL) tools such as TensorFlow.”
Another early adopter, the UK Atomic Energy Authority (UKAEA), in fusion energy research and development, sees CORTX as a welcome and needed solution.
“CORTX is novel in its very concept,” said Dr. Debasmita Samadder, exascale algorithms specialist, UKAEA. “It is very exciting to try our application and explore its performance using this unique object data storage system.”
“As HPC division leader at Los Alamos National Lab, I am vigilant for opportunities to reduce the cost and complexity of our distributed data platforms,” said Gary Grider. “I am very excited to see what Seagate is doing with CORTX and am optimistic about its ability to lower costs for data storage at the exabyte scale. We will be closely following the open source CORTX and will participate in the community built around it, because we share Seagate’s goal of economically efficient storage optimized for massive scalability and durability.”
Early adopters of CORTX also include Toyota Motor Corporation and Fujitsu Limited.
CORTX Community
It is a group of open source researchers and developers working together to enable mass capacity object storage for the world’s proliferating data sets.
CORTX is now available for download and collaboration on GitHub, Inc.
“Seagate delivers an open platform, with all the feature sets and roadmaps driven by the community – for the community,” said Jeff McAffer, senior director of product, GitHub. “It’s the kind of setting in which innovation happens.”
While CORTX and CORTX Community are Seagate’s latest contributions to object storage, the company has long played a role in its collaborative development. In the late 1990s, Seagate was a pioneering member of the industry consortium that created the first object storage specification: the SNIA OSD standard. The firm’s commitment to innovation and collaboration in object storage continues in CORTX and its many architectural optimizations.
Both offerings drew praise from Intel Corp. and WekaIO, Inc.
“Open source innovation in high-performance storage is critical to propel cloud, HPC, AI and communications networks to higher levels of performance in the coming data era,” said Bryan Jorgensen, VP, Intel data platforms group. “Intel plans to work within the CORTX Community to enable and optimize this exciting open source technology with our relevant platform features, including Intel Optane persistent memory, Intel QuickAssist accelerators, and the DAOS file system. We will also be working with Seagate to integrate those same technology innovations within the mass capacity-optimized Lyve Drive Rack reference design.”
Shailesh Manjrekar, head of AI and strategic alliances, WekaIO, weighed in as well: “As the provider of the world’s fastest file system, we are thrilled to partner with Seagate to meet our customer’s demands for high performance and exascale economic storage for use cases like AI/ML, life sciences, and financial services. We appreciate Seagate’s proven data storage expertise and look forward to participating in the CORTX open source development to create end-to-end solutions leveraging our transformative Weka AI solutions framework, where WekaFS provides the extreme performance and CORTX provides capacity and durability.”
Lyve Drive Rack
It is an open, flexible converged storage infrastructure that provides users with a ready-made reference architecture with which to deploy CORTX and build their own mass capacity-optimized private storage cloud. The solution democratizes hyperscale storage architectures. It offers economical and fast deployment of object storage, enabling discovery of valuable insights through rich data labeling of massive amounts of data. The enclosure’s capacities start at 1.34PB.
The Datasphere event featured a demo for Lyve Drive Rack. It was furnished with Seagate’s next-gen hardware innovation, the 20TB HAMR hard drives, showing that CORTX and Lyve Drive Rack enable fast adoption of mass-capacity drives for hyperscale applications. Shipments of Lyve Drive Rack and the 20TB HAMR drives are scheduled to begin in December.
Another early adopter of CORTX and Lyve Drive Rack, DC BLOX, provides resilient edge-connected colocation, networking, and storage infrastructure.
“DC BLOX values Seagate’s leadership in tackling the rapidly increasing challenge of large-scale data storage and management with its CORTX object storage system,” said Peyton McNully, chief cloud architect, DC BLOX.
Public cloud hyperscale storage infrastructures rely on the cost efficiency of mass-capacity devices to reduce the cost of storage. With this announcement, Seagate is bringing that same capability and economic benefit to the enterprise in an open architecture mode: open-source data management software coupled with a multi-vendor reference architecture ecosystem.
Datasphere Event
The virtual Datasphere event also included 2 panel discussions centered around tapping more enterprise data and open source solutions. The panels featured leaders from Seagate, ServiceNow, RISC-V International, Equinix, GitHub, AT&T, and IDC. Seagate and other experts also led deeper dives into the new technologies and use cases.
Comments
This object storage announcement made by Seagate with CORTX is a surprise for the market even if the company tried some initiatives a few times in the past.
Beyond Seagate milestones, illustrated by the image below, it’s important to mention some agreements and partnerships with Cloudian, Scality, Ceph, Swift and MinIO among others.
The data deluge represents an opportunity already addressed by plenty of vendors, historically by object storage players and, in different iterations, by other vendors who had an object interface to their storage solution or added an object storage engine. For Seagate it’s about selling more drives and systems, but obviously doing things by itself gives more control and market penetration, with a better ratio between object storage projects and disk enclosures sold.
HDD competitors WDC and Seagate very often follow each other or do pretty similar things, and both decided to start systems and platforms strategies. HGST, a subsidiary of WDC, acquired Amplidata in 2015, hoping to gain a slice of the cake. WDC then made a 180° turn and quit this effort, selling ActiveScale to Quantum. This exit was a surprise, and so was the sale, as the market thought Quantum could have acquired Amplidata, being an OEM with the Lattus product line, before WDC’s move. Quantum has to accelerate product development as WDC had frozen the product. WDC also sold the Tegile product line to DDN. Seagate, by contrast, continues its effort and finally unveils its own solution stack, named CORTX for the software and Lyve Drive Rack as a reference architecture.
Object storage today is promoted by plenty of vendors coming from multiple horizons. The image below illustrates different initiators like pure players, infrastructure, server or other storage vendors, container players and even cloud providers. The object storage segment is crowded, no doubt because the need is ubiquitous.
Clearly, this announcement shakes the object storage landscape and will have an impact on commercial players as the early testers testify.
CORTX is open source object storage software under the Apache v2.0 license for the core and AGPL v3 for peripherals. This choice confirms Seagate’s desire to handle carefully its relationship with its legacy clients for disks and arrays, and reaffirms its status as a hardware vendor wishing to sell hardware, not software. In other words, hardware sales are what count for the company and, to make them possible, it offers the software at zero cost thanks to its open source nature. The source code is available on GitHub. But CORTX is also the start of a community to continue the effort in that open source direction, with the aim of not letting the 2 community gorillas, Ceph and MinIO, drive that domain alone.
The firm claimed that CORTX represents the second wave of object storage, but we already saw 4 technology iterations with many elements like OSD, CAS, pure object storage software, key-value store, intelligent drives...
We learned that some users have tested CORTX for several years, meaning that Seagate started this development during active partnerships. Promoted by various communities with strong demands for very large storage space and object size, CORTX is designed with these goals in mind.
The image below shows the various elements of the solution stack. S3 is of course exposed, NFS will be offered with a future CORTX release and we understand it will be based on Ganesha.
Seagate chooses to disaggregate compute and storage. In other words, it combines a scalable access layer, on which CORTX software is deployed to expose the access methods, with a storage layer. These 2 layers can scale independently, and we can imagine a capacity model with a limited number of access points, the reverse with a large number of access points and limited storage, and of course a combination of the 2. By nature the philosophy is scale-out, and the design implements an auto-indexed key-value store that offers easy search and labelling, 2 key features at scale. The data placement technique and how servers are organized and glued together are not mentioned in the literature of the product.
No file system is created on the disks; they’re used in raw mode, making space management more efficient, as with Caringo and DDN WOS.
For data protection, CORTX obviously provides erasure coding (EC), as at scale replication doesn’t deliver a good enough TCO and is ill-suited to large objects and large capacities. Two layers are combined: the first, purely software and operated by CORTX, uses a 7+1 scheme and sits on top of a hardware-based one embedded in each Seagate enclosure with an 8+2 scheme. Together they yield 70% storage efficiency (7/8 x 8/10), i.e. about 43% of hardware overhead. Relying on disk enclosures makes things less granular, with 2 groups of 53 disks arranged in 8+2 EC sets. This is radically different from a model with independent disks controlled from the storage server layer. We don't know yet the chunk size used for the 2 EC services.
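The efficiency and overhead figures quoted for the two stacked EC layers can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function names are ours, not CORTX APIs, and it assumes that the efficiencies of stacked layers simply multiply, ignoring spares, metadata and any other capacity reserved by the product.

```python
from fractions import Fraction
from typing import Tuple


def ec_efficiency(data: int, parity: int) -> Fraction:
    """Usable fraction of raw capacity for a data+parity EC scheme."""
    return Fraction(data, data + parity)


def stacked(*schemes: Tuple[int, int]) -> Fraction:
    """Efficiency of EC layers applied on top of one another.

    Assumption: layers are independent, so their efficiencies multiply.
    """
    total = Fraction(1)
    for data, parity in schemes:
        total *= ec_efficiency(data, parity)
    return total


def overhead(efficiency: Fraction) -> Fraction:
    """Extra raw capacity required per unit of usable capacity."""
    return 1 / efficiency - 1


software_layer = (7, 1)   # CORTX software EC, as described above
enclosure_layer = (8, 2)  # hardware EC embedded in each enclosure

combined = stacked(software_layer, enclosure_layer)
print(f"efficiency: {float(combined):.0%}")            # 70%
print(f"overhead:   {float(overhead(combined)):.0%}")  # 43%
```

With the enclosure layer alone, `stacked((8, 2))` gives 80% efficiency, i.e. 25% of overhead, which shows what the additional 7+1 software layer costs in raw capacity.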
This launch includes a reference architecture (RA) named Lyve Drive Rack that will evolve incrementally. These RAs include Seagate’s disk enclosures; the first iteration, named R1, supports 5U84 and 4U106 models and 2 server nodes. With R1, CORTX's first layer of erasure coding with a 7+1 scheme is obviously not possible, and only data protection within boxes is offered, based on Seagate ADAPT. R2 will have 3 nodes plus 3 enclosures with any-to-any connections. This announcement was also the opportunity to cover HAMR 20TB HDDs and multi-actuator disks.
CORTX confirms the convergence between file and object storage both dedicated to unstructured data. We’ll see if the content will be simultaneously accessible from S3 and NFS without the need to duplicate data and therefore create a potential data divergence.
We expect Seagate will also address a clear need to reduce the cost of massive storage infrastructure with 3 key functions: data reduction, erasure coding and energy savings. With CORTX, the company checks the EC box, but users need the 2 other functions, which combined drastically reduce TCO for archived data. If multiple layers of CORTX clusters could be built to offer all 3 features, it would be interesting.
Note that historically, Seagate has never succeeded in diversifying beyond its core HDD business.