The second edition of the Storage Technology Showcase (STS), held March 2-5, 2020 in Albuquerque, NM, centered on the theme "Easing the friction of data movement".
Key topics were HPC file storage, data migration across tiers and for technology refresh, tape in various flavors, and of course archiving.
The conference had around 100 attendees, with users from NERSC*, LANL*, NOAA CLASS*, St. Jude Children's Research Hospital, Sandia National Laboratories, ECMWF*, HudsonAlpha Institute for Biotechnology and Navy DSRC* delivering interesting and in-depth presentations. It was also an opportunity to meet sponsors and listen to their pitches: Spectra Logic, IBM, Western Digital, HPE, HPSS, DDN, StrongBox Data Solutions, Qumulo, NetApp, Versity, Globus, Red Hat, Liqid, Starfish, SoftIron and Abba Technologies, plus two IT services organizations, Alliance Technology Group and DST. Brad Johns Consulting gave an update on its tape TCO tool and, for the first time, Coldago Research unveiled the 2020 end-user survey run last January.
Beyond vendor pitches, users presented the challenges of large configurations in multi-tier data and storage environments. Unsurprisingly, Lustre and IBM GPFS/Spectrum Scale are widely used across these sites.
- NERSC presented its data migration projects related to IBM Spectrum Scale and gave details on its massive HPSS configuration, with 15,000 tapes and 3 IBM TS4500 libraries for more than 200PB.
- St. Jude Children's Research Hospital explained its data migration project covering DDN GRIDscaler with the help of Atempo Miria.
- Sandia National Laboratories took another angle, explaining its choice to develop its own data migration tools.
- At ECMWF, the story was about the migration of a massive volume of data from T10000 to 3592 JE media. The total archive volume is 451PB, with daily movement of 240TB for archives and 200TB for restores.
- NOAA CLASS covered its data migration project from LTO-6 to LTO-8, used in 2 Spectra Logic TFinity tape libraries.
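
To put the ECMWF daily volumes in perspective, the short sketch below converts them into sustained throughput. It is only a back-of-the-envelope calculation: it assumes decimal units (1TB = 1,000GB) and traffic spread evenly over 24 hours, neither of which was stated in the presentation, and the function name is purely illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def tb_per_day_to_gb_per_s(tb_per_day: float) -> float:
    """Convert a daily volume in decimal TB into an average rate in GB/s.

    Assumes 1 TB = 1,000 GB and a perfectly even load over the day.
    """
    return tb_per_day * 1000 / SECONDS_PER_DAY


# Daily volumes quoted for ECMWF in the list above
archive_rate = tb_per_day_to_gb_per_s(240)
restore_rate = tb_per_day_to_gb_per_s(200)

print(f"archive: {archive_rate:.2f} GB/s, restore: {restore_rate:.2f} GB/s")
# prints "archive: 2.78 GB/s, restore: 2.31 GB/s"
```

Real traffic is bursty, so peak rates on the tape infrastructure are presumably well above these averages.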
Spectra Logic did a good job pitching software ideas, illustrated by StorCycle, its intelligent tiering software, and the BlackPearl product family around object storage. We understood that the company is preparing software iterations to manage data at hyperscale level.
DDN's pitch promoted Exa5, confirming the company's shift to Lustre. Beyond NFS and SMB, launched in early 2019, the content clearly shows a new access method, S3, for the upcoming generation of the Exa product.
Western Digital, with a talk titled Tape to the Future, reminded us that the company is a key player in the tape industry. It was a surprise to see the company pitching secondary storage ideas, as it recently stepped back from that segment with the sale of IntelliFlash to DDN and ActiveScale to Quantum. The new El Dorado for tape is the hyperscalers, which are looking for a drastic reduction of their TCO for long-term data retention; this was already mentioned during the Fujifilm Summit a few months ago.
For organizations with massive data volumes still growing at a rapid pace, aligning the cost of storage with the value of data is paramount. This is not new; we all remember the ILM wave more than a decade ago. It means that a real multi-tier strategy, from SSD and HDD to tape to cloud, the last two sometimes in reverse order, can’t be replaced. From the various presentations, we understand that tape is still used at very large sites such as universities and research centers, and also, as we said above, at Google, AWS, Azure or Facebook. In other words, when data volumes are big, really big, there is no real alternative at that TCO level. At the end of the day, for demanding environments, on-premises configurations go from SSD to tape to cloud, while in the cloud, hyperscalers, aka cloud providers and Internet giants, go from flash to HDD to tape.
Unsurprisingly, IBM presented Spectrum Scale and its companion Elastic Storage Server, with NVMe-specific developments and the Erasure Code Edition.
HPE continues to go in all directions, promoting ClusterStor, which comes from the recent Cray acquisition, in addition to tactical pushes with DDN, IBM Spectrum Scale, Panasas and WekaIO. It confirms that HPE's priority is to sell hardware, whatever the storage software layer is. DMF was also covered as a key element of data management for HPC sites, supporting tape libraries, zero-watt storage, object storage and cloud storage like AWS.
Beyond classic HPC file systems like Lustre, IBM Spectrum Scale and even BeeGFS, and recent arrivals like WekaIO, NAS has started to be considered for such file storage needs. Qumulo and VAST Data are listed in the IO500 rankings, with internal designs built with flash in mind from the start. This is clearly different from taking a fairly old file system and actively iterating on it to leverage flash.
NetApp pitched Data Fabric, Keystone, AFF and StorageGRID and its cloud strategy with AWS, GCP and Azure.
Data management was well covered with HPE promoting DMF, StrongBox with StrongLink, DDN with DataFlow, IBM with Spectrum Discover and HPSS, Spectra Logic with StorCycle and finally Versity Software.
Versity spent some time detailing its archiving philosophy with its Storage Manager v2 product, based on its new Scout file system and archiving engine, aka ScoutAM. The company is referenced at very large data sites.
Beyond archiving software, StrongBox Data Solutions introduced the need for a comprehensive list of mandatory operations to manage large, complex multi-tier data environments. The presentation insisted on the pivotal role of metadata, with 4 categories (file system, rich application, external and user-created metadata), and also on the convergence of storage resource management and data management techniques, fueled by intelligent policy engines coupled with data movers.
We also saw several slides from DDN, Qumulo and end users like St. Jude referencing Atempo Miria or DDN DataFlow to control data migration, here for the refresh and replacement of file servers.
This conference, centered on HPC users' data challenges and storage needs, confirms the role of parallel file systems, DDN as a key player in HPC, tape with Spectra Logic and IBM coupled with HPSS or alternatives like DMF and Versity, and the appearance of NFS for some use cases.
* NERSC = National Energy Research Scientific Computing Center
LANL = Los Alamos National Laboratory
NOAA CLASS = National Oceanic and Atmospheric Administration Comprehensive Large Array-data Stewardship System
ECMWF = European Centre for Medium-Range Weather Forecasts
DSRC = DoD Supercomputing Resource Center