For Next-Gen HPC, National Energy Research Scientific Computing Center Rolls Out New Community File System
Collaboration with IBM gives users more robust high-capacity storage and management.
This is a Press Release edited by StorageNewsletter.com on April 23, 2020, at 2:23 pm.
By Kathy Kincade, science/technology writer, Lawrence Berkeley National Laboratory
Recognizing the evolving data management needs of its diverse user community, the National Energy Research Scientific Computing Center (NERSC) at the U.S. Department of Energy’s (DOE) Lawrence Berkeley National Laboratory unveiled the Community File System (CFS), a long-term storage tier developed in collaboration with IBM Corp. that is optimized for capacity and manageability.
Photo caption: Members of the NERSC Community File System team in front of the new storage management installation at Shyh Wang Hall (left to right): Dan Dumarot, IBM; Juanice Campbell, IBM; Greg Butler, NERSC; and Kristy Kallback-Rose, NERSC. Not pictured: Glenn Lockwood, NERSC.
The CFS replaces NERSC’s Project File System, a storage mainstay at the center for years that was designed more for performance and I/O than capacity or workflow management. But as high performance computing edges closer to the exascale era, the storage and management landscape is changing, especially in the science community, noted Glenn Lockwood, acting group lead of NERSC’s storage systems group.
In the next few years, the growth in data coming from exascale simulations and next-gen experimental detectors will enable new data-driven science across virtually every domain. At the same time, new nonvolatile storage technologies are entering the market in volume and upending long-held principles used to design the storage hierarchy.
Multi-tiered approach
In 2017, these emerging challenges prompted NERSC and others in the HPC community to publish the Storage 2020 report, which outlines a new, multi-tiered data storage and management approach designed to better accommodate the next generation of HPC and data analytics.
Thus was born the CFS, a disk-based layer between NERSC’s short-term Cori scratch file system and the High Performance Storage System (HPSS) tape archive that supports the storage of, and access to, large scientific datasets over the course of years.
It is essentially a global file system that facilitates the sharing of data between users, systems, and the outside world, providing a single, seamless user interface that simplifies the management of these ‘long-lived’ datasets. It serves primarily as the capacity tier in NERSC’s larger storage ecosystem, holding data that outlives temporary scratch space but needs to remain more readily accessible than data in the tape archive tier.
Figure caption: Evolution of the NERSC storage hierarchy between 2017, when the Storage 2020 report was published, and 2025.
With an initial capacity of 60PB (compared to the Project File System’s 11PB), plans to expand to 200PB by 2025, and aggregate bandwidth of more than 100GB/s for streaming I/O, “CFS is by far the largest file system NERSC currently has – twice the size of Cori scratch,” said Kristy Kallback-Rose of NERSC, who is co-leading the CFS project with Lockwood and gave a presentation on the new system at the recent Storage Technology Showcase in Albuquerque, NM.
“The strategy is we have this place where users with large data requirements can store longer term data, where users can share that data, and where community data can live,” Lockwood said. “High capacity, accessibility, and availability were the key drivers for CFS.”
Working with IBM to develop and refine the system has been key to its success, Lockwood and Kallback-Rose emphasized. The CFS is based on IBM Spectrum Scale, a high-performance clustered file system designed to manage data at scale and support archiving and analytics in place. The underlying hardware platform is the IBM Elastic Storage Server (ESS), an implementation of software-defined storage that combines Spectrum Scale with IBM POWER8 processor-based, I/O-intensive servers and dual-ported storage enclosures. Spectrum Scale scales system throughput as the system grows while still presenting a single namespace, which eliminates data silos, simplifies storage management, and delivers performance within an easily scalable storage architecture.
“IBM is delighted to be partnering with NERSC, our decades-old client, to provide our latest storage solutions that will accelerate their advanced computational research and scientific discoveries,” said Amy Hirst, director, Spectrum Scale and ESS software development. “With this IBM storage infrastructure, NERSC will be able to provide its users with a high performance file system that is highly reliable and will allow for seamless storage growth to meet their future demands.”
End-to-end data integrity
CFS has a number of additional features designed to enhance how users interact with their data, Kallback-Rose noted. For example, every NERSC project has an associated Community directory with a ‘snapshot’ capability that gives users a 7-day history of their content and any changes made to that content. In addition, the system offers end-to-end data integrity: from client to disk, checks along the way ensure that the data stored is exactly the data the user wrote, and that nothing is altered or lost except what the user intended to change.
“This speaks to the goal of making the CFS a very reliable place to store data,” Lockwood said. “There are a lot of features to protect against data corruption and data loss due to service disruption. Your data, once it is in the CFS, will always be there in the way that you left it.”
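To make the snapshot capability more concrete, here is a minimal, hypothetical sketch of how a user might recover an older copy of a file from a daily snapshot. It assumes a Spectrum Scale-style layout in which read-only snapshots are exposed under a hidden .snapshots directory at the top of the file system; the mount point, project path, and snapshot names below are illustrative assumptions, not NERSC-specific instructions.

#!/usr/bin/env python3
"""Minimal sketch: restore a file from a daily snapshot.

Assumes a Spectrum Scale-style layout where read-only snapshots appear
under a hidden ".snapshots" directory at the file system root. All paths
below are illustrative, not NERSC-specific.
"""
import shutil
from pathlib import Path

CFS_ROOT = Path("/global/cfs")                               # assumed mount point
DAMAGED_FILE = Path("cdirs/myproject/results/summary.csv")   # hypothetical file

def list_snapshots(root):
    """Return available snapshot directories, oldest first."""
    return sorted(p for p in (root / ".snapshots").iterdir() if p.is_dir())

def restore_from_latest(root, rel_path, dest):
    """Copy the newest snapshot copy of rel_path to dest and return its source."""
    for snap in reversed(list_snapshots(root)):
        candidate = snap / rel_path
        if candidate.is_file():
            shutil.copy2(candidate, dest)
            return candidate
    raise FileNotFoundError(f"{rel_path} not found in any snapshot")

if __name__ == "__main__":
    source = restore_from_latest(CFS_ROOT, DAMAGED_FILE, Path("summary.csv.recovered"))
    print(f"Restored copy taken from {source}")

The only design choice worth noting is that recovery walks snapshots from newest to oldest, so the most recent surviving copy of the file is the one restored.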
Another important aspect of the CFS is that, in terms of storage, it is designed to outlive NERSC’s supercomputers.
“When Cori is decommissioned, all the data in it will be removed,” he added. “And when Perlmutter arrives, it will have no data in it. But the CFS, and all the data stored there, will be there throughout these and future transitions.”
The data migration from the Project File System to the CFS, which occurred in early January, took 3 days, and NERSC users are enthusiastic about the new system’s impact on their workflows.
For example, being able to adjust quotas on a per-directory basis makes administering shared directory space much easier, according to Heather Maria Kelly, a software developer at SLAC who is involved with the Large Synoptic Survey Telescope (LSST) Dark Energy Science Collaboration. “Previously we had to rely on the honor system to control disk use, and too often users would inadvertently over-use both disk and inode allocations. CFS frees up our limited staff resources,” she said.
In addition, the ability to receive disk allocation awards “made a huge difference in terms of our project’s ability to utilize NERSC for our computational needs,” Kelly said. “The disk space we received will enable our team to perform the simulation and data processing necessary to prepare more fully for LSST’s commissioning and first data release.”
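As a rough illustration of the kind of per-directory accounting Kelly describes, the sketch below shows one way a project might check a shared directory’s space and inode (file count) consumption against an assumed allocation. The paths and limits are hypothetical, and NERSC supplies its own quota-reporting tools; this is only a sketch of the bookkeeping involved.

#!/usr/bin/env python3
"""Illustrative sketch: report space and inode usage for a shared project
directory against an assumed allocation. Paths and limits are hypothetical."""
import os
from pathlib import Path

PROJECT_DIR = Path("/global/cfs/cdirs/myproject")  # assumed shared directory
SPACE_QUOTA_TB = 20.0                               # assumed space allocation (TB)
INODE_QUOTA = 5_000_000                             # assumed inode allocation

def usage(root):
    """Walk root and return (terabytes used, approximate inode count)."""
    total_bytes, inodes = 0, 0
    for dirpath, dirnames, filenames in os.walk(root):
        inodes += len(dirnames) + len(filenames)
        for name in filenames:
            try:
                total_bytes += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass  # file disappeared or is unreadable; skip it
    return total_bytes / 1e12, inodes

if __name__ == "__main__":
    used_tb, used_inodes = usage(PROJECT_DIR)
    print(f"Space : {used_tb:.2f} TB of {SPACE_QUOTA_TB} TB allocation")
    print(f"Inodes: {used_inodes} of {INODE_QUOTA} allocation")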
“The Community File System gives our users a much needed boost in data storage and data management capabilities, enabling advanced analytics and AI on large scale online datasets,” said Katie Antypas, NERSC division deputy and data department head. “This is a crucial step for meeting the DOE Office of Science’s mission science goals.”
NERSC is a DOE Office of Science user facility.
About NERSC and Berkeley Lab
The National Energy Research Scientific Computing Center (NERSC) is a US Department of Energy Office of Science User Facility that serves as the primary high-performance computing center for scientific research sponsored by the Office of Science. Located at Lawrence Berkeley National Laboratory, it serves more than 7,000 scientists at national laboratories and universities researching a range of problems in combustion, climate modeling, fusion energy, materials science, physics, chemistry, computational biology, and other disciplines. Berkeley Lab is a DOE national laboratory located in Berkeley, CA. It conducts unclassified scientific research and is managed by the University of California for the US Department of Energy. Learn more about computing sciences at Berkeley Lab.
NERSC mission
The NERSC mission is to accelerate scientific discovery at the DOE Office of Science through high performance computing and data analysis. NERSC is the principal provider of high performance computing services to Office of Science programs – magnetic fusion energy, high energy physics, nuclear physics, basic energy sciences, biological and environmental research, and advanced scientific computing research. Computing is a tool as vital as experimentation and theory in solving the scientific challenges of the twenty-first century. Fundamental to the mission of NERSC is enabling computational science of scale, in which large, interdisciplinary teams of scientists attack fundamental problems in science and engineering that require massive calculations and have broad scientific and economic impacts. Examples of these problems include photosynthesis modeling, global climate modeling, combustion modeling, magnetic fusion, astrophysics, and computational biology.