Max Planck Institute for Radio Astronomy Using Grau Data Archivemanager

With Spectra Logic LTO-5 libraries, for long term archiving
This is a Press Release edited on 2013.01.21

Whenever a research institute such as the Max Planck Institute scours the depths of the galaxy for new knowledge, huge amounts of data are generated, often collected over many years.


The research group for fundamental physics in radio astronomy at the Max Planck Institute is investigating radio waves in cosmic rays and examining pulsars in order to study the magnetic forces of the Milky Way. The observations permit tests on the theory of general relativity and alternative theories of gravitation.

The data comes from the Effelsberg radio telescope, which generates over 100GB of data in just 30 minutes during one measurement. Some 18TB of measurement data are stored for calculation and analysis each month. The analysis of the data takes considerably more time. The researchers are dependent on having the data stored for many years and being able to access it unhindered at any time. The institute has found a way to store these large quantities of data using Grau Data AG's Archivemanager, an active archive solution capable of managing several petabytes of data.

The institute's research groups work on fundamental physics in radio astronomy, and the amounts of data they measure and analyze are enormous. While the institute is required by law to retain its data for ten years, the research data needs to be held much longer. New algorithms are constantly being developed that also integrate old data into calculations. Storing all of the data collected using the radio telescope on hard drives, i.e. using online storage, would go far beyond the institute's budget. In addition, the data is not used constantly and often remains inactive on the storage system for long periods of time.

The solution was an active archiving concept based on Grau Data's archiving software that made use of LTO magnetic tape as a long-term archiving medium.

Testing, Modification, and Production System

In August 2011, the Max Planck Institute began the project together with Grau Data, using the active archiving solution Archivemanager. The first step was to port the software to the Debian GNU/Linux OS at the request of the Max Planck Institute. The tests were successfully completed in October, and the complete solution went into production in November.

The astronomical measurement data from the Effelsberg radio telescope was first buffered in an 8Gb FC SAN on a 120TB online storage system. Fujitsu Primergy RX 300 S6 systems were provided as servers, and these were used to store the data redundantly on Spectra Logic Corp.'s LTO-5 tape libraries in Effelsberg and Bonn with the help of Archivemanager.

The archiving software now manages around 350 tapes per library, with each tape capable of holding around 1.5TB. The amount of data stored is growing rapidly. In total, the data stored up to May 2012 has grown to 525TB, with the total system currently expandable to 3.5PB.
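As a rough check of the capacity figures quoted above, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, decimal units (1PB = 1000TB) are assumed, and the text does not say whether the 525TB figure counts the redundant copies.

```python
import math

# Figures quoted in the text. Decimal units (1 PB = 1000 TB) are an assumption.
TAPES_PER_LIBRARY = 350
TB_PER_TAPE = 1.5            # approximate capacity per LTO-5 tape, as quoted
STORED_TB_MAY_2012 = 525
MAX_SYSTEM_TB = 3.5 * 1000   # 3.5 PB expressed in TB


def library_capacity_tb(tapes, tb_per_tape):
    """Raw capacity of one library in TB (hypothetical helper)."""
    return tapes * tb_per_tape


def tapes_needed(data_tb, tb_per_tape):
    """Minimum number of tapes needed to hold data_tb, rounded up."""
    return math.ceil(data_tb / tb_per_tape)


if __name__ == "__main__":
    per_library = library_capacity_tb(TAPES_PER_LIBRARY, TB_PER_TAPE)
    print(f"Capacity per library: {per_library} TB")
    print(f"Tapes for 525 TB: {tapes_needed(STORED_TB_MAY_2012, TB_PER_TAPE)}")
    print(f"Tapes for 3.5 PB: {tapes_needed(MAX_SYSTEM_TB, TB_PER_TAPE)}")
    print(f"Share of maximum in use: {STORED_TB_MAY_2012 / MAX_SYSTEM_TB:.0%}")
```

Under these assumptions, 350 tapes at 1.5TB each come to 525TB per library, which matches the stored volume quoted for May 2012, and that volume is about 15 percent of the 3.5PB maximum.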

"Unlike traditional corporate archiving systems, the tape technology in our division of the Max Planck Institute is often used as an expanded online storage medium that researchers access at regular intervals," said Jan Behrend, IT specialist at the Max Planck Institute, as he explained the structure of the storage solution. "The tape libraries, together with Archivemanager, are fast enough on a 1Gbit/s network to provide the research groups with their enormous amounts of data. At the same time, the storage system provides us with enormous cost advantages in comparison to traditional disk-based online storage."

The hardware-independent Archivemanager, combined with the Fujitsu servers, is capable of migrating large quantities of data quickly to the tape libraries. The rate of data input into the active archiving system is around a gigabit per second. The read/write speed can reach up to 130MB/s per drive, which corresponds to around 500GB an hour.
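The conversion between the quoted transfer rates can be checked with a short Python sketch. The helper names are hypothetical, decimal units are assumed (1GB = 1000MB, 1Gbit = 1000Mbit), and protocol overhead, compression, and tape repositioning are ignored.

```python
# Unit conversions for the quoted rates. Decimal units are an assumption.

def mb_per_s_to_gb_per_hour(mb_per_s):
    """Convert a sustained rate in MB/s to GB per hour."""
    return mb_per_s * 3600 / 1000


def gbit_per_s_to_mb_per_s(gbit_per_s):
    """Convert a line rate in Gbit/s to MB/s (8 bits per byte)."""
    return gbit_per_s * 1000 / 8


if __name__ == "__main__":
    drive_gb_h = mb_per_s_to_gb_per_hour(130)     # peak rate of one drive
    network_mb_s = gbit_per_s_to_mb_per_s(1)      # 1 Gbit/s input rate
    network_gb_h = mb_per_s_to_gb_per_hour(network_mb_s)
    print(f"One drive at 130 MB/s: {drive_gb_h} GB/h")
    print(f"1 Gbit/s network: {network_mb_s} MB/s, {network_gb_h} GB/h")
```

With these assumptions, 130MB/s works out to 468GB an hour, consistent with the "around 500GB" quoted, and a 1Gbit/s link carries at most 125MB/s, or 450GB an hour, so a single drive at peak speed can keep pace with the network.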

Management of Large Amounts of Data

Archivemanager provides the IT team of the institute with an intuitive system of administration. Fill levels and transfer rates are constantly monitored by the software. Should manual intervention become necessary, the administrator will receive a message. Even the daily backup of the metadata to the remote location is performed automatically.

Due to the smooth operation of the active archiving software, the institute decided to make use of the solution's multi-client capability and include two other research groups in the overall system. The multi-client capability allows separate partitions to be created, which ensures that the data and the usage of the drives and tapes can be kept separate.

Long-term Open Source Strategy

In addition to functionality, one of the most critical reasons behind the decision to use Grau Data archiving software was that it was ported to the Debian Linux OS and that an open source alternative with practically identical functionality was available. The German institute, much like the majority of research institutes around the globe, makes use of Linux-based operating systems. With OpenArchive, Grau Data offers a professional, Linux-based, open source archiving software.

"Archivemanager was the best product for us during the first phase in order to ensure stable, high-performance operation. In the long term, we may move to the open source alternative from Grau Data. This idea takes into consideration not only that we may save costs for licenses, but also that we as a research institute often design and develop our applications. These applications can be integrated much more easily into a consistent open source environment thanks to the open-sourced code of the archiving software. At the moment, Archivemanager suits our needs superbly. The software runs absolutely reliably and there is nothing more that could be asked of an administration system," said Behrend, commenting on the project.

