Speaker
Andy Gotz
(ESRF)
Description
by Alex de Maria, Armando Solé, and Andy Gotz on behalf of the ESRF Data Policy Implementation Team
The ESRF, the European Synchrotron, has recently adopted a Data Policy under which all data collected at the ESRF will be archived for 10 years and made freely available as Open Data after an initial embargo period of 3 years (which can be extended on request). The ESRF currently produces 2 PB of raw data annually. Assuming linear growth in data production, this means archiving at least 70 PB of data over the next 10 years. The Data Policy introduces a number of new challenges for the ESRF: persistent user identities, user rights, metadata definition and standardisation, automated collection of metadata, a metadata catalogue, data containers, long-term archiving, and finding and re-using data. This paper describes how these challenges are being solved and shows how a mature synchrotron can adopt and implement a modern Data Policy built largely on existing standards such as the ICAT metadata catalogue (icatproject.org) and the HDF5/NeXus data format/convention. Archiving such large quantities of data is feasible largely thanks to the availability of off-the-shelf tape technology, which continues to evolve and improve.
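The abstract does not spell out the growth model behind the 70 PB estimate. As a rough illustration only, the sketch below assumes a discrete yearly model in which production starts at 2 PB per year and rises by a fixed amount each year, then solves for the yearly increase that would give 70 PB after 10 years. The function names and the model itself are assumptions made for this example, not part of the ESRF's published estimate.

```python
def cumulative_volume_pb(initial_rate_pb, yearly_increase_pb, years):
    """Total volume archived after `years` years, in PB.

    Assumed model: year k (0-based) produces
    initial_rate_pb + k * yearly_increase_pb petabytes.
    """
    return sum(initial_rate_pb + k * yearly_increase_pb for k in range(years))


def required_yearly_increase(initial_rate_pb, target_total_pb, years):
    """Yearly increase (PB/year per year) needed to reach the target total.

    Uses the closed form n*r + g*n*(n-1)/2 = target, solved for g.
    """
    return (target_total_pb - years * initial_rate_pb) / (years * (years - 1) / 2)


# With no growth, 10 years at 2 PB/year gives only 20 PB.
flat_total = cumulative_volume_pb(2, 0, 10)

# Growth per year needed for the total to reach 70 PB under this assumed model.
growth = required_yearly_increase(2, 70, 10)

print(f"flat production total: {flat_total} PB")
print(f"required yearly increase: {growth:.2f} PB/year per year")
print(f"total with that growth: {cumulative_volume_pb(2, growth, 10):.1f} PB")
```

Under these assumptions the annual production rate would have to rise by a little over 1 PB per year each year, which shows that the 70 PB figure is dominated by future growth rather than by the current 2 PB per year.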
Primary author
Andy Gotz
(ESRF)