Data Management

ESGF and WDCC: The complementary pillars of the Digital Repository at DKRZ

In recent years, digital repositories in climate science and in other scientific domains have faced new challenges in several areas:
Scientists increasingly conduct research in global networks. At the same time, storage solutions have increasingly developed into globally distributed, open systems.
To ensure the transparency of research and the re-use of data across larger research fields, funders, data providers and users, as well as the scientific community as a whole, place higher demands on the quality of data and metadata.

At DKRZ, this situation has led to a data management system built from two complementary components.
The Earth System Grid Federation (ESGF, esgf.llnl.gov) focuses on the needs of users as partners in globally running projects (project and analysis phase). It provides replication tools, detailed global standards at project level, and efficient search for data to download. ESGF is a global collaboration of distributed data nodes with large quantities of disk storage. It is designed for efficient data access and dissemination via graphical user interfaces, OPeNDAP servers, and scripts. During the community review, data can still be improved through new versions or withdrawn. Project-specific data quality assurance procedures are supported. Data published in ESGF can, in principle, be transferred to the DKRZ long-term archive later on.
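As an illustration of script-based access, the following minimal sketch builds a query URL for an ESGF search service. The host name is a placeholder, and the facet names (project, variable) and the JSON format value reflect the ESGF search interface as commonly documented; they should be checked against the documentation of the index node actually used. The sketch only constructs the URL and does not contact any server.

```python
from urllib.parse import urlencode

# Assumption: placeholder host; replace it with the index node you actually use.
SEARCH_ENDPOINT = "https://esgf-node.example.org/esg-search/search"


def build_search_url(endpoint, **facets):
    """Return a search URL for the given facet constraints.

    The default parameters request a JSON response and a small result page.
    """
    params = {"format": "application/solr+json", "limit": 10}
    params.update(facets)
    # Sort the parameters so the resulting URL is reproducible.
    return endpoint + "?" + urlencode(sorted(params.items()))


# Hypothetical query: near-surface air temperature ("tas") from a CMIP5 project.
url = build_search_url(SEARCH_ENDPOINT, project="CMIP5", variable="tas")
print(url)
```

The returned URL can be passed to any HTTP client; the response would then list matching datasets together with their download or OPeNDAP access links.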

DKRZ’s digital long-term archive (DKRZ-LTA), in contrast, aims at long-term data holding and data reuse, which require high generic metadata quality standards (archiving and bibliometric phase). Since its approval by the World Data System in 2003, the DKRZ-LTA has held the title of World Data Center for Climate (WDCC). Comprehensive quality checks covering metadata and data are in place to meet these quality demands. This is additionally supported by best-practice information on citation and identifiers, e.g. DOIs (Digital Object Identifiers) for the direct integration of research data in scientific publications. Data use and citation can be monitored. The data is stored on multiple tape copies to ensure long-term security. A 1.5 PB cache provides efficient data access. The DKRZ-LTA is certified as a long-term archive according to the criteria of the World Data System (WDS) and the Data Seal of Approval (DSA).

DKRZ Data Workflow

1)    Data Management Plan

The data timeline as well as volumes, structures, access patterns and storage locations have to be defined as accurately as possible for each DKRZ HPC project in order to achieve a seamless workflow and efficient use of DKRZ resources.
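The items a data management plan has to cover can be pictured as a small record. The sketch below is purely illustrative: the class names, field names, and checks are hypothetical and do not represent an official DKRZ schema or form.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class StorageAllocation:
    """One planned block of data (all field names are hypothetical)."""
    location: str        # e.g. "disk" or "tape"
    volume_tb: float     # expected volume in terabytes
    structure: str       # e.g. file format or directory layout
    access_pattern: str  # e.g. "frequent read" or "write once"


@dataclass
class DataManagementPlan:
    """Timeline plus planned allocations for one HPC project."""
    project_id: str
    start: date
    end: date
    allocations: list = field(default_factory=list)

    def total_volume_tb(self):
        """Sum the planned volume over all storage locations."""
        return sum(a.volume_tb for a in self.allocations)

    def validate(self):
        """Return a list of problems; an empty list means the plan is complete."""
        problems = []
        if self.end <= self.start:
            problems.append("end date must be after start date")
        if not self.allocations:
            problems.append("no storage allocations defined")
        for a in self.allocations:
            if a.volume_tb <= 0:
                problems.append(f"non-positive volume for {a.location}")
        return problems


# Hypothetical example values.
plan = DataManagementPlan(
    project_id="example-project",
    start=date(2024, 1, 1),
    end=date(2024, 12, 31),
    allocations=[
        StorageAllocation("disk", 30.0, "NetCDF files", "frequent read"),
        StorageAllocation("tape", 120.0, "tar archives", "write once"),
    ],
)
print(plan.total_volume_tb(), plan.validate())
```

Keeping the plan in a structured form like this makes it easy to check completeness before resources are requested.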

2)    DKRZ Storage
 
Each DKRZ HPC project has to specify and apply for compute and storage resources on an annual basis. Storage resources comprise disk and tape storage (HPSS). All resources are monitored per DKRZ HPC project.

ESGF

For the ESGF portal, see the Data Distribution service.

 
Integrating climate data into ESGF (Earth System Grid Federation) requires standardization so that data are intercomparable within the federation. This data preparation process includes project specifications as well as the adaptation of data and metadata to the ESGF data publication interface.
 

4)    CMIP data pool and ESGF services

DKRZ offers a number of services to integrate ("publish"), manage, discover, access and analyze climate data. Data allocation includes the definition of project-specific publication and access policies, the adaptation of data check routines, and data publication on the ESGF data node at DKRZ.
 

Long-Term Archive (LTA) at DKRZ – CERA data and information system

For the LTA CERA portal, see the Data Distribution service.

Long-term archiving is available in two versions:

5)    LTA DOKU

LTA DOKU stands for in-house long-term archiving in the DOKU(mentation) section of the tape archive at DKRZ. This service offers long-term archiving for a period of 10 years for data of DKRZ HPC projects, as internal reference data only. Data providers have to supply only a minimum set of metadata to characterize and identify the data in the DKRZ long-term archive. No additional information on data interpretation is provided. The focus is on internal data access by data providers. DKRZ assigns unique labels to the data to allow unambiguous identification.

6)    LTA WDCC

LTA WDCC stands for long-term archiving in the World Data Center for Climate (WDCC). This service is open to data from DKRZ HPC projects, data from ESGF, and data from outside DKRZ. These data are fully integrated into the database system of the WDCC; the aim is to keep the data usable for a period of 10 years or more. The full set of metadata is provided so that the data can be used even after ten years or more without contacting the data author. Connected to this archiving service is fine-grained data storage, which allows field-based data access (CERA container files) in contrast to the file-based data access of the LTA DOKU service. The focus here is on interdisciplinary data access and re-use. The DataCite DOI Data Publication service is available.
 

7)    DataCite DOI Data Publication

DataCite is an international organization that aims to establish easier access to research data, increase the acceptance of research data as a legitimate contribution to the scholarly record, and support data archiving so that results can be re-used. Formal data citations enable scientists to give and receive credit for the preparation of data products. All data from the LTA WDCC service that have passed a final quality assurance procedure are suitable for a DataCite data publication, i.e. citation metadata are published and a DOI (Digital Object Identifier) is minted. After a DOI has been assigned, the data and key metadata remain unchanged, and the data are persistently accessible via the DOI. To achieve greater traceability, the metadata of data with a DOI are made available in other scientific portals.
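To illustrate how a formal data citation can be assembled from citation metadata, the sketch below follows the general pattern "Creator (PublicationYear): Title. Publisher. Identifier" that DataCite recommends, and resolves the DOI through the doi.org resolver. The function names are hypothetical, and all example values, including the DOI, are placeholders rather than a real WDCC entry.

```python
def doi_url(doi):
    """Return the resolver URL for a DOI string such as '10.1234/EXAMPLE'."""
    return "https://doi.org/" + doi


def format_citation(creators, year, title, publisher, doi):
    """Build a citation string: Creator (Year): Title. Publisher. Identifier."""
    authors = "; ".join(creators)
    return f"{authors} ({year}): {title}. {publisher}. {doi_url(doi)}"


# Placeholder metadata; the DOI below does not refer to an existing data set.
citation = format_citation(
    creators=["Doe, Jane", "Roe, Richard"],
    year=2015,
    title="Example model output",
    publisher="World Data Center for Climate (WDCC) at DKRZ",
    doi="10.1234/EXAMPLE",
)
print(citation)
```

Because the data and key metadata are fixed once the DOI exists, a citation built this way keeps pointing to exactly the data that was used.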

Projects & Cooperations:

List of current and past projects that use the long-term archiving service of DKRZ.
