
Data archival of ESGF-published datasets in LTA WDCC

DKRZ offers the long-term archival of ESGF-published data in LTA WDCC.

The option to archive ESGF-published data in LTA WDCC on a project basis is part of the publication agreement signed at the beginning of the ESGF publication process. The workflow below gives an overview of the essential steps in the archival of ESGF-published data in LTA WDCC. During the process, you will be in touch with DKRZ's data management department (DM).

The process of archiving and publishing data already published in ESGF differs slightly from the LTA WDCC archival and publication process for non-ESGF data, mainly because data published in ESGF are already highly standardised and associated with ample metadata.

Figure (Workflow_ESGF_to_WDCC_MS): Schematic depiction of the LTA WDCC archiving workflow for ESGF-published data. The schematic specifically details the elaborate case of CMIP6 data. (Schematic by Martina Stockhause)

0) Ingestion of metadata

  • DKRZ staff harvests project and experiment names as well as the summaries of the scientific experiments from the ESGF publication and enters these into the CERA database
  • the process is automated as far as possible (in contrast to the strictly manual process for non-ESGF data)

I. Long-Term Archival (LTA)

1) General checks for status and completeness of metadata

  • DKRZ staff checks whether the metadata are complete, comply with the project requirements and correctly describe the structure of the dataset
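
A completeness check of this kind can be pictured as a simple test for missing or empty fields. The following is only a minimal sketch in Python: the field names in REQUIRED_FIELDS and the function missing_fields are hypothetical and do not reflect the actual CERA schema or DKRZ tooling, and the real requirements depend on the project agreement.

```python
from typing import Iterable, List, Mapping

# Hypothetical list of required fields; the real list is project-specific.
REQUIRED_FIELDS = ("project", "experiment", "summary", "institution")


def missing_fields(record: Mapping[str, object],
                   required: Iterable[str] = REQUIRED_FIELDS) -> List[str]:
    """Return the required fields that are absent or empty in a metadata record."""
    missing = []
    for name in required:
        value = record.get(name)
        # Treat None and blank strings as "not provided".
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(name)
    return missing


# Example record with a blank summary and no institution.
record = {"project": "CMIP6", "experiment": "historical", "summary": " "}
print(missing_fields(record))  # ['summary', 'institution']
```

An empty result would mean that the record is complete with respect to the assumed field list; it says nothing about whether the content is correct.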

2) Data archival and additional metadata

  • DKRZ staff ingests the data and metadata into LTA WDCC from the data pool and the ESGF-index
  • documentation of the experiments in accordance with ES-DOC is added as ancillary metadata (metadata not available from the ESGF-index) and checked for correctness. Corrections are applied as necessary and feasible.
  • citation information is added as ancillary metadata (authors, title, reference, funders, models and licences, etc.)

3) Check of archived data and metadata

  • the metadata in particular are checked for compliance with project standards and agreements

4) Technical quality assurance 

  • DKRZ staff applies an automated quality assurance (QA) procedure to make sure that the data and associated metadata archived in LTA WDCC are consistent
  • LTA WDCC archival of ESGF data is complete once the technical QA has been passed; the dataset information is then added to the IPCC DDC webpage (if applicable for the project)
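
The page does not describe which checks the automated QA procedure performs. As an illustration of what "consistent" could mean for a single file, the sketch below compares a file on disk with the size and SHA-256 checksum recorded in its metadata. The function names and the choice of size and checksum as criteria are assumptions for this example, not the actual DKRZ procedure.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_consistent(path: Path, expected_size: int, expected_sha256: str) -> bool:
    """Check a file against the size and checksum recorded in its metadata.

    Hypothetical criteria: a real QA procedure may check more or different things.
    """
    if not path.is_file():
        return False
    # Compare the cheap property first, then the checksum.
    if path.stat().st_size != expected_size:
        return False
    return sha256_of(path) == expected_sha256.lower()
```

Checking the size before the checksum avoids reading large files whose length already disagrees with the metadata.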

II. DataCite DOI process

1) Initialise DOI process

  • DKRZ staff adds/updates metadata as required (e.g. citation) and checks the data quality documentation
  • if applicable for the project, institute and model references are updated on the IPCC DDC webpage

2) DOI process

  • DKRZ staff inserts the DOI references and exports the DataCite metadata
  • registration of DataCite metadata files and DOIs
  • checking of DataCite registrations by HTTP GET
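
The HTTP GET check in the last step can be sketched as follows. This is a minimal illustration, assuming that a registered DOI is considered retrievable when the public resolver at https://doi.org/ answers a GET request with status 200 after following redirects. The function names and the DOI in the usage comment are hypothetical, and the actual DKRZ check may use a different endpoint or criterion.

```python
import urllib.error
import urllib.request
from typing import Callable
from urllib.parse import quote

# Public DOI resolver; using it here is an assumption of this sketch.
RESOLVER = "https://doi.org/"


def http_get_status(url: str, timeout: float = 10.0) -> int:
    """Issue an HTTP GET and return the final status code after redirects.

    Network failures (urllib.error.URLError) are not caught and propagate.
    """
    request = urllib.request.Request(url, method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses are raised as exceptions; report their code instead.
        return error.code


def doi_is_retrievable(doi: str,
                       get_status: Callable[[str], int] = http_get_status) -> bool:
    """Return True if the resolver answers the DOI URL with HTTP 200."""
    url = RESOLVER + quote(doi, safe="/")
    return get_status(url) == 200


# Usage (performs a network request; the DOI is a placeholder):
# doi_is_retrievable("10.1234/example")
```

Passing the status function as a parameter keeps the check testable without network access, for example with `get_status=lambda url: 200`.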

3) Update metadata and finalise DOI publication

  • once the DOIs are successfully registered and retrievable, DKRZ staff updates the metadata to 'DOI published'
  • if applicable, the IPCC DDC webpage is updated accordingly and the list of DOI references is sent to the corresponding author of the dataset

III. Curation

  • cross-references to articles are added via Scholix services
  • errata information is added from errata services
