Publishing data in ESGF - a step by step guide

This workflow gives an overview of the ESGF publication process at DKRZ. During the process, you will be in regular contact with the ESGF team of DKRZ's data management department (DM), which is why the first step is to contact DKRZ with a request for data publication in ESGF (see below).

0) Initial contact with DKRZ, clarification of publication conditions

  • Contact DKRZ with a request to publish a dataset in ESGF, including information on the project (your own project or part of a larger consortium?), the expected data volume and the planned storage.
  • Information about the planned sustainable storage of the data is essential, since data published in the ESGF via DKRZ must be stored on the in-house HPC lustre file system. The quota needed for sustainable storage of the data is either granted in the framework of a larger consortium, e.g. CMIP, or must be provided by the data provider, e.g. in the framework of a data project at DKRZ.
  • Data published in ESGF must be in a standardised netCDF-format. If your data is a contribution to a bigger project, the data standard is provided by that project, see e.g. the CMIP5 or CORDEX data standards. If you want to publish data from your own project, it is recommended that the data be standardised along the lines of existing st
    Figure: Schematic depiction of the ESGF data publication workflow at DKRZ. Color coding depicts the roles of the researcher (blue) and DKRZ (black). Orange depicts the end results of the ESGF publication procedure at DKRZ.
  • The directory structure of data published in ESGF must strictly follow a predefined Data Reference Syntax (DRS) so that the data can be accessed via the ESGF faceted search in the web-interface. Essentially, the DRS is a defined directory structure allowing for a clear identification of an individual file in the myriad of files available in large MIPs.
    • Data structures which do not comply with the agreed DRS are not publishable in ESGF
    • Every project in ESGF may define its own preferred DRS
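
To illustrate how a DRS maps facet values to a directory structure, here is a minimal Python sketch. The facet names, their order and the example values are assumptions chosen for illustration only; the binding definition is always the DRS of the respective project.

```python
from pathlib import PurePosixPath

# Hypothetical facet order, loosely modelled on CMIP-style projects.
# The real order and vocabulary are defined by each project's DRS.
DRS_FACETS = (
    "project", "institute", "model", "experiment",
    "frequency", "realm", "variable", "ensemble", "version",
)


def build_drs_path(facets: dict) -> PurePosixPath:
    """Build a directory path from facet values in DRS order."""
    missing = [name for name in DRS_FACETS if not facets.get(name)]
    if missing:
        raise ValueError(f"missing DRS facets: {', '.join(missing)}")
    return PurePosixPath(*(facets[name] for name in DRS_FACETS))


def parse_drs_path(path: str) -> dict:
    """Split a directory path into facets; reject a wrong directory depth."""
    parts = PurePosixPath(path).parts
    if len(parts) != len(DRS_FACETS):
        raise ValueError(
            f"expected {len(DRS_FACETS)} path components, got {len(parts)}"
        )
    return dict(zip(DRS_FACETS, parts))


# Illustrative values only, not taken from a real publication agreement.
example = {
    "project": "myproject",
    "institute": "MPI-M",
    "model": "MPI-ESM-LR",
    "experiment": "historical",
    "frequency": "mon",
    "realm": "atmos",
    "variable": "tas",
    "ensemble": "r1i1p1",
    "version": "v20200101",
}

print(build_drs_path(example))
```

Because every facet sits at a fixed directory level, a faceted search can identify a dataset from the path alone. This is also why a directory tree with missing or additional levels cannot be published.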

1) Establish and sign publication agreement 

  • the publication agreement form is available for download here
  • the information regarding the organisation of the data and the adherence to data standards, especially the DRS, has been discussed in step 0)
  • it is essential that the data to be published in ESGF remain publicly accessible at the location specified in the publication agreement

2) Data standardisation 

  • your data need to comply with the structure laid out in the publication agreement - the processing of the data is to be performed by you
  • DKRZ provides services, e.g. software packages like CDO cmor, to support the data standardisation. For more information, please also refer to the DKRZ ESGF data preparation page.
  • when the processing of the data in accordance with the publication agreement is completed, the data have to be openly accessible on DKRZ's lustre file system for quality assurance checks by DKRZ staff. 
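
Before handing the data over for QA, it can help to check that nothing in the directory tree blocks read access. The following Python sketch assumes that "openly accessible" translates to POSIX read permission for "others" on files, and read plus execute permission on directories; the function name is hypothetical, and DKRZ staff may ask for a different setup.

```python
import os
import stat


def find_unreadable(root: str) -> list:
    """Return all paths below root that 'others' cannot read."""
    problems = []
    # Directories need both the read and the execute bit to be traversed.
    dir_bits = stat.S_IROTH | stat.S_IXOTH
    for dirpath, _dirnames, filenames in os.walk(root):
        if (os.stat(dirpath).st_mode & dir_bits) != dir_bits:
            problems.append(dirpath)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not (os.stat(path).st_mode & stat.S_IROTH):
                problems.append(path)
    return sorted(problems)
```

Calling `find_unreadable` on the top-level directory of the dataset returns an empty list if every file and directory below it is readable under this assumption.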

3) QA checks

  • DKRZ staff perform automated quality assurance (QA) checks for compliance with the data standards laid out in the publication agreement
  • if the data pass the QA checks, publication in ESGF can proceed
  • if the data are found not to fulfil the required standards, DKRZ staff will inform you about the amendments needed - after these have been applied, the data will be checked again (and sent back with requests for amendments if needed)

4) Data ingestion into the ESGF catalogue 

  • DKRZ staff sets up and performs the ingestion of the data into the ESGF catalogue
  • Please note: data published in ESGF that are stored on the DKRZ lustre file system are not backed up!

5) ESGF publication and data access via the DKRZ CMIP data pool

  • Once fully ingested, the data are findable and accessible via the ESGF web interface
    • Additionally, the data are accessible to DKRZ users on Mistral via the DKRZ CMIP data pool
    • This also holds for all other datasets published in ESGF by DKRZ, i.e. CMIP5, CORDEX, ReKlies or MiKlip data are easily accessible and do not have to be downloaded. For more information, please see the description of the DKRZ CMIP data pool (DKRZ CDP).
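
Since the data pool is part of the file system, published files can be located with ordinary file system tools instead of being downloaded. The Python sketch below is an illustration under stated assumptions: the function name is hypothetical, the root directory of the data pool has to be taken from the DKRZ CDP description, and the file name pattern assumes that file names start with the variable name followed by an underscore, as is common in CMIP-style projects.

```python
from pathlib import Path


def list_pool_files(pool_root, project: str, variable: str) -> list:
    """List netCDF files of one variable below <pool_root>/<project>."""
    project_dir = Path(pool_root) / project
    # Assumption: file names look like "<variable>_<further facets>.nc".
    return sorted(project_dir.rglob(f"{variable}_*.nc"))
```

A call such as `list_pool_files("/path/to/data/pool", "CMIP5", "tas")`, with the placeholder replaced by the actual pool root, returns the matching files as `Path` objects that can be opened directly on the HPC system.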

6) Long-term archival of the data at DKRZ

  • if desired, the data can be preserved using the long-term archiving service LTA WDCC at DKRZ
  • the option to archive data in LTA WDCC is part of the publication agreement signed at the beginning of the ESGF publication process and involves further interaction between DKRZ staff and the data provider. For more information regarding the process, please refer to the LTA WDCC How-To guide for ESGF data and/or contact DKRZ.
  • data archived in LTA WDCC can still be globally accessed using the ESGF web interface



