
Data Provisioning

DKRZ provides centralised access to large-volume flagship data sets and the IT resources required to process them.

Accessing centralised data and IT resources at DKRZ - an overview workflow

DKRZ is dedicated to providing researchers in the Earth System Sciences with a working environment that effectively supports cutting-edge research. Such research often depends on reliable and easy access to large-volume flagship datasets - such as the reanalysis products of the ECMWF - and on ample compute resources to process large amounts of data.

Here, we provide a brief overview of DKRZ's data provisioning service, which includes access to DKRZ's HPC infrastructure. A summary of the motivation for this service and its concrete benefits is presented on DKRZ's main pages.

Schematic depiction of DKRZ's data provisioning service. (Icons of researchers are by Gan Khoon Lay, Noun Project)

1) Data request

  • large-volume data sets required by researchers in the Earth System Sciences are normally not available in local, relatively small IT environments
  • DKRZ hosts a centralised collection of selected large-volume datasets that are often needed in the Earth System Sciences - researchers are encouraged to contact DKRZ's Data Management department (DM) with their data needs

1.1) Requested data are available at DKRZ

  • if the requested data are available at DKRZ (see also 5) below), DM advises on how to access them. The data are available through a suite of services (ESGF, WDCC, Data Pool)
  • the data can be efficiently processed using the compute resources at DKRZ

2) DM contacts external data producer if data are not yet available at DKRZ

  • if there is high and/or foreseeable demand for specific data sets that are not yet hosted at DKRZ, DM contacts the corresponding external data provider to negotiate data acquisition and local provision
  • in some cases, external data providers impose use constraints on their data - DM negotiates the conditions for providing the data in a centralised location and ensures compliance with data usage restrictions

3) DM staff acquires data from external data producer

  • once the negotiations with the data producer regarding data acquisition have succeeded, experienced DKRZ DM staff begin downloading the corresponding data sets
  • total volumes can be in the range of several PB per data set, which is why the expertise built up at DKRZ is crucial for a successful download, e.g. privileged access to ECMWF servers for bulk downloads of ERA5 data without long queuing times
  • externally provided data are obtained in the original data format. If the usability of the data needs to be improved, e.g. through file format conversions, restructuring of the data or storage in an intuitive directory structure, DM staff also provide this service, as long as the effort is manageable (transition from "white" to "grey" data in the above schematic)

4) DM staff transfers data to user-accessible location

  • once the data processing is complete and the data are ready for use by the community, DM staff makes the data available within the DKRZ infrastructure

5) Data are available to the research community

 

Take home message

If you need specific large-volume data sets for your Earth System Science research project, contact DKRZ Data Management and we will provide you with effective and powerful data services to support your research!
