Archiving Concept

Concept and overview of the Long Term Archiving Service at DKRZ

Introduction

The German Climate Computing Centre (DKRZ: Deutsches Klimarechenzentrum GmbH) provides a Long Term Archiving Service for large data sets that are relevant for climate or earth system research. The service covers archiving and retrieval of data for periods of 10 years or longer. The data are stored in a part of the High Performance Storage System (HPSS) dedicated to this service. A second copy of the data is kept for security reasons.

This service is available to all DKRZ user groups that want to archive project data for a long time (e.g. more than 10 years). Non-DKRZ users can also use the service; the costs are then calculated on a case-by-case basis.

contact:

LTA-WDCC

DKRZ LTA received certification from the ICSU’s “World Data System” (WDS) in 2003 (renewed in 2011) and also operates as “DKRZ - WDC Climate” (WDCC).

Fields: Earth Sciences, Climate Modelling

The mission of the LTA – WDC Climate is to provide central support for the climate modelling research community. Data for and from climate research are collected, stored and disseminated. The WDCC is restricted to model data products. Storage of raw data, for example from satellites or climate models, is not planned in the WDCC itself.

A further growing area is observational data. These data are the outcome of dedicated observational projects or the continuous output of observing systems such as satellites, weather stations or others.

Tape library of the DKRZ

1) WDCC Management:

The DKRZ department “Data Management” (formerly: Model & Data) guides and supports the long term archiving process, especially for:

  • creating metadata in a format which can be easily inserted into the CERA database
  • data pre-processing with respect to CERA data formats and the CERA2 data model
  • data storage formats (files vs. database container) to best fit the data access granularity
  • quality assurance of metadata and data
  • data citation by registration of a DataCite DOI (additional option with an extra fee)

The CERA2 metadata model is used to describe the data. It allows an extensive description of the data.

Data from the long term archive is available for download at no additional cost. The only restriction is that data access is controlled: a user account for the database login (free of charge) and an access permission (usually 'public' access) are required.

For more details about terms and conditions of depositions see the DKRZ Long Term Archive Depositor Agreement.

For more details about storage and preservation policy and supported formats see Documentation.

HowTo's:
A detailed description of how to use the long term archiving service for climate data is available for DKRZ Users and for External Users.

Costs:
The long term archiving service is subject to fees for external projects. The fees arise for:
•    Creation of metadata and its inclusion in the CERA database and, in cooperation, inclusion of the data sets in the database system (personnel costs)
•    Costs for storage media (several tape generations)
•    Running costs for HSM system operation, database and internet access
•    Optional: assignment of DataCite DOIs

2) Data & Metadata Preparation

All data must be described by metadata. These metadata are also held in the CERA database. Data can be searched or downloaded only if metadata are available. For more details see the DKRZ-LTA Data Submission Preparation Guide.

3) Data & Metadata Ingest

Metadata creation for data providers is supported by
•   GUI
•   XML (Extensible Markup Language) templates used for CERA/WDCC metadata ingestion. Please contact data(at)dkrz.de for more information.
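
As a rough illustration of what preparing an XML metadata record can look like, the following minimal sketch builds a record and checks it for empty mandatory fields before submission. The element names (`dataset`, `title`, `summary`, `contact`, `keywords`) and both helper functions are hypothetical placeholders and do not reproduce the actual CERA/WDCC templates, which should be requested from DKRZ.

```python
import xml.etree.ElementTree as ET


def build_metadata_record(title, summary, contact_email, keywords):
    """Build an illustrative metadata record and return it as an XML string.

    The element names are hypothetical placeholders; they are NOT the
    element names of the real CERA/WDCC ingestion templates.
    """
    root = ET.Element("dataset")
    ET.SubElement(root, "title").text = title
    ET.SubElement(root, "summary").text = summary
    ET.SubElement(root, "contact").text = contact_email
    keyword_parent = ET.SubElement(root, "keywords")
    for keyword in keywords:
        ET.SubElement(keyword_parent, "keyword").text = keyword
    return ET.tostring(root, encoding="unicode")


def missing_fields(xml_text, required=("title", "summary", "contact")):
    """Return the names of required elements that are absent or empty."""
    root = ET.fromstring(xml_text)
    return [
        name
        for name in required
        if not (root.findtext(name) or "").strip()
    ]


if __name__ == "__main__":
    record = build_metadata_record(
        title="Example experiment",
        summary="Illustrative summary text",
        contact_email="someone@example.org",
        keywords=["climate", "model"],
    )
    print(record)
    print("missing:", missing_fields(record))
```

Checking for empty mandatory fields locally is only a convenience; the authoritative quality assurance of metadata is carried out as part of the archiving process described above.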

4) Data & Metadata Archival Storage

Data publication and DataCite DOI registration
After the data have been described, pre-processed and uploaded, the last step in the archiving chain can be to assign a DataCite DOI to the data.

5) DKRZ infrastructure

There are two main storage locations for data in the DKRZ infrastructure. During the lifetime of a project, most of the file-based project data is stored on disk for faster access. File data is normally stored in the HPSS (High Performance Storage System), where the storage media are tapes. Besides these two types of storage, data may exist in the CERA database, which is the backing store for the World Data Center for Climate (WDC-Climate). For more about the rules please refer to: Rules of the WDC-Climate

All data in the WDCC are described for search and download by metadata based on the CERA2 data model. The CERA database is the implemented 'data store' of the WDCC. If the data are stored in files, only pointers are stored in the database. This is mostly the case for completely freely available data. If the data are inside the database, they are stored in containers. The container format is a self-designed storage format that allows a finer granularity of access rights.
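
The distinction between file pointers and database containers can be pictured with a small data model. This is a conceptual sketch only: the class names, fields, and the access rule in `may_download` are assumptions made for illustration and do not describe the actual CERA database schema or the real container format.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass(frozen=True)
class FilePointer:
    """Database entry that only references a file stored outside the database."""
    archive_path: str  # hypothetical path, for illustration only


@dataclass(frozen=True)
class ContainerEntry:
    """Database entry whose data are held inside the database in a container."""
    container_id: int
    allowed_groups: frozenset = field(default_factory=frozenset)


@dataclass(frozen=True)
class DatasetRecord:
    acronym: str
    storage: Union[FilePointer, ContainerEntry]


def may_download(record, user_groups):
    """Decide whether a logged-in user may download a dataset.

    Assumption for this sketch: file-based data are freely available,
    while container data are restricted to the listed groups.
    """
    if isinstance(record.storage, FilePointer):
        return True
    return bool(record.storage.allowed_groups & set(user_groups))


if __name__ == "__main__":
    public = DatasetRecord("EXAMPLE_FILE", FilePointer("/hypothetical/file.nc"))
    restricted = DatasetRecord(
        "EXAMPLE_DB", ContainerEntry(1, frozenset({"project_a"}))
    )
    print(may_download(public, []))
    print(may_download(restricted, ["project_b"]))
```

The point of the sketch is that access rights attach to the container entry, so they can differ between containers, whereas a file pointer carries no rights of its own.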

6)  Access

Metadata and data are available through a convenient user interface. Metadata is public. A login to the CERA database is needed to download data.
The DKRZ provides a Java-based download tool, jblob, which is designed to download data with the help of a WDCC GUI or directly from the command line.