
Status of DKRZ systems

How to read the DKRZ system status monitor

General information

The status of DKRZ systems (and their associated services) is continuously monitored and analysed. As a quick overview, we present the current status on the welcome page of the User Portal. Users should interpret the status boxes as traffic lights:

  • green: system is in normal state and providing the associated service
  • orange: parts of the system are down and service might be interrupted
  • red: system is not available
  • grey: system is in unknown state (most probably due to missing data)

Please also pay attention to the "Last Check" timestamp: the data should only be trusted if the timestamp is no older than 5 minutes.
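The reading rules above can be sketched in a few lines of Python. This is only an illustration: the function name `interpret`, the constant names, and the choice to treat an unrecognised colour like grey are assumptions made for this example and are not part of the User Portal.

```python
from datetime import datetime, timedelta, timezone

# Meanings taken from the traffic-light list above.
STATUS_MEANING = {
    "green": "system is in normal state and providing the associated service",
    "orange": "parts of the system are down and service might be interrupted",
    "red": "system is not available",
    "grey": "system is in unknown state (most probably due to missing data)",
}

# "Last Check" should not be older than 5 minutes.
MAX_AGE = timedelta(minutes=5)


def interpret(colour, last_check, now=None):
    """Return a human-readable reading of one status box (hypothetical helper)."""
    if now is None:
        now = datetime.now(timezone.utc)
    if now - last_check > MAX_AGE:
        return "stale: last check is older than 5 minutes, do not trust the colour"
    # Assumption: an unrecognised colour is treated like grey (unknown state).
    return STATUS_MEANING.get(colour, STATUS_MEANING["grey"])
```

For example, calling `interpret("orange", last_check)` with a `last_check` from two minutes ago returns the orange description, while a `last_check` from ten minutes ago returns the "stale" message regardless of colour.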

We distinguish three main systems at DKRZ:

  • HPC - mistral: the HPC cluster (compute nodes and filesystem)
  • CERA: the DKRZ long-term archive system for data and metadata
  • HPSS: the tape archive

System specific information

HPC

Operational services of the HPC system are divided into four parts:

  • login: ssh access to the cluster
  • SLURM: status of the workload manager
  • lustre01: availability of the Lustre filesystem mounted at /mnt/lustre01
  • lustre02: availability of the Lustre filesystem mounted at /mnt/lustre02

The overall status of the HPC system is derived from the individual status of these four parts.
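This page does not state the exact rule used to combine the four parts. As an illustration only, the sketch below assumes one plausible rule that matches the traffic-light meanings: all parts green gives green, all parts red gives red, all parts unknown gives grey, and any other mix gives orange. The function name `overall_status` and the rule itself are assumptions, not the actual DKRZ implementation.

```python
# The four HPC parts listed above.
HPC_PARTS = ("login", "SLURM", "lustre01", "lustre02")


def overall_status(parts):
    """Combine per-part colours into one colour (assumed rule, not DKRZ's)."""
    # A part with no reported colour is treated as grey (unknown).
    colours = [parts.get(name, "grey") for name in HPC_PARTS]
    if all(c == "green" for c in colours):
        return "green"
    if all(c == "red" for c in colours):
        return "red"
    if all(c == "grey" for c in colours):
        return "grey"
    # Any mixture: parts of the system are down or in an unknown state.
    return "orange"
```

Under this assumed rule, a cluster with working login, SLURM and lustre01 but an unavailable lustre02 would be shown as orange.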

For the HPC system, a more detailed status report is available by clicking on the header. You will be forwarded to https://monitoring.dkrz.de, where you have to log in using your DKRZ account. There you will find the history of several metrics, such as the number of available HPC compute nodes and the load of the login nodes.

CERA

The CERA service consists of three parts:

  • WDCC Service Status
  • DKRZ cloud
  • ESGF-CERA-Bridge

HPSS

The HPSS service is divided into two parts:

  • tape library: tape retrievals
  • xtape: external read access
