

FAQ and known issues


How can I login to mistral, change my password and login shell?

Login to the system via:

ssh <user>@mistral.dkrz.de

Change your password and/or login shell in the DKRZ online portal.

How can I check my disk space usage?

Your individual disk space usage in HOME and SCRATCH areas as well as project quota in the WORK data space can be checked in DKRZ online portal. The numbers are updated daily.

How can I access my Lustre data from outside DKRZ/ZMAW?

For data transfer you can use either sftp:

$ sftp <user>@mistral.dkrz.de

or the rsync command:

$ rsync -atv <user>@mistral.dkrz.de:/remote/path /local/path
How can I choose which account to use, if I am subscribed to more than one project?

Just insert the following line into your job script:

#SBATCH --account=<Project>            (e.g.  #SBATCH --account=xz0123)

There is no default project account on mistral.
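A minimal job script header with an explicit project account might look as follows (the account xz0123, the partition, and the time limit are placeholders to adapt):

```shell
#!/bin/bash
#SBATCH --account=xz0123        # project account to charge (placeholder)
#SBATCH --partition=compute     # example partition
#SBATCH --time=00:30:00
#SBATCH --output=my_job.o%j     # job output file, %j is replaced by the job id

srun ./my_program
```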

When will my SLURM job start?

The SLURM squeue command with the options --start and -j provides an estimate for the job start time:

$ squeue --start -j <jobid>

439148   compute    LSea1  u123456 PD 2015-10-15T16:36:49     80 m[10020-10027,10029,...] (Resources)
Why does my job wait so long before being executed? or: Why is my job being overtaken by other jobs in the queue?

There are several possible reasons for a job to be queued for a long time and/or to be overtaken ...

  • ... by later submitted jobs with a higher priority (usually because these have used less of their share than your job).
  • ... by jobs with lower priority that are sufficiently small and specify a wall clock limit that allows them to be considered for backfilling.
How can I run a short MPI job using up to 4 nodes?

You can use SLURM Quality of Service (QOS) express by inserting the following line into your job script:

#SBATCH --qos=express

or using the option --qos with the sbatch command:

$ sbatch --qos=express <my_job_script>

The QOS has the following properties:

$ sacctmgr show qos express format=Name,Priority,MaxTres,MaxWall,Flags

      Name   Priority       MaxTRES     MaxWall                Flags
---------- ---------- ------------- ----------- --------------------
   express        100        node=4    00:20:00          DenyOnLimit

It is meant for short tests, debugging and similar uses and should not be used for repeated production runs.

How can I see on which nodes my job was running?

You can use the SLURM sacct command with the following options:

$ sacct --format=user,jobid%10,nodelist%50 -X -j <jobid>
How can I get a stack trace if my program crashes?

The classical approach to find the location where your program crashed is to run it in a debugger or inspect a core file with the debugger. A quick way to get the stack trace without the need for a debugger is to compile your program with the following options:

$ ifort -g -O0 -traceback -o my_program my_program.f90

In case of a segmentation violation during execution of the program, detailed information on the location of the problem (call stack trace with routine names and line numbers) will be provided:

$ ./my_program
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source             
my_program         0000000000403360  Unknown               Unknown  Unknown
Unknown            00007F8465324710  Unknown               Unknown  Unknown
my_program         0000000000402F5E  mo_calc_sum_mp_ca          10  my_program.f90
my_program         00000000004031A1  MAIN__                     26  my_program.f90
my_program         0000000000402E5E  Unknown               Unknown  Unknown
Unknown            00007F8464D9BD5D  Unknown               Unknown  Unknown
my_program         0000000000402D69  Unknown               Unknown  Unknown

Real debuggers like Allinea DDT installed on mistral will allow you to get much more information in case the problem is not easily identified.

How can I avoid core files if my program crashes?

Core files can be very helpful when debugging a problem, but for large parallel programs they also take a long time to get written. The following settings limit the core size to zero, i.e. no core files are written:

ulimit -c 0
srun --propagate=STACK,CORE ...

Note that due to a bug in our current installation of the SLURM scheduler, the option

#SBATCH --propagate=STACK,CORE

has no effect on srun.
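Putting the pieces together, a job script that suppresses core files could look like this sketch (partition, time limit, account, and program name are placeholders):

```shell
#!/bin/bash
#SBATCH --partition=compute     # example partition
#SBATCH --time=00:30:00
#SBATCH --account=xz0123        # placeholder project account

ulimit -c 0                     # core size 0: no core files are written
srun --propagate=STACK,CORE ./my_program
```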

Is a FTP client available on mistral?

LFTP is installed on mistral for download and upload of files from/to an external server via File Transfer Protocol (FTP):

$ lftp <ftp_server_name>

The user name for authentication can be provided via option '-u' or '--user', for example:

$ lftp -u guest <ftp_server_name>

Note: a comprehensive list of the software installed on mistral can be found in the DKRZ documentation.

How to improve interactive performance of MATLAB

When using ssh X11 forwarding (options -X or -Y), matlab can be slow to start and also respond slowly to interactive use. This is because X11 sends many small packets over the network, often awaiting a response before continuing. This interacts unfavorably with medium or higher latency connections, e.g. WiFi. When starting matlab on mistralpp nodes, another disturbing factor is the load on these shared nodes.

But mistral has means to eliminate both of these issues: GPU nodes provide exclusive resources and allow for starting a remote desktop session that does not suffer from network latencies. Furthermore, the program execution can benefit from the hardware acceleration for 3D-plots or other graphics-intensive matlab sessions.

The steps to set this up are as follows:

  1. Reserve resources on a GPU node (if you know the number of CPUs or the amount of memory needed for your Matlab session provide the -n or --mem options or use the -N 1 option to reserve a node exclusively) and connect to it via ssh from one of the login nodes:

    mlogin100% salloc -n 1 --mem 32000 -p gpu -t 12:00:00 -A YOUR_PROJECT -- /bin/bash -c 'ssh -X $SLURM_JOB_NODELIST'
  2. Start a VNC server on the GPU node:

    mgXXX% /opt/TurboVNC/bin/vncserver -geometry 1680x1050 -localhost

    The argument of geometry should take a value that works well for your screen, see your desktop preferences for an appropriate value.

  3. The output will be something like

    Desktop 'TurboVNC: mg100:1 (k202069)' started on display mg100:1

    Notice the server, in this case mg100, and the display (:1).

  4. Open a second ssh connection from your local computer. Here we forward the port for vnc communication which is 5900 + display number. For the example above, it would be 5901.

    your-machine:~> ssh -L YYYY:mgXXX:YYYY <user>@mistral.dkrz.de

    Enter your specific host and port number for XXX and YYYY.

  5. Start vncviewer on your local machine:

    your-machine:~> vncviewer localhost:DISPLAY

    DISPLAY must match what your session in 1. provided.

    If your local machine runs Linux or macOS, you can use a launcher provided by DKRZ to easily establish a VNC connection to a GPU node. The launcher automatically connects to Mistral, reserves a GPU node, starts a VNC server and connects to this server using your local vncviewer. You need to complete the downloaded script start-vnc by adding your personal project and user numbers, or use the corresponding command line options:
    your-machine:~> chmod u+x start-vnc
    your-machine:~> ./start-vnc
    your-machine:~> ./start-vnc -u <userid> -A <prj_account>

    To list all available command line options execute:

    your-machine:~> ./start-vnc -h

    If you have not set a VNC server password yet, the vncpasswd command will be called in the script and you will be prompted to create a password:

    No VNC password found. Please set now.
    Would you like to enter a view-only password (y/n)? n

    The created password will be stored in $HOME/.vnc/passwd file. Since it only protects the VNC session, it should be different from your regular access password to DKRZ services. Also, since the password will only be used in automated logins afterwards and you don't need to enter it on the console, it can be long and random at no cost.
    Furthermore, public key based login to MISTRAL is recommended to avoid having to enter your regular password multiple times.

  6. In the VNC desktop that should appear on your local workstation, open a terminal window via the Applications menu:

    Applications -> System Tools -> Terminal

  7. Load the matlab module in the terminal window and start matlab with vglrun and without software OpenGL:

    $ source /etc/profile.d/mistral.sh
    $ module load matlab
    $ vglrun matlab

    Note: it is necessary to start MATLAB in your HOME directory to avoid excessive warnings or error messages.

  8. To terminate your VNC session choose the Disconnect icon [x] in the top menu of the VNC desktop.

Can I run cron jobs on Mistral?

For system administration reasons users are not allowed to schedule and execute periodic jobs on Mistral using the cron utility. Our recommendation is to use the functionality provided by the workload manager SLURM for this purpose. With the option --begin of the sbatch command you can postpone the execution of your jobs until the specified time. For example, to run a job every day at 12:00 you can use the following job script, which re-submits itself at the beginning of the execution:

#!/bin/bash
#SBATCH --begin 12:00
#SBATCH --account=<prj_account>
#SBATCH --partition=shared
#SBATCH --time=01:00:00
#SBATCH --output=my_script.o%j

set -e

# Re-submit job script for the next execution
sbatch <this_job_script>

# Do planned work

A variety of different date and time specifications is possible with the --begin option, for example: now+1hour, midnight, noon, teatime, YYYY-MM-DD[Thh:mm:ss], 7AM, 6PM etc. For more details see manual pages of the sbatch command:

man sbatch

With SLURM the scheduled tasks are kept highly available while cron is not tolerant of single system failures (i.e. if a login node chosen for execution of cron jobs were to fail, the jobs would not be executed). The other advantage of using SLURM is that job output is logged to a unique file (see option --output) by default.
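Relative specifications like now+1hour are resolved by SLURM itself, but you can also construct an absolute begin time in a wrapper script, for example with the GNU date command (the job script name my_job_script is a placeholder; the sbatch call is only printed here so the sketch runs without SLURM):

```shell
#!/bin/bash
# Build an absolute begin time (tomorrow at noon) in the format sbatch expects
begin=$(date -d 'tomorrow 12:00' +%Y-%m-%dT%H:%M:%S)

# Print the sbatch call instead of executing it
echo "sbatch --begin=${begin} my_job_script"
```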

How to Set the default SLURM project account

On Mistral, specification of the project account (via option -A or --account) is necessary to submit a job or make a job allocation, otherwise your request will be rejected. To set the default project account you can use the following SLURM input environment variables

  • SLURM_ACCOUNT   - interpreted by srun command
  • SALLOC_ACCOUNT - interpreted by salloc command
  • SBATCH_ACCOUNT - interpreted by sbatch command

Once the variables are defined, the option -A or --account can be dropped (in this case the compute time consumption is charged to the default account) or used to override the environment variable settings.
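For example, to set a default account for all three commands in your current shell session (xz0123 is a placeholder project ID) and verify the result:

```shell
# Set the default project account for srun, salloc and sbatch
export SLURM_ACCOUNT=xz0123     # used by srun
export SALLOC_ACCOUNT=xz0123    # used by salloc
export SBATCH_ACCOUNT=xz0123    # used by sbatch

# Show the resulting settings
env | grep '_ACCOUNT=' | sort
```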

If you use bash as your login shell, you can place the following settings in your ~/.bashrc file and source this file in the ~/.bash_profile or in the ~/.profile file:

export SLURM_ACCOUNT=xz0123

If you use tcsh as your login shell, you can put the following settings in your ~/.cshrc file:

setenv SLURM_ACCOUNT xz0123

NOTE: The environment variable SBATCH_ACCOUNT takes precedence over account settings made in a batch script via

#SBATCH --account=yz0456
How to View detailed job information when the job is already running

Once your batch job has started execution (i.e. is in RUNNING state), your job script is copied to the SLURM admin nodes and kept until the job finalizes - this prevents problems that might occur if the job script gets modified while the job is running. As a side effect, you can delete the job script without interfering with the execution of the job.

If you accidentally removed or modified the job script of a running job, you can use the following command to query for the script that is actually used for executing the job:

scontrol -dd show job=<jobid> | less


How to use modules in batch scripts

The module environment is only available if the corresponding module command was defined for the current shell. If you use a different shell as login shell than for job batch scripts (e.g. tcsh as login shell while your job scripts start with #!/bin/bash), you need to source one of the following files in your script before any invocation of the module command:

# in bash or ksh script
source /etc/profile.d/mistral.sh

# in tcsh or csh script
source /etc/profile.d/mistral.csh
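For example, a bash job script submitted from a tcsh login session might start like this sketch (the profile file name mirrors the csh variant above, and the module name cdo is only illustrative):

```shell
#!/bin/bash
#SBATCH --partition=shared
#SBATCH --time=00:10:00
#SBATCH --account=xz0123            # placeholder project account

# make the module command available in this bash script
source /etc/profile.d/mistral.sh

module load cdo                     # illustrative module name
cdo --version
```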
How to Write a shell alias or function for quick login to a node managed by SLURM

For tasks better run in a dedicated but interactive fashion, it might be advantageous to save the repeating pattern of reserving resources and starting a new associated shell in an alias or function, as explained below.

If you use bash as default shell you can place the following alias definition in your ~/.bashrc file and source this file in the ~/.bash_profile or in the ~/.profile file:

alias ssh2node='salloc -n 1 -p prepost -- /bin/bash -c "ssh -tt -X \$SLURM_JOB_NODELIST"'

If you use tcsh as default shell you can put the following line in your ~/.cshrc file to define the alias ssh2node:

alias ssh2node 'salloc -n 1 -p prepost -- /bin/tcsh -c '\''ssh -X $SLURM_JOB_NODELIST'\'''

Thereafter, the command ssh2node can be used to allocate computing resources and log into the allotted node in one step:

$ ssh2node
salloc: Pending job allocation 292893
salloc: job 292893 queued and waiting for resources
salloc: job 292893 has been allocated resources
salloc: Granted job allocation 292893
Warning: Permanently added 'm11515' (RSA) to the list of known hosts.

An even more flexible solution is to define a shell function that combines 'salloc' and 'ssh' and accepts options to specify those requirements likely to change (e.g. number of tasks, partition, project account).

The example below defines a bash function s2n that can be added to your ~/.bashrc file:

    s2n () {
        # set default values for number of tasks, partition, and account
        local ntasks=1 partition=prepost account=xz0123   # replace with your default project
        # parse options with arguments
        while getopts :n:p:a: opt; do
            case ${opt} in
                n ) ntasks=$OPTARG ;;
                p ) partition=$OPTARG ;;
                a ) account=$OPTARG ;;
               \? ) echo "Invalid option: -$OPTARG" 1>&2; return 1 ;;
                : ) echo "Option -$OPTARG requires an argument" 1>&2; return 1 ;;
            esac
        done
        # allocate resources and log into the allotted node
        salloc -A ${account} -n ${ntasks} -p ${partition} -- /bin/bash -c "ssh -tt -X \$SLURM_JOB_NODELIST"
    }

and used to start interactive sessions according to different needs, for example:

# ask for 4 CPUs on one node in the partition prepost using default account
$ s2n -n 4 -p prepost

# use account xz0456 instead of default account
$ s2n -a xz0456

# ask for 48 CPUs on one node in the partition gpu
$ s2n -n 48 -p gpu

Tcsh or csh do not allow for shell functions. This deficiency can be partly compensated for by defining an alias with positional parameters as shown in the example below:

alias s2n 'salloc -n \!:1 -p \!:2 -A \!:3 -- /bin/tcsh -c '\''ssh -X $SLURM_JOB_NODELIST'\'''

However, for tcsh/csh, all parameters MUST be specified and the order of parameters (in the example above number of tasks, partition, account) DOES matter:

# ask for 4 CPUs on one node in the partition prepost using default account
$ s2n 4 prepost xz0123

# use account xz0456 instead of default account
$ s2n 1 prepost xz0456

# ask for 48 CPUs on one node in the partition gpu
$ s2n 48 gpu xz0123

Another possibility (not shown here) is to write a Unix utility using a language of your choice (python, perl etc.), place it in your ~/bin directory and add this directory to your PATH.

How to use SSHFS to mount remote lustre filesystem over SSH

In order to interact with directories and files located on the lustre filesystem, users can mount the remote filesystem via SSHFS (SSH Filesystem) over a normal ssh connection.

SSHFS is Linux-based software that needs to be installed on your local computer. On Ubuntu and Debian based systems it can be installed through apt-get. On Mac OS X you can install SSHFS as well - you will need to download FUSE and SSHFS from the osxfuse site. On Windows you will need to grab the latest win-sshfs package from the google code repository or use an alternative approach like WinSCP.

Mounting the Remote File System: The following instructions will work on Ubuntu/Debian

Simply run the SSHFS command to mount the remote lustre directory. In this example, the remote directory is /pf/m/m123456 for the user m123456. The local mount point is assumed to be ~/mistral_home:

$ mkdir ~/mistral_home
$ sshfs -o idmap=user m123456@mistral.dkrz.de:/pf/m/m123456 ~/mistral_home

To unmount a remote directory use

$ fusermount -u ~/mistral_home
Python Matplotlib fails with "QXcbConnection: Could not connect to display"

Matplotlib is useful for interactive 2D plotting and also for batch production of plots inside a job. The default behavior is to do interactive plotting which requires the package to open a window on your display. For this purpose you have to log into mistral with X11 forwarding enabled.

ssh -X <user>@mistral.dkrz.de

If you run matplotlib in a job script where you just want to create files of your plots, you have to tell matplotlib to use a non-interactive backend. See matplotlib's documentation for how to do that and which backends are available. Here is how to select the Agg backend (raster graphics, png) inside your script. Add to the top of your imports:

import matplotlib
matplotlib.use('Agg')
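A complete batch example might look like the following sketch, here with the Python code embedded in a shell heredoc; it writes plot.png without needing any display (the plotted data is purely illustrative):

```shell
#!/bin/bash
# Render a plot to a PNG file with the non-interactive Agg backend
python3 - <<'EOF'
import matplotlib
matplotlib.use('Agg')               # must be set before importing pyplot
import matplotlib.pyplot as plt

plt.plot([0, 1, 2, 3], [0, 1, 4, 9])
plt.savefig('plot.png')             # file output instead of a window
EOF
```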
How do I log into the same login or pp node I used before?

mlogin or mistralpp maps to a whole group of nodes to distribute the load. They all share the same file system, so most of the time you do not have to care which node you are on. However, there are reasons why you may want to connect to a specific node. You first have to find out on which node you are. This may be indicated in your prompt, or you can also use hostname for this purpose.

b999009@mlogin103% hostname
mlogin103

In this case you are on login node 3. Connect to this node with

ssh <user>@mlogin103.mmm.dkrz.de

For mistralpp, the scheme is the same.

Please keep in mind that login nodes are not intended for long running and computationally intensive tasks. Use batch jobs for this kind of activity.

I want to add my own packages to Python or R but they won't compile

Python and R, among other scripting languages, allow users to create customized environments including their own set of packages.

For Python, you can use virtualenv or conda; R can also add locally installed packages.

Some of these packages require a C compiler to be built. This is usually the compiler which was used by DKRZ to build the underlying Python or R. Therefore, you have to load the module for that compiler.

Say, you want to build a package for r/3.5.3, then you also have to load gcc/4.8.2. You can verify this by looking into the module for r/3.5.3.

% module load r/3.5.3 gcc/4.8.2
% module show r/3.5.3
-------------------------------------------------------------------
/sw/rhel6-x64/Modules/r/3.5.3:

module-whatis    r 3.5.3
conflict         r
prepend-path     PATH /sw/rhel6-x64/r/r-3.5.3-gcc48/bin
prepend-path     MANPATH /sw/rhel6-x64/r/r-3.5.3-gcc48/share/man
-------------------------------------------------------------------

Notice the gcc48 in prepend-path. It is important to stay with one compiler and not mix packages compiled with different compilers.

How do I share files with members of another project?

You can use ACLs to achieve this. As a member of project group ax0001, you would have to create a directory in your project's work for example

mkdir /work/ax0001/shared

It could be any other place on Lustre file systems where you have write access. Then you grant project bx0002 permissions to this directory

setfacl -m "g:bx0002:rwx" /work/ax0001/shared

This would allow all members of the group bx0002 to read and write the directory /work/ax0001/shared.

You can check the permissions with getfacl

% getfacl -p /work/ax0001/shared
# file: /work/ax0001/shared
# owner: b380001
# group: ax0001
user::rwx
group::rwx
group:bx0002:rwx
mask::rwx
other::r-x

How to prevent interruptions of ssh connections to Mistral?

If your ssh connections to mistral are interrupted after short periods without keyboard activities and you get an error message containing 'broken pipe' string, try to set the ServerAliveInterval parameter appropriately. This parameter can be set as a command-line option to ssh:

ssh -o ServerAliveInterval=60 -X <user>@mistral.dkrz.de

In the example above, ssh will send a message with a response request to the server if no packets have been received from the server in the past 60 seconds.

The more convenient way to always use the above setting is to add the following lines to the user's configuration file for the ssh client, ~/.ssh/config on your local machine:

Host *
      ServerAliveInterval 60

If the file ~/.ssh/config does not exist, you can simply create it with an editor of your choice. For further information on configuration files and parameters for ssh client, please refer to the manual page:

man ssh_config
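A slightly fuller, hypothetical entry that applies the keep-alive only to mistral and also sets the user name and X11 forwarding could look like this (the Host alias mistral is freely chosen; replace <userid> with your own account):

```
Host mistral
    HostName mistral.dkrz.de
    User <userid>
    ServerAliveInterval 60
    ServerAliveCountMax 3
    ForwardX11 yes
```

With such an entry in place, a plain ssh mistral picks up all of these options.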


Which MPI library and compiler should I use?

For model simulations in production mode, the recommended combination is to

  1. Choose an Intel compiler version that has been validated to work with your model. Lacking a verified version, just use the most recent version (module intel/18.0.4 at the time of this writing) and validate that yourself.
  2. Use Open MPI 2.0.2p2 with HPCX toolkit (module openmpi/2.0.2p2_hpcx-intel14). The '-intel14' suffix indicates that the Open MPI library has been built for Intel 14, but it is compatible with all newer Intel compiler versions. For some models or single node jobs, it might make sense to use Intel MPI (module intelmpi/2018.5.288) because it is sometimes faster. Please notify us, in case that proves true for you.
  3. Do not forget to consult the recommended environment settings and adjust your run script accordingly. Without these settings applications can run unexpectedly slowly.
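A job script skeleton combining these recommendations might look like this sketch (node count, time limit, account, and program name are placeholders; the recommended environment settings mentioned above still need to be added):

```shell
#!/bin/bash
#SBATCH --partition=compute
#SBATCH --nodes=2
#SBATCH --time=01:00:00
#SBATCH --account=xz0123            # placeholder project account

module load intel/18.0.4 openmpi/2.0.2p2_hpcx-intel14

# launch the MPI program on all allocated tasks, labeling output by rank
srun -l ./my_model
```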

