
Partitions and Limits

In SLURM, multiple nodes can be grouped into partitions, i.e. sets of nodes with associated limits for wall-clock time, job size, etc. These limits are hard limits for jobs and cannot be overruled. Partitions can overlap, i.e. one node may belong to several partitions.

Jobs are allocations of resources by users in order to execute tasks on the cluster for a specified period of time. In addition, SLURM uses the concept of job steps to describe a set of (possibly different) tasks within a job. Job steps can be thought of as smaller allocations or jobs within the job, which can be executed sequentially or in parallel during the main job allocation.
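
A minimal sketch of a batch script using job steps is shown below; the project account and program names are placeholders and have to be replaced by your own values.

#!/bin/bash
#SBATCH --partition=compute
#SBATCH --nodes=2
#SBATCH --time=00:30:00
#SBATCH --account=xz0123            # placeholder: your project account

# Job step 1: runs first, on one node of the allocation
# (24 tasks assume a 24-core phase 1 node; adjust as needed)
srun --nodes=1 --ntasks=24 ./task_a

# Job steps 2 and 3: started in the background so they run in parallel,
# each on one of the two allocated nodes
srun --nodes=1 --ntasks=24 ./task_b &
srun --nodes=1 --ntasks=24 ./task_c &
wait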

The SLURM sinfo command lists all partitions and nodes managed by SLURM on Mistral and provides general information about the current node states:

$ sinfo

PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
compute      up    8:00:00     13  down* m[10278,10286,10438,10498,10518,1055,...,11406]
compute      up    8:00:00     14    mix m[10000-10001,10011-10016,10036-10041,10048]
compute      up    8:00:00     81  alloc m[10042,10047,10049,10072-10103,...,11332-11345]
compute      up    8:00:00   1388   idle m[10002-10010,10017-10035,...,11440-11511]
prepost      up    4:00:00      3 drain* m[11518,11532,11554]
prepost      up    4:00:00     45   idle m[11512-11517,11519-11531,11533-11553,11555-11559]
shared       up 7-00:00:00     14    mix m[10001,10011-10016,10036-10041,10048]
shared       up 7-00:00:00     17  alloc m[10042,10047,10049,11332-11345]
shared       up 7-00:00:00     68   idle m[10002-10010,10017-10035,10043-10046,11296-11331]
gpu          up   12:00:00      1    mix mg102
gpu          up   12:00:00      9  alloc mg[100-101,103-108,200]
gpu          up   12:00:00     11   idle mg[109-111,201-208]
miklip       up 2-00:00:00     30   idle m[21386-21388,21402-21405,21429-21431,21579-21585,21588-21590,21593-21602]
compute2     up    8:00:00      1  drain m20842
compute2     up    8:00:00   1000  alloc m[20000-20827,20846-21007,21386-21395]
compute2     up    8:00:00    433   idle m[20828-20841,20843-20845,21008-21385,21396-21433]
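
The output can also be restricted to a single partition or listed per node, for example:

$ sinfo --partition=compute2          # show only the compute2 partition
$ sinfo -N -l --partition=prepost     # one line per node with more details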

For detailed information about all available partitions and their limits use the SLURM scontrol command as follows:

$ scontrol show partition
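
A single partition can also be queried by naming it explicitly, for example:

$ scontrol show partition compute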

The following five partitions are currently defined on Mistral:

compute

This partition consists of 1516 phase 1 compute nodes (equipped with Haswell CPUs) and is intended for running parallel scientific applications. The compute nodes allocated for a job are used exclusively and cannot be shared with other jobs.

compute2

This partition consists of 1762 phase 2 compute nodes (equipped with Broadwell CPUs) and is intended for running parallel scientific applications. The compute nodes allocated for a job are used exclusively and cannot be shared with other jobs.
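
As a sketch (project account and executable name are placeholders), a typical MPI job on one of these exclusive compute partitions could be requested as follows:

#!/bin/bash
#SBATCH --partition=compute2
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=36        # adjust to the number of cores per node
#SBATCH --time=02:00:00
#SBATCH --account=xz0123            # placeholder: your project account

srun ./my_mpi_program               # placeholder executable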

shared

This partition is defined on 36 nodes and can be used to run small jobs that do not require a whole node, so that one compute node can be shared between different jobs. The partition is dedicated to the execution of shared-memory applications parallelized with OpenMP or pthreads, as well as to serial and parallel data processing jobs.
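
A minimal sketch of an OpenMP job on the shared partition might look like this (account and program names are placeholders); memory is requested per CPU to stay within the limits listed in the table below:

#!/bin/bash
#SBATCH --partition=shared
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem-per-cpu=1300          # at most 1300 MB per CPU on shared (see table below)
#SBATCH --time=12:00:00
#SBATCH --account=xz0123            # placeholder: your project account

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./my_openmp_program                 # placeholder executable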

prepost

The prepost partition is made up of 43 large-memory nodes. It is dedicated to memory-intensive data processing jobs. Nodes of this partition can be shared with other jobs if a single job does not allocate all resources. It is possible to access external sites from nodes in this partition, for example to download or upload data.
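
For example, a memory-intensive serial post-processing job could be sketched as follows (account and program names are placeholders):

#!/bin/bash
#SBATCH --partition=prepost
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=5300          # at most 5300 MB per CPU on prepost (see table below)
#SBATCH --time=04:00:00
#SBATCH --account=xz0123            # placeholder: your project account

./postprocess_data                  # placeholder executable; external data transfers are also possible here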

gpu

The 21 nodes in this partition are additionally equipped with different Nvidia GPUs and can be used for interactive 3-dimensional data visualization via VirtualGL/TurboVNC or for the execution of applications ported to GPUs. Please refer to the detailed hardware list to identify which GPUs are available.
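
For example, an interactive allocation on a GPU node can be requested with salloc as sketched below; the account is a placeholder, and whether an additional explicit GPU request is needed depends on the node configuration:

$ salloc --partition=gpu --nodes=1 --time=02:00:00 --account=xz0123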

The SLURM limits configured for different partitions are:

Partition    Max Nodes   Max Job    Max Nodes Used Simultaneously   Shared       Default Memory   Max Memory
Name         per Job     Runtime    by One User                     Node Usage   per CPU          per CPU

compute      512         8 hours    no limit                        no           1280 MB          5300 MB
compute2     512         8 hours    no limit                        no           880 MB           3500 MB
shared       1           7 days     no limit                        yes          1280 MB          1300 MB
prepost      2           12 hours   no limit                        yes          1280 MB          5300 MB
gpu          2           12 hours   2                               yes          1280 MB          14000 MB

If your jobs require either longer execution times or more nodes, contact the DKRZ help desk. The predefined limits can be adjusted for a limited time to match your purposes by specifying an appropriate Quality of Service (QOS). Please include the following information in your request: the reason why you need higher limits, which limits should be increased, and for how long they should be increased. A brief justification by your project admin is also needed.
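
Once a QOS has been granted, it can be selected in the batch script or on the command line; the QOS name below is purely illustrative:

#SBATCH --qos=<granted_qos_name>

or on the command line:

sbatch --qos=<granted_qos_name> <batch_script>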

 

Since September 1st, 2016, all jobs on Mistral have to be assigned to a partition; there is no longer a default partition available. The partition can be chosen in various ways:

    • environment variable
export SBATCH_PARTITION=<partitionname>
    • batch script option
#SBATCH [-p|--partition=]<partitionname>
    • command line option
sbatch [-p|--partition=]<partitionname>

Note that an environment variable overrides any matching option set in the batch script, and a command line option overrides any matching environment variable.
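
For example, with the settings below the job is submitted to the shared partition, because the command line option overrides the environment variable (the script name is hypothetical):

$ export SBATCH_PARTITION=compute
$ sbatch --partition=shared my_job_script.sh    # the job ends up in the shared partition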

To control the job workload on the Mistral cluster and to keep SLURM responsive, we enforce the following restrictions on the number of jobs:

SLURM Limits            Max Number of Submitted Jobs   Max Number of Running Jobs
Cluster wide            10000                          10000
Per User and Account    1000                           20

If needed, you can ask for higher limits by sending a request with a short justification to . Based on technical limitations and a fair share among all users, we may then arrange a QOS for a limited time.
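
To check how many of your own jobs currently count toward these limits, you could use squeue, for example:

$ squeue -u $USER -h | wc -l               # all of your jobs (pending and running)
$ squeue -u $USER -h -t RUNNING | wc -l    # only your running jobs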

To list the job limits and Quality of Service settings relevant to you, use the sacctmgr command, for example:

$ sacctmgr -s show user $USER

$ sacctmgr -s show user $USER format=user,account,maxjobs,maxsubmit,qos

 
