
Network Queueing System (NQSII/ERSII)


Job scheduling and control at DKRZ are now handled by the latest version of NEC's Network Queueing System (NQSII) and NEC's Enhanced Resource Scheduler (ERSII). The load-balancing function of ERSII enables fair-share scheduling.

Batch jobs can be submitted directly from cross as well as from hurrikan with qsub <job_script>. You do not need to specify a queue unless you want to use the pp-queue. All jobs on hurrikan (including multi-node jobs) are handled by the default pipe queue and are distributed automatically to the internal queues according to the requested resources (see also resource limits).

The syntax of the NQSII batch system differs from that of the former NQS version. All old job scripts have to be adapted to the NQSII syntax and options. In particular, the NQS-related environment variables have new names; see the table below for some examples. (Note: not all NQS environment variables are supported by NQSII.)

NQSII (new)      NQS (old)
PBS_O_WORKDIR    QSUB_WORKDIR
PBS_JOBID        QSUB_REQID
PBS_JOBNAME      QSUB_REQNAME
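
Because the renaming in the table is purely mechanical, it can be applied to existing scripts with a text substitution. The following is a minimal sketch, not an official conversion tool: the function name convert_nqs_vars is hypothetical, only the three variables listed above are renamed, and the old job directives as well as any NQS variables without an NQSII counterpart still have to be adapted by hand.

```shell
# Hypothetical helper (sketch): rename the old NQS environment variables
# to their NQSII equivalents. Reads a job script on stdin, writes to stdout.
# Only the three variables from the table are handled; job directives and
# NQS variables that NQSII does not support must still be converted by hand.
convert_nqs_vars() {
  sed -e 's/QSUB_WORKDIR/PBS_O_WORKDIR/g' \
      -e 's/QSUB_REQID/PBS_JOBID/g' \
      -e 's/QSUB_REQNAME/PBS_JOBNAME/g'
}

# Demo on a two-line fragment of an old-style job script
converted=$(convert_nqs_vars <<'EOF'
cd $QSUB_WORKDIR
echo "request $QSUB_REQID ($QSUB_REQNAME)"
EOF
)
echo "$converted"
```

To convert a whole script, redirect a file through the function, e.g. convert_nqs_vars < old_job > new_job (the file names are placeholders). Check the result by eye before submitting it.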

For more information, refer to the man pages. A summary of corresponding NQS and NQSII commands and options is available as a PDF document.

EXAMPLES

A: Simple Job (single node, sequential)

#!/bin/ksh
### PBS -S /bin/ksh        # NQSII syntax to set the shell (see "Known limitations" below)
#PBS -l cpunum_prc=1       # 1 CPU (at most 8 CPUs per node)
#PBS -l cputim_job=2:00:00 # 2 h CPU time
#PBS -l memsz_job=1gb      # 1 GB memory
#PBS -j o                  # join stderr into stdout

#PBS -N job_single1        # job name
#PBS -M myname@mymail.de   # you should always specify your email
                           # address for error messages etc.

/bin/echo " job started at: " \\c
date
/bin/echo " ExecutionHost : " \\c
hostname                   # print name of current host 

/ipf/x/xnnnnnn/model

/bin/echo " job completed at: " \\c
date

##########################

B: Simple Job (single node MPI with 4 CPUs)

#!/bin/ksh
### PBS -S /bin/ksh        # NQSII syntax to set the shell (see "Known limitations" below)
#PBS -l cpunum_prc=4       # 4 CPUs (at most 8 CPUs per node)
#PBS -l cputim_job=2:00:00 # 2 h CPU time
#PBS -l memsz_job=1gb      # 1 GB memory
#PBS -j o                  # join stderr into stdout

#PBS -N job_single4        # job name
#PBS -M myname@mymail.de   # you should always specify your email
                           # address for error messages etc.

/bin/echo " job started at: " \\c
date
/bin/echo " ExecutionHost : " \\c
hostname                   # print name of current host 

mpiexec -n 4 /ipf/x/xnnnnnn/model_mpi

/bin/echo " job completed at: " \\c
date

##########################

C: Multi-Node Job
(in the test phase up to 32 CPUs on 4 nodes; more later)

The following is an example of an MPI job that requests 2 compute nodes with 8 CPUs each (i.e. 16 CPUs in total). The scheduler chooses the nodes dynamically, depending on the load. Within the script, the nodes are addressed by the logical numbers 0 and 1; a four-node run would use the numbers 2 and 3 for the additional nodes.

#!/bin/ksh
### PBS -S /bin/ksh          # NQSII syntax to set the shell (see "Known limitations" below)
#PBS -l cpunum_prc=8         #  8 CPUs per node
#PBS -l cputim_job=10:00:00  # 10 h CPU time per node
#PBS -l memsz_job=6gb        #  6 GB memory per node
#PBS -T mpisx                # job type: MPI/SX
#PBS -b 2                    # job runs on 2 nodes
#PBS -j o                    # join stderr into stdout
#PBS -M myname@mymail.de     # you should always specify your email
                             # address for error messages etc.
#PBS -N job_multi16          # job name

/bin/echo " job started at: " \\c
date
/bin/echo " ExecutionHost : " \\c
hostname                     # print name of current host 

EXE=/ipf/x/xnnnnnn/model_mpi

mpiexec -host 0 -n 8 -host 1 -n 8 $EXE

/bin/echo " job completed at: " \\c
date
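
For other node counts, the mpiexec line above grows by one "-host <n> -n 8" pair per node. A minimal sketch, assuming the logical node numbering described above (0 to N-1) and that the hypothetical variables NNODES and CPUS_PER_NODE are kept in sync by hand with "#PBS -b" and "#PBS -l cpunum_prc", builds that list in a loop for the four-node case. It only prints the resulting command; drop the echo to actually launch the model.

```shell
# Sketch: build the per-node "-host <n> -n <cpus>" arguments for mpiexec.
# NNODES must match "#PBS -b" and CPUS_PER_NODE must match
# "#PBS -l cpunum_prc"; both names are hypothetical, values are examples.
NNODES=4
CPUS_PER_NODE=8
EXE=/ipf/x/xnnnnnn/model_mpi

HOSTARGS=""
i=0
while [ $i -lt $NNODES ]; do
  HOSTARGS="$HOSTARGS -host $i -n $CPUS_PER_NODE"
  i=$((i + 1))
done

# Prints the command only; in a real job use:  mpiexec $HOSTARGS $EXE
echo mpiexec $HOSTARGS $EXE
```

With NNODES=4 this prints "mpiexec -host 0 -n 8 -host 1 -n 8 -host 2 -n 8 -host 3 -n 8 /ipf/x/xnnnnnn/model_mpi", i.e. 32 MPI processes on 4 nodes. $HOSTARGS is left unquoted on purpose so that it splits into separate arguments.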

#########################

Known Limitations and Workarounds

Deutsches Klimarechenzentrum GmbH | Impressum