Job scheduling and control at DKRZ is now done with the latest version of NEC's Network Queuing System (NQSII) and NEC's Enhanced Resource Scheduler (ERSII). The load-balancing function of ERSII allows fair-share scheduling.
Batch jobs can be submitted directly from cross as well as from hurrikan with qsub <job_script>. You do not need to specify a queue unless you want to use the pp-queue. All jobs on hurrikan (including multi-node jobs) are handled by the default pipe queue and are distributed automatically to the internal queues according to the requested resources (see also resource limits).
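Because the internal queue is chosen from the requested resources, it can help to check which directives a script actually requests before submitting it. A minimal sketch, assuming that NQSII only honours lines that begin exactly with #PBS (so the ### PBS lines in the examples are plain comments); show_requests is a hypothetical helper, not an NQSII command:

```shell
# Hypothetical helper (not part of NQSII): list the active #PBS
# directives of a job script. Lines starting with "### PBS" do not
# match the pattern and are therefore not listed.
show_requests() {
    grep '^#PBS' "$1"
}
```

Usage, with a hypothetical script name: show_requests job_single1 && qsub job_single1. Since grep exits with a non-zero status when no directive is found, qsub is only called for scripts that request something.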
The syntax of the NQSII batch system differs from that of the former NQS version. All old job scripts have to be adapted to the NQSII syntax and options. In particular, the NQS-related environment variables have new names; see the table below for some examples. (Note: not all environment variables of NQS are supported by NQSII.)
| NQSII (new) | NQS (old) |
|---|---|
| PBS_O_WORKDIR | QSUB_WORKDIR |
| PBS_JOBID | QSUB_REQID |
| PBS_JOBNAME | QSUB_REQNAME |
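When adapting an old NQS script, one option is to leave its body unchanged and define the old names from the new ones at the top of the script. A minimal sketch covering only the three variables in the table; nqs_compat is a hypothetical helper name, and it assumes the rest of the old script only reads these variables:

```shell
# Hypothetical helper (not part of NQSII): define the old NQS variable
# names from their NQSII counterparts so that an unchanged old script
# body can keep using $QSUB_WORKDIR, $QSUB_REQID and $QSUB_REQNAME.
nqs_compat() {
    QSUB_WORKDIR=$PBS_O_WORKDIR   # directory from which qsub was called
    QSUB_REQID=$PBS_JOBID         # job (request) identifier
    QSUB_REQNAME=$PBS_JOBNAME     # job (request) name, e.g. from #PBS -N
    export QSUB_WORKDIR QSUB_REQID QSUB_REQNAME
}

nqs_compat   # call once, before the old script body
```

Variables without an NQSII equivalent cannot be mapped this way; those parts of an old script still have to be rewritten.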
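For more information, refer to the man pages. A summary of corresponding NQS and NQSII commands and options is available as a PDF document. The first two example scripts below run a serial program on one CPU and an MPI program on 4 CPUs of one node.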
#!/bin/ksh
### PBS -S /bin/ksh # NQSII syntax to set the shell (see "Known limitations" below)
#PBS -l cpunum_prc=1 # 1 CPU (maximum: 8 CPUs)
#PBS -l cputim_job=2:00:00 # 2 h CPU time
#PBS -l memsz_job=1gb # 1 GB memory
#PBS -j o # join stderr into stdout
#PBS -N job_single1 # job name
#PBS -M myname@mymail.de # always specify your e-mail address
# for error messages etc.
/bin/echo " job started at: " \\c
date
/bin/echo " ExecutionHost : " \\c
hostname # print name of current host
/ipf/x/xnnnnnn/model
/bin/echo " job completed at: " \\c
date
##########################
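The /bin/echo "..." \\c lines rely on an echo that interprets \c as "suppress the trailing newline", as System V echo does; other implementations (e.g. GNU /bin/echo without -e) print \c literally. If a script also has to run on such a system, a printf-based variant is more portable. A small sketch; stamp is a hypothetical helper name, and uname -n (POSIX) is used in place of hostname:

```shell
# Portable replacement for '/bin/echo "text" \c; date': printf writes
# the label and the value on one line with any POSIX shell.
stamp() {
    printf ' %s %s\n' "$1" "$2"
}

stamp "job started at:" "$(date)"
stamp "ExecutionHost :" "$(uname -n)"   # uname -n prints the node name
```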
#!/bin/ksh
### PBS -S /bin/ksh # NQSII syntax to set the shell (see "Known limitations" below)
#PBS -l cpunum_prc=4 # 4 CPUs (maximum: 8 CPUs)
#PBS -l cputim_job=2:00:00 # 2 h CPU time
#PBS -l memsz_job=1gb # 1 GB memory
#PBS -j o # join stderr into stdout
#PBS -N job_single4 # job name
#PBS -M myname@mymail.de # always specify your e-mail address
# for error messages etc.
/bin/echo " job started at: " \\c
date
/bin/echo " ExecutionHost : " \\c
hostname # print name of current host
mpiexec -n 4 /ipf/x/xnnnnnn/model_mpi
/bin/echo " job completed at: " \\c
date
##########################
The following is an example of an MPI job that requests 2 compute nodes with 8 CPUs each (i.e. 16 CPUs in total). The scheduler chooses the nodes dynamically, depending on the load. In the script, the nodes are numbered 0 and 1; a four-node run would use the numbers 2 and 3 for the additional nodes.
#!/bin/ksh
### PBS -S /bin/ksh # NQSII syntax to set the shell (see "Known limitations" below)
#PBS -l cpunum_prc=8 # 8 CPUs per node
#PBS -l cputim_job=10:00:00 # 10 h CPU time per node
#PBS -l memsz_job=6gb # 6 GB memory per node
#PBS -T mpisx # job type: MPI/SX
#PBS -b 2 # job runs on 2 nodes
#PBS -j o # join stderr into stdout
#PBS -M myname@mymail.de # always specify your e-mail address
# for error messages etc.
#PBS -N job_multi16 # job name
/bin/echo " job started at: " \\c
date
/bin/echo " ExecutionHost : " \\c
hostname # print name of current host
EXE=/ipf/x/xnnnnnn/model_mpi
mpiexec -host 0 -n 8 -host 1 -n 8 $EXE
/bin/echo " job completed at: " \\c
date
#########################
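Following the node numbering described above (0, 1, 2, 3, ...), the -host/-n argument list for mpiexec can be generated instead of written out by hand. A minimal sketch; mpi_host_args is a hypothetical helper, and the node count passed to it must match the #PBS -b request:

```shell
# Hypothetical helper: print "-host 0 -n NCPU -host 1 -n NCPU ..." for
# NNODES nodes, numbered from 0 as in the multi-node example.
mpi_host_args() {
    nnodes=$1   # number of nodes, same value as "#PBS -b"
    ncpu=$2     # MPI processes per node, e.g. cpunum_prc
    args=""
    i=0
    while [ "$i" -lt "$nnodes" ]; do
        args="$args -host $i -n $ncpu"
        i=$((i + 1))
    done
    printf '%s\n' "${args# }"   # drop the leading space
}
```

In a four-node variant of the script above (#PBS -b 4), the mpiexec line would become mpiexec $(mpi_host_args 4 8) $EXE. The command substitution is left unquoted on purpose so that the shell splits it into separate arguments.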
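Known limitations: The NQSII directive #PBS -S /bin/ksh is disabled in the examples above (written as ### PBS); the shell is set by the first line of each script, #! /bin/ksh.
#PBS -V exports all environment variables, including the NQSII-specific variables PBS_<xxxx>. This may lead to problems in chain jobs.
Please contact Beratung if you need advice.
"a.out > mydir/mylog"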
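One possible way to keep inherited job variables out of a follow-up job in a chain is to remove the PBS_* variables from the environment of the qsub call only. This is a sketch under the assumption that the problems mentioned above come from inherited PBS_<xxxx> values; it has not been checked against NQSII, so ask Beratung before relying on it. unset_pbs_vars is a hypothetical helper:

```shell
# Hypothetical helper (not part of NQSII): unset every exported
# variable whose name starts with PBS_ in the current shell.
unset_pbs_vars() {
    for v in $(env | sed -n 's/^\(PBS_[A-Za-z0-9_]*\)=.*/\1/p'); do
        unset "$v"
    done
}
```

Run it in a subshell so that the running job keeps its own variables, e.g. ( unset_pbs_vars; qsub -V next_job ), where next_job stands for the hypothetical next script of the chain.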