
Example Batch Scripts

Hyper-Threading (HT) is enabled on all Mistral nodes, i.e. SLURM recognizes 48 logical CPUs per Haswell node (compute partition) and 72 logical CPUs per Broadwell node (compute2 partition). Below, example batch scripts are provided for the following use cases:

 

Serial job

#!/bin/bash
#SBATCH --job-name=my_job      # Specify job name
#SBATCH --partition=shared     # Specify partition name
#SBATCH --ntasks=1             # Specify max. number of tasks to be invoked
#SBATCH --time=00:30:00        # Set a limit on the total run time
#SBATCH --mail-type=FAIL       # Notify user by email in case of job failure
#SBATCH --account=xz0123       # Charge resources on this project account
#SBATCH --output=my_job.o%j    # File name for standard output
#SBATCH --error=my_job.e%j     # File name for standard error output


# Execute serial programs, e.g.
cdo <operator> <ifile> <ofile>
The shared partition has a limit of 1300 MB of memory per CPU. If your serial job needs more memory, you have to
  • increase the number of tasks (using the option --ntasks), even though you might not use all of these CPUs,
  • or specify the amount of memory explicitly (using the option --mem), which automatically increases the number of allocated CPUs (see the example below),
  • or, alternatively, try to run your job in the partition prepost, which provides up to 5300 MB of memory per CPU.
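
For example, a serial job in the shared partition requesting its memory explicitly via --mem might look as follows. This is only a sketch; the value of 4000 MB is an illustrative assumption and has to be adapted to the actual needs of your program.

#!/bin/bash
#SBATCH --job-name=my_job      # Specify job name
#SBATCH --partition=shared     # Specify partition name
#SBATCH --ntasks=1             # Specify max. number of tasks to be invoked
#SBATCH --mem=4000             # Request 4000 MB of memory (illustrative value); SLURM allocates CPUs accordingly
#SBATCH --time=00:30:00        # Set a limit on the total run time
#SBATCH --mail-type=FAIL       # Notify user by email in case of job failure
#SBATCH --account=xz0123       # Charge resources on this project account
#SBATCH --output=my_job.o%j    # File name for standard output
#SBATCH --error=my_job.e%j     # File name for standard error output

# Execute serial programs, e.g.
cdo <operator> <ifile> <ofile>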

OpenMP job without Hyper-Threading

#!/bin/bash
#SBATCH --job-name=my_job      # Specify job name
#SBATCH --partition=shared     # Specify partition name
#SBATCH --ntasks=1             # Specify max. number of tasks to be invoked
#SBATCH --cpus-per-task=16     # Specify number of CPUs per task
#SBATCH --time=00:30:00        # Set a limit on the total run time
#SBATCH --mail-type=FAIL       # Notify user by email in case of job failure
#SBATCH --account=xz0123       # Charge resources on this project account
#SBATCH --output=my_job.o%j    # File name for standard output
#SBATCH --error=my_job.e%j     # File name for standard error output

# Bind your OpenMP threads
export OMP_NUM_THREADS=8
export KMP_AFFINITY=verbose,granularity=core,compact,1
export KMP_STACKSIZE=64m

# Execute OpenMP program, e.g.
cdo -P 8 <operator> <ifile> <ofile>
You need to specify the value of --cpus-per-task as a multiple of the number of Hyper-Threads per physical core (i.e. 2); in the example above, 16 logical CPUs are requested for 8 OpenMP threads. The environment variable KMP_AFFINITY needs to be set correspondingly: with granularity=core, each OpenMP thread is bound to a physical core rather than to a single Hyper-Thread.

 

OpenMP job with Hyper-Threading

#!/bin/bash
#SBATCH --job-name=my_job      # Specify job name
#SBATCH --partition=shared     # Specify partition name
#SBATCH --ntasks=1             # Specify max. number of tasks to be invoked
#SBATCH --cpus-per-task=8      # Specify number of CPUs per task
#SBATCH --time=00:30:00        # Set a limit on the total run time
#SBATCH --mail-type=FAIL       # Notify user by email in case of job failure
#SBATCH --account=xz0123       # Charge resources on this project account
#SBATCH --output=my_job.o%j    # File name for standard output
#SBATCH --error=my_job.e%j     # File name for standard error output

# Bind your OpenMP threads
export OMP_NUM_THREADS=8
export KMP_AFFINITY=verbose,granularity=thread,compact,1
export KMP_STACKSIZE=64m

# Execute OpenMP program, e.g.
cdo -P 8 <operator> <ifile> <ofile>

 

MPI job without Hyper-Threading

The overall structure of the batch script is the same whether you use IntelMPI, OpenMPI, or any other MPI implementation; only the loaded modules and the environment variables used to fine-tune the respective MPI differ. In particular, the parallel application should always be started with the srun command instead of mpirun, mpiexec, or similar launchers.

In the following examples, 288 cores are used to execute a parallel program. On Mistral phase 1 this requires 12 nodes from the compute partition, while on Mistral phase 2 only 8 nodes from the compute2 partition are needed. The settings differ slightly depending on whether the program was built with OpenMPI (or the older bullxMPI) or with IntelMPI.
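
For example, a job for the phase 1 case (compute partition, program built with OpenMPI) might look as follows. The environment settings correspond to those used in the hybrid examples further below; the binding via --cpus-per-task=2 (reserving both Hyper-Threads of a physical core for each task) is one possible choice and might need to be adapted.

#!/bin/bash
#SBATCH --job-name=my_job      # Specify job name
#SBATCH --partition=compute    # Specify partition name
#SBATCH --nodes=12             # Specify number of nodes
#SBATCH --ntasks-per-node=24   # One task per physical core (288 tasks in total)
#SBATCH --time=00:30:00        # Set a limit on the total run time
#SBATCH --mail-type=FAIL       # Notify user by email in case of job failure
#SBATCH --account=xz0123       # Charge resources on this project account
#SBATCH --output=my_job.o%j    # File name for standard output
#SBATCH --error=my_job.e%j     # File name for standard error output

# Environment settings to run an MPI parallel program compiled with OpenMPI and Mellanox libraries
module load intel/version_to_be_used
module load openmpi/2.0.2p1_hpcx-intel14
# Settings for Open MPI and MXM (MellanoX Messaging) library
export OMPI_MCA_pml=cm
export OMPI_MCA_mtl=mxm
export OMPI_MCA_mtl_mxm_np=0
export MXM_RDMA_PORTS=mlx5_0:1
export MXM_LOG_LEVEL=ERROR
# Disable GHC algorithm for collective communication
export OMPI_MCA_coll=^ghc

# Use srun (not mpirun or mpiexec) to launch the program;
# --cpus-per-task=2 reserves both Hyper-Threads of a physical core for each task
srun -l --cpu_bind=cores --cpus-per-task=2 ./myprog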

Please also read the recommendations for programming on Mistral.

For details on how to use bullxMPI with the Mellanox tools, please refer to the BULL/Atos guide "How to use bullxMPI compiled with MXM and FCA tools".

MPI job with Hyper-Threading

The following examples all ask for 144 MPI tasks. When using Hyper-Threading, two tasks share one physical CPU, which reduces the number of nodes needed for a job, possibly at the expense of a slower runtime. Again, the settings depend on whether the job runs on Mistral phase 1 (Haswell nodes from the compute partition) or on Mistral phase 2 (Broadwell nodes from the compute2 partition), and on whether the program was built with OpenMPI (or the older bullxMPI) or with IntelMPI.
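
For example, a job for the phase 1 case (compute partition, program built with IntelMPI) might look as follows: with Hyper-Threading, the 144 tasks fit onto 3 Haswell nodes (48 tasks per node). The IntelMPI settings correspond to those used in the --hint example below; the binding options are one possible choice and might need to be adapted.

#!/bin/bash
#SBATCH --job-name=my_job      # Specify job name
#SBATCH --partition=compute    # Specify partition name
#SBATCH --nodes=3              # Specify number of nodes
#SBATCH --ntasks-per-node=48   # Use all 48 logical CPUs per Haswell node
#SBATCH --time=00:30:00        # Set a limit on the total run time
#SBATCH --mail-type=FAIL       # Notify user by email in case of job failure
#SBATCH --account=xz0123       # Charge resources on this project account
#SBATCH --output=my_job.o%j    # File name for standard output
#SBATCH --error=my_job.e%j     # File name for standard error output

# Environment settings to execute a parallel program compiled with Intel MPI
module load intelmpi
export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so

# Use srun (not mpirun or mpiexec) to launch the program;
# --cpus-per-task=1 places one task on each Hyper-Thread
srun -l --cpu_bind=threads --cpus-per-task=1 ./myprog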

Instead of choosing explicitly whether to use Hyper-Threads via the --cpus-per-task and --cpu_bind options, one can also use the srun option --hint=[no]multithread. The following example allocates one full Haswell node and runs the program first with 24 tasks without Hyper-Threading and then with 48 tasks using Hyper-Threading. This procedure can be used to check whether an application benefits from Hyper-Threads.

#!/bin/bash
#SBATCH --job-name=my_job      # Specify job name
#SBATCH --partition=compute    # Specify partition name
#SBATCH --nodes=1              # Specify number of nodes
#SBATCH --time=00:30:00        # Set a limit on the total run time
#SBATCH --mail-type=FAIL       # Notify user by email in case of job failure
#SBATCH --account=xz0123       # Charge resources on this project account
#SBATCH --output=my_job.o%j    # File name for standard output
#SBATCH --error=my_job.e%j     # File name for standard error output

# Environment settings to execute a parallel program compiled with Intel MPI
module load intelmpi
export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so

# First check how myprog performs without Hyper-Threads
srun -l --cpu_bind=verbose --hint=nomultithread --ntasks=24 ./myprog

# Second check how myprog performs with Hyper-Threads
srun -l --cpu_bind=verbose --hint=multithread --ntasks=48 ./myprog

 

Hybrid (MPI/OpenMP) job without Hyper-Threading

The following example allocates 4 Haswell nodes from the compute partition for 1 hour. The job launches 24 MPI tasks in total, i.e. 6 tasks per node with 4 OpenMP threads per task, so that all 24 physical cores of each node are used. These settings have to be adapted to the 36 physical cores per node if the compute2 partition is used. Furthermore, the loaded modules and environment variables have to be changed slightly if IntelMPI is used instead of OpenMPI.

#!/bin/bash
#SBATCH --job-name=my_job      # Specify job name
#SBATCH --partition=compute    # Specify partition name
#SBATCH --nodes=4              # Specify number of nodes
#SBATCH --ntasks-per-node=6    # Specify number of (MPI) tasks on each node
#SBATCH --time=01:00:00        # Set a limit on the total run time
#SBATCH --mail-type=FAIL       # Notify user by email in case of job failure
#SBATCH --account=xz0123       # Charge resources on this project account
#SBATCH --output=my_job.o%j    # File name for standard output
#SBATCH --error=my_job.e%j     # File name for standard error output

# Bind your OpenMP threads
export OMP_NUM_THREADS=4
export KMP_AFFINITY=verbose,granularity=core,compact,1
export KMP_STACKSIZE=64m

# Environment settings to run a MPI/OpenMP parallel program compiled with OpenMPI and Mellanox libraries
# Load environment
module load intel/version_to_be_used
module load openmpi/2.0.2p1_hpcx-intel14
# Settings for Open MPI and MXM (MellanoX Messaging) library
export OMPI_MCA_pml=cm
export OMPI_MCA_mtl=mxm
export OMPI_MCA_mtl_mxm_np=0
export MXM_RDMA_PORTS=mlx5_0:1
export MXM_LOG_LEVEL=ERROR
# Disable GHC algorithm for collective communication
export OMPI_MCA_coll=^ghc
# Limit stacksize and core file size ... adjust to your program's needs
ulimit -s 102400
ulimit -c 0

# Alternatively, environment settings to run a MPI/OpenMP parallel program compiled with Intel MPI:
# module load intelmpi
# export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so

# Use srun (not mpirun or mpiexec) command to launch programs compiled with any MPI library
srun -l --propagate=STACK,CORE --cpu_bind=cores --cpus-per-task=8 ./myprog

 

Hybrid (MPI/OpenMP) job with Hyper-Threading

The following example runs on 2 Haswell nodes from the compute partition with 6 MPI tasks per node, each task starting 8 OpenMP threads and thereby making use of Hyper-Threading (6 x 8 = 48 logical CPUs per node).

#!/bin/bash
#SBATCH --job-name=my_job      # Specify job name
#SBATCH --partition=compute    # Specify partition name
#SBATCH --nodes=2              # Specify number of nodes
#SBATCH --ntasks-per-node=6    # Specify number of (MPI) tasks on each node
#SBATCH --time=01:00:00        # Set a limit on the total run time
#SBATCH --mail-type=FAIL       # Notify user by email in case of job failure
#SBATCH --account=xz0123       # Charge resources on this project account
#SBATCH --output=my_job.o%j    # File name for standard output
#SBATCH --error=my_job.e%j     # File name for standard error output

# Bind your OpenMP threads
export OMP_NUM_THREADS=8
export KMP_AFFINITY=verbose,granularity=thread,compact,1
export KMP_STACKSIZE=64m

# Environment settings to run a MPI/OpenMP parallel program compiled with OpenMPI and Mellanox libraries
# Load environment
module load intel/version_to_be_used
module load openmpi/2.0.2p1_hpcx-intel14
# Settings for OpenMPI and MXM (MellanoX Messaging) library
export OMPI_MCA_pml=cm
export OMPI_MCA_mtl=mxm
export OMPI_MCA_mtl_mxm_np=0
export MXM_RDMA_PORTS=mlx5_0:1
export MXM_LOG_LEVEL=ERROR
# Disable GHC algorithm for collective communication
export OMPI_MCA_coll=^ghc
# Limit stacksize and core file size ... adjust to your program's needs
ulimit -s 102400
ulimit -c 0

# Alternatively, environment settings to run a MPI/OpenMP parallel program compiled with Intel MPI:
# module load intelmpi
# export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so

# Use srun (not mpirun or mpiexec) command to launch programs compiled with any MPI library
srun -l --propagate=STACK,CORE --cpu_bind=cores --cpus-per-task=8 ./myprog

 

Deep learning job on GPU node

#!/bin/bash
#SBATCH --job-name=my_job      # Specify job name
#SBATCH --partition=gpu        # Specify partition name
#SBATCH --nodes=1              # Specify number of nodes
#SBATCH --constraint=k80       # Constraint for node selection
#SBATCH --mem=0                # Use entire memory of node
#SBATCH --exclusive            # Do not share node
#SBATCH --time=01:00:00        # Set a limit on the total run time
#SBATCH --mail-type=FAIL       # Notify user by email in case of job failure
#SBATCH --account=xz0123       # Charge resources on this project account
#SBATCH --output=my_job.o%j    # File name for standard output
#SBATCH --error=my_job.e%j     # File name for standard error output

module purge
module load anaconda3/bleeding_edge
python tf_cnn_benchmarks.py --num_gpus=4 --batch_size=32 --model=resnet50 --variable_update=parameter_server
You can select nodes with different kinds of GPUs via the --constraint option (node feature). See the configuration page for the available features.
