
Data Processing on Mistral

A part of the Mistral cluster is reserved for data processing and analysis and can be used for tasks such as

  • time- and memory-intensive data processing using CDO, NCO, netCDF, afterburner, tar, gzip/bzip, etc. (see the example after this list)
  • data analysis and simple visualization using MATLAB, R, Python, NCL, GrADS, FERRET, IDL, GMT, etc.
  • archiving and retrieval of data to/from the HPSS tape archive via pftp
  • connections to external servers via sftp, lftp, scp, or the Globus Toolkit
  • data download from the CERA/WDCC database using jblob, and so on.
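As an illustration, a typical processing step on one of these nodes could look like the following; the module invocation and the file names are only placeholders, not part of the official documentation.

$ module load cdo                               # load the CDO module (exact module name may differ)
$ cdo monmean input_data.nc monthly_means.nc    # compute monthly means from a hypothetical input file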

Only advanced visualization applications such as Avizo Green, Avizo Earth, ParaView, and Vapor need to be run on the Mistral nodes dedicated to 3D visualization, as described in the section Visualization on Mistral.

Below, the different procedures for accessing the hardware resources provided for data processing and analysis are described. In general, the following three options are available:

  • Use the interactive nodes mistralpp.dkrz.de
  • Start an interactive session on a node in the SLURM partition prepost
  • Submit a batch job to the SLURM partition prepost or shared


Interactive nodes mistralpp

Five nodes are currently available for interactive data processing and analysis. These nodes can be accessed directly via ssh:

$ ssh -X <userid>@mistralpp.dkrz.de

On the interactive nodes, resources (memory and CPU) are shared among all users logged into the node. This can degrade node performance and increase the run time of applications.

Interactive use of nodes managed by SLURM

To avoid the oversubscribed mistralpp nodes and obtain dedicated resources for your interactive work, you can allocate resources with the SLURM salloc command and then log into the allocated node via ssh. The example below illustrates this approach; the name of the allocated node is provided by SLURM in the environment variable SLURM_JOB_NODELIST.

$ salloc -p prepost -A xz0123 -n 1 -t 60 -- /bin/bash -c 'ssh -X $SLURM_JOB_NODELIST'

Please take care to adapt the settings in the example above (project account (option -A), number of tasks (option -n), wall-clock time limit (option -t), etc.) to your actual needs.

For hints on how to set the default SLURM account and define a shell alias or function that allocates resources and logs into a node in one step, please refer to our Mistral Tips and Tricks page.
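To give an idea of what such a shortcut might look like, the following sketch for ~/.bashrc combines both hints; the project account xz0123, the default time limit, and the function name ppalloc are assumptions, and the Tips and Tricks page remains the authoritative reference.

# Default accounts picked up by salloc and sbatch (example project account)
export SALLOC_ACCOUNT=xz0123
export SBATCH_ACCOUNT=xz0123

# Allocate one task in the partition prepost and log into the allocated node in one step;
# the optional first argument is the wall-clock limit in minutes (default: 60)
ppalloc() {
    salloc -p prepost -n 1 -t "${1:-60}" -- /bin/bash -c 'ssh -X $SLURM_JOB_NODELIST'
}

After reloading the shell configuration, calling, for example, ppalloc 30 allocates a node in the partition prepost for 30 minutes and opens an X-forwarded ssh session on it.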

Submitting a batch job

If your data processing programs do not require interactive control, you can also submit a regular batch job. Below is an example batch script for a job that uses one core on one node in the partition prepost for twenty minutes. Insert your own job name, project account, file names for standard output and error output, resource requirements, and the program to be executed.

#!/bin/bash
#SBATCH -J my_job              # Specify job name
#SBATCH -p prepost             # Use partition prepost
#SBATCH -N 1                   # Specify number of nodes
#SBATCH -n 1                   # Specify max. number of tasks to be invoked
#SBATCH -t 20                  # Set a limit on the total run time (in minutes)
#SBATCH -A xz0123              # Charge resources on this project account
#SBATCH -o my_job.o%j          # File name for standard output (%j is the job ID)
#SBATCH -e my_job.e%j          # File name for standard error output


# Execute a serial program, e.g.
ncl my_script.ncl

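Assuming the script above is saved as my_job.sh (the file name is arbitrary), it can be submitted and monitored with the standard SLURM commands:

$ sbatch my_job.sh          # submit the job; SLURM prints the assigned job ID
$ squeue -u $USER           # list the status of your pending and running jobs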