Getrusage

Getrusage is a tool developed at DKRZ that prints the detailed resource usage statistics available via getrusage(2).

To use getrusage on Mistral, load the following module file:

$ module load getrusage

Thereafter, runtime diagnostics about resource usage (e.g. wall-clock time and max rss, i.e. peak memory usage) can be collected and printed for a given command, for example:

$ getrusage ls -l

With the option -o, the measurements are written to the specified output file, for example:

$ getrusage -o ls_rusage.txt ls -l
$ less ls_rusage.txt
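
If the module has not been loaded, the commands above fail because getrusage is not on the PATH. The following is a minimal sketch of a wrapper that guards against this. The function name measure is hypothetical and not part of getrusage, and only the -o option shown above is used.

```shell
#!/bin/bash
# measure: hypothetical helper (not part of getrusage).
# Usage: measure OUTPUT_FILE COMMAND [ARGS...]
# Runs COMMAND under getrusage if it is on the PATH; otherwise warns on
# stderr and runs COMMAND without collecting any measurements.
measure() {
    local outfile=$1
    shift
    if command -v getrusage >/dev/null 2>&1; then
        getrusage -o "$outfile" "$@"
    else
        echo "getrusage not found; try 'module load getrusage'" >&2
        "$@"
    fi
}

# Same measurement as above, routed through the wrapper.
measure ls_rusage.txt ls -l
```

The fallback branch keeps a script working on systems without the tool, at the cost of silently producing no output file there, so the warning on stderr is worth keeping.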

For full details of the getrusage command, refer to the man page or use the --help option:

$ man getrusage
$ getrusage --help

In the following, we provide brief guidance on how to use getrusage to obtain the detailed resource consumption of a parallel program. The provided script getrusage_aggregate computes the minimum, maximum, sum and mean of each metric.
  1. In the run script that currently invokes the parallel program (named model in this example), change the line
    srun [srun_options...] model [model_args...]
    to
    mkdir -p rusage
    srun [srun_options...] bash -c 'exec getrusage -o rusage/model.${SLURM_PROCID} model [model_args...]'

    If model_args contains characters that are special to the executing shell, additional quoting might be necessary. If you currently use a command file to specify the executable and arguments for each task (e.g. in an MPMD setup), you can replace a line like this:

    ./model [model_args...]

    in your command file with this:

    bash -c 'exec getrusage -o "rusage/model.${SLURM_PROCID}" ./model [model_args...]'

    After running the program, the directory rusage contains files named model.<MPI_rank> with the full set of measurements captured for each process in the parallel job.

  2. To reduce the metrics to aggregates, add the following line to your job script:
    getrusage_aggregate rusage/model.*
    By default, this prints only the minimum, maximum, mean and total memory use of the job.
  3. To add other metrics to the report, for example the time used by the processes, try the following:
    getrusage_aggregate --aggregate-key='wall-clock time' rusage/model.*
    Other supported keys are listed in the man page of getrusage_aggregate and in its --help output:
    $ man getrusage_aggregate
    $ getrusage_aggregate --help
  4. If you do not need the resource usage of individual processes, you can remove the logs after generating the report:
    rm -rf rusage
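
The single quotes around the bash -c argument in step 1 matter: they defer the expansion of ${SLURM_PROCID} to the shell started for each task, so every process writes to its own output file. The sketch below illustrates this without Slurm. The loop stands in for srun and echo stands in for getrusage; both are assumptions made purely for demonstration, and the function name simulate_tasks is hypothetical.

```shell
#!/bin/bash
# Simulate a launcher starting three tasks, each with its own SLURM_PROCID
# in the environment. No Slurm or getrusage is needed to run this.
simulate_tasks() {
    local rank
    for rank in 0 1 2; do
        # Single quotes: the outer shell passes the string through unchanged,
        # and the inner bash of each task expands ${SLURM_PROCID} itself.
        SLURM_PROCID=$rank bash -c 'echo "rusage/model.${SLURM_PROCID}"'
    done
}

simulate_tasks
# prints:
# rusage/model.0
# rusage/model.1
# rusage/model.2
```

With double quotes around the bash -c argument, the variable would instead be expanded once by the shell that runs the job script, before any task starts, so the tasks would not get distinct file names.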
