Score-P, Vampir and Extrae

Score-P (Scalable Performance Measurement Infrastructure for Parallel Codes) is a software system that provides a measurement infrastructure for profiling, event trace recording, and online analysis of High Performance Computing (HPC) applications. Vampir focuses on the graphical presentation of performance data and supports the collaborative Score-P performance monitor; compatibility with earlier OTF and VampirTrace releases is also maintained. Extrae is a dynamic instrumentation package for tracing programs that use shared-memory programming models, the message passing (MPI) programming model, or both. Extrae generates trace files that can later be visualized with Paraver.

General hint

The Performance Optimisation and Productivity Centre of Excellence in Computing Applications (POP) provides users with excellent training material and exercises. Besides general training material regarding parallel programming, you can find information on the performance tools, explanatory material regarding analysis methods and optimization techniques.

Please refer to the POP website for this material.

Score-P

Here we give only a short overview of how to use Score-P to analyse your code. For details, please consult the Score-P documentation.

Getting started

The three core steps of a typical work cycle in the investigation of the behaviour of a software package can be described as follows:

  1. Instrumentation of user code: Calls to the measurement system are inserted into the application. This can be done either fully automatically or with a certain amount of control handed to the software developer.
    • load the desired module in order to set your environment (if Score-P is not present for your choice of MPI, please contact DKRZ), e.g.
      module load scorep/
    • change your makefile (or any other step of your compilation workflow) so that the scorep command is prefixed to all compile and link commands used to build the application, e.g.
      scorep mpiifort app1.f90 app2.f90 -o app
    In some cases the scorep wrapper does not automatically recognize the instrumenter you need. You can then specify the desired instrumentation with scorep options, e.g.
    scorep --mpp=[mpi|none] --thread=[omp|none|pthread] --[no]compiler mpiifort -c app1.f90
  2. Measurement and analysis: The instrumented application is executed under the control of the measurement system and the information gathered during the run time of this process is stored and analysed.
    • Once the code has been instrumented, you can start a measurement run with this executable. It is sufficient to execute the target application in the usual way, i.e. no modification of your batch scripts should be needed.
    • When running the instrumented executable, the measurement system will create a unique directory called scorep-YYYYMMDD_HHMM_XXXXXXXX where its measurement data will be stored. Thus, repeated measurements can easily be performed without the danger of accidentally overwriting results of earlier measurements. The environment variables SCOREP_ENABLE_TRACING and SCOREP_ENABLE_PROFILING control whether event trace data or profiles are stored in this directory. By setting either variable to true, the corresponding data will be written to the directory. The default values are true for SCOREP_ENABLE_PROFILING and false for SCOREP_ENABLE_TRACING.
  3. Examination of results: The information about the behaviour of the code at run time is visualized and the user gets the opportunity to examine the reported results.
    • After the completion of your application, the requested data (traces or profiles) is available in the indicated locations.
    • Appropriate tools can then be used to visualize this information and to generate reports, and thus to identify weaknesses of the code that need to be modified in order to obtain programs with a better performance. A number of tools are already available for this purpose. This includes, in particular, the CUBE4 performance report explorer for viewing and analyzing profile data,
      cube ./scorep-20160715_1905_1368543919529401856/profile.cubex
      and Vampir for the investigation of trace information
      vampir ./scorep-20160715_1905_1368543919529401856/traces.otf2
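
The measurement settings from step 2 can be made explicit in the batch script before the application is launched. The following is a minimal sketch: the two environment variables are the ones described above, while latest_scorep_dir is a hypothetical helper (not part of Score-P) that assumes the experiment directories are created below the directory you pass to it.

```shell
# Write profiles only; these are the documented defaults, set explicitly here.
export SCOREP_ENABLE_PROFILING=true
export SCOREP_ENABLE_TRACING=false

# Hypothetical helper (not part of Score-P): print the most recently
# modified scorep-* directory below $1 (default: current directory).
# The printed path ends with a slash; nothing is printed if none exists.
latest_scorep_dir() {
    ls -dt "${1:-.}"/scorep-*/ 2>/dev/null | head -n 1
}
```

After the run, the newest profile could then be opened with, for example, cube "$(latest_scorep_dir)profile.cubex".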

Usage of scorep-score

scorep-score is a tool that estimates the size of an OTF2 trace from a CUBE4 profile. It also estimates the effect of filters. Its main purpose is to help you define appropriate filters for a tracing run based on a profile.

The general work-flow for performance analysis with Score-P is:

  1. Instrument an application
  2. Perform a measurement run and record a profile. The profile already gives an overview of what happens inside the application.
  3. Use scorep-score to define an appropriate filter for an application - otherwise the trace file may become too large.
  4. Perform a measurement run with tracing enabled and the filter applied.
  5. Perform in-depth analysis on the trace data.

To invoke scorep-score you must provide the filename of a CUBE4 profile as argument. Thus, the basic command looks like this:

scorep-score profile.cubex

The output of the command gives a short overview of the profile, in particular an estimate of the total size of the trace, aggregated over all processes. This information is useful for estimating the disk space required for a subsequent run with tracing enabled.

Furthermore, an estimate of the memory required by a single process for the trace is shown. The memory that Score-P reserves on each process at application start must be large enough to hold the process's trace in order to avoid flushes during runtime, because flushes heavily disturb measurements. In addition to the trace, Score-P requires some memory to maintain internal data structures. Thus, scorep-score also provides an estimate of the total amount of memory required on each process. The memory size per process that Score-P reserves is set via the environment variable SCOREP_TOTAL_MEMORY.
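
As a minimal sketch, the reserved memory can be set in the batch script before the tracing run. The value 200M is only a placeholder and an assumption; use the per-process total that scorep-score reports for your own profile.

```shell
# Placeholder value (assumption): replace 200M with the per-process
# total memory estimate printed by scorep-score for your profile.
export SCOREP_TOTAL_MEMORY=200M
```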

Finally, scorep-score prints a table that shows how the trace memory requirements and the runtime are distributed among certain function groups. The column max_tbc shows how much trace buffer is needed on a single process. The column time(s) shows how much execution time was spent in regions of that group in seconds, the column % shows the fraction of the overall runtime that was used by this group, and the column time/visit(us) shows the average time per visit in microseconds.

The following groups exist:

  • ALL: Includes all functions of the application
  • OMP: This group contains all regions that represent an OpenMP construct
  • MPI: This group contains all MPI functions
  • COM: This group contains all user-implemented functions that appear on a call path to an OpenMP construct or an MPI/SHMEM function
  • USR: This group contains all user functions that do not appear on a call path to an OpenMP construct or an MPI/SHMEM function

For a more detailed output, which shows the data for every region, you can use the -r option:

scorep-score -r profile.cubex

This command adds the buffer size and execution time of every region to the table. This information is very valuable when defining a filter. Filtering is recommended to exclude short, frequently called functions from the measurement in order to lower the measurement overhead.

The filter definition file can contain two blocks:

  • Source file name filter block: enclosed by SCOREP_FILE_NAMES_BEGIN and SCOREP_FILE_NAMES_END. In between you can specify an arbitrary number of include and exclude rules which are evaluated in sequential order. At the beginning all source files are included. Source files that are excluded after all rules are evaluated, are filtered.
  • Region name filter block: enclosed by SCOREP_REGION_NAMES_BEGIN and SCOREP_REGION_NAMES_END. In between you can specify an arbitrary number of include and exclude rules which are evaluated in sequential order. At the beginning, all regions are included. Regions that are excluded after all rules are evaluated, are filtered.

Besides the two filter blocks, you may use comments in the filter definition file. A comment starts with the character '#' and is terminated by a new line. Example:

SCOREP_FILE_NAMES_BEGIN # This is a comment
  EXCLUDE */foo*
  INCLUDE */bar.c
SCOREP_FILE_NAMES_END

SCOREP_REGION_NAMES_BEGIN
  EXCLUDE short*
  INCLUDE main
SCOREP_REGION_NAMES_END
If you have a filter file, you can test the effect of your filter on the trace size. To do so, pass the option -f followed by the file name of your filter. For example, if your filter file is named myfilter, the command looks like this:

scorep-score profile.cubex -f myfilter

To activate a filter for a trace collection, set the environment variable SCOREP_FILTERING_FILE to the file you created. If no filter definition file is specified, all instrumented regions are recorded. For filtered regions, the enter/exit events are recorded neither in the trace nor in the profile.
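
Putting the pieces together, a filtered tracing run could be prepared as sketched below. The file name myfilter follows the text above; the rule patterns are placeholders and an assumption, to be replaced by the regions that scorep-score -r identifies as short and frequently called in your application.

```shell
# Write a filter definition file with a region name filter block.
# The patterns are placeholders; adapt them to your own application.
cat > myfilter <<'EOF'
SCOREP_REGION_NAMES_BEGIN
  EXCLUDE short*
  INCLUDE main
SCOREP_REGION_NAMES_END
EOF

# Apply the filter and record a trace in the next measurement run.
export SCOREP_FILTERING_FILE="$PWD/myfilter"
export SCOREP_ENABLE_TRACING=true
```

Before starting the tracing run, scorep-score profile.cubex -f myfilter shows whether the estimated trace size is now acceptable.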


Vampir

The generation of trace log files for the Vampir performance visualization tool requires a working monitoring system to be attached to your parallel program - we recommend the use of Score-P as described above.

Getting started

  • load the latest version of Vampir via module and start the GUI by calling vampir
  • to open a trace file, click on Open Other... or select Open... in the File menu
  • While Vampir is loading the trace file, an empty Trace View window with a progress bar at the bottom opens. After Vampir has loaded the trace data completely, a default set of charts will appear. The loading process can be interrupted at any time by clicking the cancel button in the lower right corner of the Trace View. Because events in the trace file are loaded one after another, the GUI will then open and show the earliest information already loaded from the trace file.

For any details on how to work with the Vampir GUI, please visit the Vampir tutorial.


Extrae

As an alternative to instrumenting your code with Score-P, you can use Extrae to obtain traces of your application. Extrae works by symbol substitution through LD_PRELOAD, hence there is no need to recompile or relink your code.

Getting started

  • modify your batch script and set the following before the srun command (this example is only valid for pure MPI code written in C)
export EXTRAE_HOME=<see docu below>
source ${EXTRAE_HOME}/etc/
cp ${EXTRAE_HOME}/share/example/MPI/extrae.xml .
export EXTRAE_CONFIG_FILE=extrae.xml
  • (optional) tune the copied standard Extrae XML configuration file - other examples can be found at ${EXTRAE_HOME}/share/example
  • submit the batch script
  • once the job has finished, you will have the trace in 3 files (*.pcf, *.prv, *.row)
  • analyse the trace using Paraver

For further reference, check the Extrae User Guide at ${EXTRAE_HOME}/share/doc or the online Extrae documentation.

The setting of EXTRAE_HOME depends on the MPI version you use. We currently have the following versions installed on mistral:

  • /sw/rhel6-x64/analysis-tools/extrae-3.4.1-impi5-intel14 (for module intelmpi/
  • /sw/rhel6-x64/analysis-tools/extrae-3.4.1-impi2017-intel14 (for module intelmpi/2017.1.132)
  • /sw/rhel6-x64/analysis-tools/extrae-3.4.1-bullxmpi-intel14 (for module bullxmpi_mlx/bullxmpi_mlx-

Choose the tracing library to be preloaded depending on the application type

library serial MPI OpenMP

The suffix 'f' is used in Fortran codes.
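
A common way to preload the tracing library for the application only (and not for the launcher itself) is a small wrapper script. The sketch below rests on assumptions: it uses libmpitrace.so, the library name Extrae's documentation gives for pure MPI codes written in C (libmpitracef.so for Fortran), and it assumes the libraries are located in ${EXTRAE_HOME}/lib. The script name trace.sh and the launch line are hypothetical.

```shell
# Create a wrapper that preloads the Extrae tracing library and then
# runs the program given on its command line.
cat > trace.sh <<'EOF'
#!/bin/bash
# Assumption: pure MPI code in C; library name and path may differ
# for your installation and application type.
export LD_PRELOAD=${EXTRAE_HOME}/lib/libmpitrace.so
# Replace the wrapper process with the actual program and its arguments.
exec "$@"
EOF
chmod +x trace.sh

# Hypothetical launch line for the batch script:
# srun ./trace.sh ./app
```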