Intel Tools

The Intel toolchain comprises VTune Amplifier, Advisor, and Inspector.

Intel VTune Amplifier

Intel® VTune™ Amplifier XE is a performance profiler. Use it to analyse algorithm choices, find serial and parallel code bottlenecks, understand where and how your application can benefit from the available hardware resources, and speed up execution.

Step 1: Start the VTune Amplifier

  1. Build your target application in the Release mode with all optimizations enabled.
  2. Set up the environment variables:

    • module add inteltools

  3. Launch the VTune Amplifier:
    • For standalone GUI interface, run the amplxe-gui command.

    • For command line interface, run the amplxe-cl command.

    For the analysis of your application running on mistral, we recommend using the command line interface in combination with SLURM's multiple program configuration (MPMD). Since most VTune analyses are node based, it makes sense to analyse only one task or all tasks on one node.

  4. To use the multiple program configuration, create a suitable config file and modify the srun command as follows:
    cat > vtune.conf <<EOF
    0 amplxe-cl -c [analysis type] -r vtune-results [vtune options] -- ./myapp
    1-N ./myapp
    EOF

    srun [any slurm options] --multi-prog vtune.conf
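
The N in the config file stands for the highest task rank. Since SLURM numbers tasks from 0, this is the number of tasks minus one. The following is a minimal sketch of how the file could be generated for a given task count. The helper name write_vtune_conf and its arguments are illustrative only and are not part of VTune or SLURM.

```shell
#!/bin/sh
# write_vtune_conf NTASKS ANALYSIS_TYPE [CONF_FILE]
# Hypothetical helper: writes a SLURM multi-prog file in which only rank 0
# runs under amplxe-cl and all remaining ranks run the plain application.
write_vtune_conf() (
    ntasks=$1
    analysis=$2
    conf=${3:-vtune.conf}
    last=$((ntasks - 1))        # SLURM ranks run from 0 to NTASKS-1

    echo "0 amplxe-cl -c $analysis -r vtune-results -- ./myapp" > "$conf"
    if [ "$last" -eq 1 ]; then
        echo "1 ./myapp" >> "$conf"
    elif [ "$last" -gt 1 ]; then
        echo "1-$last ./myapp" >> "$conf"
    fi
)

# Example: 4 tasks, hotspots analysis; afterwards pass the file to
#   srun [any slurm options] --multi-prog vtune.conf
write_vtune_conf 4 hotspots vtune.conf
cat vtune.conf
```

With a single task the file contains only the rank 0 line, so the sketch also covers small serial tests.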

The analysis types that can be collected are:

  • hotspots: identifies the most time-consuming source code.
  • advanced-hotspots: as before but uses the VTune Amplifier kernel driver to extend the hotspot analysis by collecting call stacks, context switches and statistical call count data as well as analysing the CPI metric (cycles per instruction).
  • concurrency: analyses the usage of available logical CPUs, discovers where parallelism is incurring synchronisation overhead, and identifies potential candidates for parallelisation.
  • locksandwaits: identifies where the application is waiting on synchronisation objects or I/O operations.
  • general-exploration: uses hardware event-based sampling to analyse general issues affecting the performance of the application.
  • memory-access: measures a set of metrics to identify memory access related issues.

Further options to amplxe-cl you might use are:

  • -trace-mpi : Configure collectors to trace MPI code, and determine MPI rank IDs in case of a non-Intel MPI library implementation.
  • -data-limit=0 : Limit the amount of raw data to be collected by setting the maximum possible result size (in MB). VTune Amplifier starts collecting data from the beginning of the target execution and ends when the limit for the result size is reached. For unlimited data size, specify 0.
  • -call-stack-mode=all : Choose how to show system functions in the stack.
  • -target-duration-type=long : Estimate the application duration time. This value affects the size of collected data. For long running targets, sampling interval is increased to reduce the result size. For hardware event-based analysis types, the duration estimate affects a multiplier applied to the configured Sample after value.
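
As an illustration, the options above can be combined into the rank 0 line of the multi-prog config file. This is only a sketch: the analysis type, result directory and application name are placeholders, and it does not check whether each option is useful for the chosen analysis type. Nothing is executed; the line is just printed.

```shell
#!/bin/sh
# Sketch: assemble the rank-0 line of a SLURM multi-prog file from the
# amplxe-cl options listed above. The line is printed, not run.

analysis_type="hotspots"      # placeholder: any analysis type listed above
result_dir="vtune-results"    # value passed to -r
app="./myapp"                 # placeholder application

vtune_opts="-trace-mpi -data-limit=0 -call-stack-mode=all -target-duration-type=long"

rank0_line="0 amplxe-cl -c $analysis_type -r $result_dir $vtune_opts -- $app"
echo "$rank0_line"
```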

Step 2: Set Up the Analysis Target (only if using the GUI)

  1. Create a VTune Amplifier project:

    1. Click the menu button in the right corner and go to New > Project...

    2. Specify the project name and location in the Create Project dialog box.

  2. In the Analysis Target tab, select a target system from the left pane. Use 'local' on mistralpp only for very small, serial tests. Otherwise, use the command line interface and submit your analysis job to the queue.

  3. Select the analysis type in the corresponding tab. If you are using the command line interface (recommended), specify the analysis type after the '-collect' option (short form '-c').

  4. Configure your target: application location, parameters, and search directories (if required).

Step 3: View and Analyse Performance Data

If you are using the GUI, click the Start button on the right to launch the analysis.

If you used the command line interface, you should end up with some directories (one per compute node) containing the analysis results and labelled vtune-results (or according to the -r option given).

Start your analysis with the Summary window to get an overview of the application performance and then switch to other windows to explore the performance deeper at the granularity of function, source line and so on.
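
A text version of the summary can also be produced on the command line with amplxe-cl -report summary. The sketch below assumes that the per-node result directories all start with the name given to -r (here vtune-results); the helper name list_result_dirs is illustrative only.

```shell
#!/bin/sh
# list_result_dirs PREFIX [BASE_DIR]
# Hypothetical helper: prints every directory in BASE_DIR (default: current
# directory) whose name starts with PREFIX, one per line.
list_result_dirs() (
    prefix=$1
    base=${2:-.}
    for d in "$base"/"$prefix"*; do
        if [ -d "$d" ]; then
            echo "$d"
        fi
    done
)

# Print a summary report for each result directory, if amplxe-cl is available.
list_result_dirs vtune-results | while IFS= read -r dir; do
    if command -v amplxe-cl >/dev/null 2>&1; then
        amplxe-cl -report summary -r "$dir" </dev/null
    else
        echo "amplxe-cl not found; run 'module add inteltools' first ($dir)"
    fi
done
```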

Please also have a look at the Intel VTune tutorials: https://software.intel.com/en-us/articles/intel-vtune-amplifier-tutorials


 

Intel Advisor

Intel® Advisor offers a vectorization analysis tool and a threading design and prototyping tool to help ensure your Fortran, C and C++ applications take full performance advantage of today’s processors.

Step 1: Prerequisites

  1. Build your target application in the Release mode with all optimizations enabled (-O2 or higher) and with the following settings:
    1. Request full debug information (compiler and linker): -g
    2. Produce compiler diagnostics: -qopt-report=5
    3. Enable vectorization: -vec
    4. Enable SIMD directives: -simd
    5. Enable generation of multi-threaded code based on OpenMP* directives if applicable: -qopenmp
  2. Set up the environment variables:

    • module add inteltools

  3. Launch the Intel Advisor:
    • For standalone GUI interface, run the advixe-gui command.

    • For command line interface, run the advixe-cl command.

    For the analysis of your application running on mistral, we recommend using the command line interface in combination with SLURM's multiple program configuration (MPMD).

  4. To use the multiple program configuration, create a suitable config file and modify the srun command as follows:
    cat > advisor.conf <<EOF
    0 advixe-cl --collect [analysis type] --project-dir advisor-results -- ./myapp
    1-N ./myapp
    EOF

    srun [any slurm options] --multi-prog advisor.conf
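
The compiler settings from item 1 can be collected in a single build line. The sketch below only prints such a line. The compiler names (ifort, icc) and file names are assumptions for illustration, and -qopenmp should be dropped if the application does not use OpenMP.

```shell
#!/bin/sh
# advisor_build_cmd COMPILER SOURCE OUTPUT
# Hypothetical helper: prints (does not run) a compile line carrying the
# flags listed in the prerequisites above.
advisor_build_cmd() (
    compiler=$1
    src=$2
    out=$3
    flags="-O2 -g -qopt-report=5 -vec -simd -qopenmp"
    echo "$compiler $flags -o $out $src"
)

# Placeholder compiler and file names; adjust to your own build.
advisor_build_cmd ifort myapp.f90 myapp
```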

The analysis types that can be collected are:

  • survey: Explore where to add efficient vectorization and/or threading.
  • dependencies: Identify and explore loop-carried dependencies for marked loops.
  • map: Identify and explore complex memory accesses for marked loops.
  • suitability: Analyze the annotated program to check its predicted parallel performance.

Further options to advixe-cl you might use are:

  • -trace-mpi : Configure collectors to trace MPI code, and determine MPI rank IDs in case of a non-Intel MPI library implementation.
  • -data-limit=0 : Limit the amount of raw data to be collected by setting the maximum possible result size (in MB). Intel Advisor starts collecting data from the beginning of the target execution and ends when the limit for the result size is reached. For unlimited data size, specify 0.

Step 2: Run Survey Analysis

  1. If you are using the command line interface: just submit your SLURM batch job as normal.
  2. If you are using the GUI: under Survey Target in the VECTORIZATION WORKFLOW, click the Run control to collect Survey data while your application executes.

CAUTION: Perform the interactive survey analysis only on mistralpp and only for very small application settings!

Step 3: View and Analyse the Data

After your batch job running the advixe-cl command line interface has finished successfully, you will find the results in the project subdirectory. Use advixe-gui to open the result file and start analysing.

Please refer to the Intel Advisor Getting Started Guide: https://software.intel.com/en-us/get-started-with-advisor-vectorization-linux


 

Intel Inspector

Intel® Inspector is a dynamic memory and threading error checking tool for users developing serial and multithreaded applications. It offers a standalone GUI and command line operational environments. Key features are:

  • A wealth of reported memory errors, including on-demand memory leak detection
  • Memory growth measurement to help ensure your application uses no more memory than expected
  • Data race, deadlock, lock hierarchy violation, and cross-thread stack access error detection, including error detection on the stack

Step 1: Prerequisites

To build applications that produce the most accurate and complete Intel Inspector analysis results:

  1. Build your application in debug mode.
    • Use optimal compiler/linker settings. For more information, see: Building Applications in Intel Inspector Help.
    • Ensure your application creates more than one thread before you run threading analyses.
    • Verify your application runs outside the Intel Inspector environment.
  2. Set up the environment variables:

    • module add inteltools

  3. Launch the Intel Inspector:
    • For standalone GUI interface, run the inspxe-gui command.

    • For command line interface, run the inspxe-cl command.

    For the analysis of your application running on mistral, we recommend using the command line interface in combination with SLURM's multiple program configuration (MPMD).

  4. To check for memory leaks, for example, modify your batch script as follows:
    cat > inspector.conf <<EOF
    0 inspxe-cl -c mi3 -trace-mpi -r inspector-results -- ./myapp
    1-N ./myapp
    EOF

    srun [any slurm options] --multi-prog inspector.conf

Step 2: Run Analysis

  1. If you are using the command line interface: just submit your SLURM batch job as normal.
  2. If you are using the GUI: choose or create a project, configure the project and the targeted analysis, and finally start the analysis.

CAUTION: Perform the interactive analysis only on mistralpp and only for very small application settings!

Step 3: View and Analyse the Data

After your batch job running the inspxe-cl command line interface has finished successfully, you will find the results in the project subdirectory: a short summary is given in 'inspxe-cl.txt' while the full results are in '*.inspxe'. Use inspxe-gui to open the result file and start analysing.

Please refer to the Intel Inspector Getting Started Guide: https://software.intel.com/en-us/node/595380
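
For a quick look at the short summaries before opening the GUI, the 'inspxe-cl.txt' files can be collected with standard shell tools. This is a minimal sketch: the helper name print_inspector_summaries is illustrative only, and it simply searches below a given directory, since the exact location of the files depends on the -r option used.

```shell
#!/bin/sh
# print_inspector_summaries [BASE_DIR]
# Hypothetical helper: prints every inspxe-cl.txt found below BASE_DIR
# (default: current directory), each preceded by a line with its path.
print_inspector_summaries() (
    base=${1:-.}
    find "$base" -type f -name 'inspxe-cl.txt' | sort | while IFS= read -r f; do
        echo "== $f"
        cat "$f"
    done
)

# Example: search the directory from which the batch job was submitted.
print_inspector_summaries .
```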
