Intel® Advisor is available as part of the following suites:

It is composed of two tools to help ensure your Fortran, C and C++ applications realize full performance potential on modern processors, such as Intel® Xeon® and Intel® Xeon Phi™ processors:

This document summarizes typical workflows to get started using the Intel Advisor:

Intel Advisor: Typical Workflows to Get Started

Before You Begin

Before you begin with the Intel Advisor:

To Do This

For This Tool

Optimal C/C++ Settings

Request full debug information (compiler and linker).

Vectorization Advisor

Threading Advisor

Linux* OS command line: -g

Windows* OS command line:

  • /Zi

  • /DEBUG

Microsoft Visual Studio* IDE:

  • C/C++ > General > Debug Information Format > Program Database (/Zi)

  • Linker > Debugging > Generate Debug Info > Yes (/DEBUG)

Request moderate optimization.

Vectorization Advisor

Threading Advisor

Linux* OS command line: -O2 or higher

Windows* OS command line:

  • /O2 or higher

  • /Ob1 (Threading Advisor only)

Visual Studio* IDE:

  • C/C++ > Optimization > Optimization > Maximize Speed (/O2) or higher

  • C/C++ > Optimization > Inline Function Expansion > Only_inline (/Ob1) (Threading Advisor only)

Produce compiler diagnostics (necessary for version 15.0 of the Intel compiler; unnecessary for version 16.0 and higher).

Vectorization Advisor only

Linux* OS command line: -qopt-report=5

Windows* OS command line: /Qopt-report:5

Visual Studio* IDE: C/C++ > Diagnostics [Intel C++] > Optimization Diagnostic Level > Level 5 (/Qopt-report:5)

Enable vectorization.

Vectorization Advisor only

Linux* OS command line: -vec

Windows* OS command line: /Qvec

Enable SIMD directives.

Vectorization Advisor only

Linux* OS command line: -simd

Windows* OS command line: /Qsimd

Enable generation of multi-threaded code based on OpenMP* directives.

Vectorization Advisor only

Linux* OS command line: -qopenmp

Windows* OS command line: /Qopenmp

Visual Studio* IDE: C/C++ > Language [Intel C++] > OpenMP Support > Generate Parallel Code (/Qopenmp)

Search additional directory related to Intel Advisor annotation definitions.

Primarily Threading Advisor, but could also be useful for Vectorization Advisor refinement analyses

Linux* OS command line: -I${ADVISOR_[product_year]_DIR}/include

Windows* OS command line: /I"%ADVISOR_[product_year]_DIR%"\include

Visual Studio* IDE: C/C++ > General > Additional Include Directories > $(ADVISOR_[product_year]_DIR)\include;%(AdditionalIncludeDirectories)

Search for unresolved references in multithreaded, dynamically linked libraries.

Threading Advisor only

Linux* OS command line: -Bdynamic

Windows* OS command line: /MD or /MDd

Visual Studio* IDE: C/C++ > Code Generation > Runtime Library > Multithread

Enable dynamic loading.

Threading Advisor only

Linux* OS command line: -ldl
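As a rough illustration of how the Linux* OS options in the C/C++ settings above combine for each tool, the following Python sketch assembles a flag list. The flag strings are copied from the settings above; the function name, its parameters, and the grouping logic are assumptions made for this example and are not part of the Intel Advisor or the Intel compiler.

```python
def linux_cxx_flags(tool, include_dir=None, need_opt_report=False):
    """Collect Linux C/C++ compiler options for one Intel Advisor tool.

    tool            -- "vectorization" or "threading" (hypothetical labels)
    include_dir     -- annotation include directory, supplied by the caller
    need_opt_report -- True only for compiler versions that still need the
                       optimization report (version 15.0 per the settings above)
    """
    if tool not in ("vectorization", "threading"):
        raise ValueError("tool must be 'vectorization' or 'threading'")

    # Full debug information and moderate optimization apply to both tools.
    flags = ["-g", "-O2"]

    if tool == "vectorization":
        if need_opt_report:
            flags.append("-qopt-report=5")
        # Vectorization, SIMD directives, and OpenMP code generation.
        flags += ["-vec", "-simd", "-qopenmp"]

    # The annotation include directory is optional for either tool.
    if include_dir is not None:
        flags.append("-I" + include_dir)

    if tool == "threading":
        # Dynamic linking and dynamic loading are Threading Advisor only.
        flags += ["-Bdynamic", "-ldl"]

    return flags


print(" ".join(linux_cxx_flags("vectorization")))
print(" ".join(linux_cxx_flags("threading", include_dir="/path/to/advisor/include")))
```

The include directory is passed in rather than built from an environment variable because the exact variable name depends on the product year of your installation.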

To Do This

For This Tool

Optimal Fortran Settings

Request full debug information (compiler and linker).

Vectorization Advisor

Threading Advisor

Linux* OS command line: -g

Windows* OS command line:

  • /debug=full

  • /DEBUG

Visual Studio* IDE:

  • Fortran > General > Debug Information Format > Full (/debug=full)

  • Linker > Debugging > Generate Debug Info > Yes (/DEBUG)

Request moderate optimization.

Vectorization Advisor

Threading Advisor

Linux* OS command line: -O2 or higher

Windows* OS command line:

  • /O2 or higher

  • /Ob1 (Threading Advisor only)

Visual Studio* IDE:

  • Fortran > Optimization > Optimization > Maximize Speed or higher

  • Fortran > Optimization > Inline Function Expansion > Only INLINE directive (/Ob1) (Threading Advisor only)

Produce compiler diagnostics (necessary for version 15.0 of the Intel compiler; unnecessary for version 16.0 and higher).

Vectorization Advisor only

Linux* OS command line: -qopt-report=5

Windows* OS command line: /Qopt-report:5

Visual Studio* IDE: Fortran > Diagnostics > Optimization Diagnostic Level > Level 5 (/Qopt-report:5)

Enable vectorization.

Vectorization Advisor only

Linux* OS command line: -vec

Windows* OS command line: /Qvec

Enable SIMD directives.

Vectorization Advisor only

Linux* OS command line: -simd

Windows* OS command line: /Qsimd

Enable generation of multi-threaded code based on OpenMP* directives.

Vectorization Advisor only

Linux* OS command line: -qopenmp

Visual Studio* IDE: Fortran > Language > Process OpenMP Directives > Generate Parallel Code (/Qopenmp)

Search additional directory related to Intel Advisor annotation definitions.

Primarily Threading Advisor, but could also be useful for Vectorization Advisor refinement analyses

Linux* OS command line:

  • -I${ADVISOR_[product_year]_DIR}/include/ia32 or -I${ADVISOR_[product_year]_DIR}/include/ia64

  • -L${ADVISOR_[product_year]_DIR}/lib32 or -L${ADVISOR_[product_year]_DIR}/lib64

  • -ladvisor

Windows* OS command line:

  • /I"%ADVISOR_[product_year]_DIR%"\include\ia32 or /I"%ADVISOR_[product_year]_DIR%"\include\ia64

  • /L"%ADVISOR_[product_year]_DIR%"\lib32 or /L"%ADVISOR_[product_year]_DIR%"\lib64

  • /ladvisor or

Visual Studio* IDE:

  • Fortran > General > Additional Include Directories > "$(ADVISOR_[product_year]_DIR)\include\ia32\" or "$(ADVISOR_[product_year]_DIR)\include\ia64\"

  • Linker > General > Additional Library Directories > "$(ADVISOR_[product_year]_DIR)\lib32" or "$(ADVISOR_[product_year]_DIR)\lib64"

  • Linker > Input > Additional Dependencies > .lib > libadvisor

Search for unresolved references in multithreaded, dynamically linked libraries.

Threading Advisor only

Linux* OS command line: -shared-intel

Windows* OS command line: /MD or /MDd

Visual Studio* IDE: Fortran > Libraries > Runtime Library > Multithread DLL (/libs:dll /threads) or Debug Multithread DLL (/libs:dll /threads /dbglibs)

Enable dynamic loading.

Threading Advisor only

Linux* OS command line: -ldl

Discover Where Vectorization Will Pay Off the Most

This section shows how to get started using only the Intel Advisor Survey analysis. The main advantage of using this single-analysis Vectorization Advisor workflow is low runtime overhead. The main disadvantage is it may not provide enough data to help you make improvement decisions; you may need to dig deeper using another workflow.

Intel Advisor Workflow: Discover Where Vectorization Will Pay Off the Most

Survey Report - Offers integrated compiler report data and performance data that shows where vectorization will pay off the most; if vectorized loops are providing benefit, and if not, why not; un-vectorized loops and why they are not vectorized; and performance problems in general.

Set Up Environment

Environment

Set-Up Tasks

Intel® Parallel Studio XE/Linux* OS

  • Do one of the following:

    • Run one of the following source commands:

      • For csh/tcsh users: source <advisor-install-dir>/advixe-vars.csh

      • For bash users: source <advisor-install-dir>/advixe-vars.sh

      The default installation path, <advisor-install-dir>, is below:

      • /opt/intel/ for root users

      • $HOME/intel/ for non-root users

    • Add <advisor-install-dir>/bin32 or <advisor-install-dir>/bin64 to your path.

    • Run the <parallel-studio-install-dir>/psxevars.csh or <parallel-studio-install-dir>/psxevars.sh command. The default installation path, <parallel-studio-install-dir>, is below:

      • /opt/intel/ for root users

      • $HOME/intel/ for non-root users

  • Set the VISUAL or EDITOR environment variable to identify the external editor to launch when you double-click a line in an Intel Advisor source window. (VISUAL takes precedence over EDITOR.)

  • Set the BROWSER environment variable to identify the installed browser to display Intel Advisor documentation.

  • If you are using Intel® Threading Building Blocks (Intel® TBB), set the TBBROOT environment variable so your compiler can locate the installed Intel TBB include directory.

  • Make sure you run your application in the same Linux* OS environment as the Intel Advisor.

Intel Parallel Studio XE/Windows* OS

Note:

Setting up the Windows* OS environment is necessary only if you plan to use the advixe-cl command to run the command line interface, or choose to use the advixe-gui command to launch the Intel Advisor standalone GUI instead of using available GUI or IDE launch options.

Do one of the following:

  • Run the <advisor-install-dir>\advixe-vars.bat command.

    The default installation path, <advisor-install-dir>, is below C:\Program Files (x86)\IntelSWTools\ (on certain systems, instead of Program Files (x86), the directory name is Program Files).

  • Run the <parallel-studio-install-dir>\psxevars.bat command.

    The default installation path, <parallel-studio-install-dir>, is below C:\Program Files (x86)\IntelSWTools\.

Intel® System Studio

Note:

Setting up the environment is necessary only if you plan to use the advixe-cl command to run the command line interface, or choose to use the advixe-gui command to launch the Intel Advisor standalone GUI instead of using available GUI or IDE launch options.

Run the <advisor-install-dir>\advixe-vars.bat command to set up your environment. The default installation path, <advisor-install-dir>, is below C:\Program Files (x86)\IntelSWTools\ (on certain systems, instead of Program Files (x86), the directory name is Program Files).
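After running one of the set-up scripts above, a quick sanity check can confirm the environment looks as expected. The Python sketch below is one possible check: it reports whether the advixe-cl and advixe-gui commands named above are on the path, and which editor would be chosen given that VISUAL takes precedence over EDITOR. The helper names are hypothetical, and the script only inspects the environment; it does not call the Intel Advisor.

```python
import os
import shutil


def pick_editor(env):
    """Return the editor command, preferring VISUAL over EDITOR.

    Returns None when neither variable is set (or both are empty).
    """
    return env.get("VISUAL") or env.get("EDITOR")


def missing_tools(tools=("advixe-cl", "advixe-gui")):
    """Return the subset of `tools` that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


print("editor:", pick_editor(os.environ))
print("not found on PATH:", missing_tools())
```

An empty "not found" list suggests the set-up script took effect in the current shell; it does not verify that the installation itself is complete.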

Launch Intel Advisor and Create a Project

To launch the:

  • Intel Parallel Studio XE/Intel Advisor standalone GUI:

    • In the Linux* OS: Run the advixe-gui command.

    • In the Windows* OS: From the Microsoft Windows* All Apps screen, select Intel Parallel Studio XE 201n > Intel Advisor 201n

  • Intel System Studio/Intel Advisor standalone GUI: Choose Tools > Intel Advisor > Launch Intel Advisor from the IDE menu.

  • Intel Advisor plug-in to the Visual Studio* IDE: Open your solution in the Visual Studio* IDE.

To create an Intel Advisor project:

  1. Do one of the following:

    • In the standalone GUI: Choose File > New > Project… to open the Create a Project dialog box. Supply a name and location for your project, then click the Create Project button to open the Project Properties dialog box.

    • In the Visual Studio* IDE: Choose Project > Intel Advisor 201n Project Properties... to open the Project Properties dialog box.

  2. On the left side of the Analysis Target tab, ensure the Survey Hotspots Analysis type is selected, then set appropriate parameters. (Setting the binary/symbol search and source search directories is optional for the Vectorization Advisor.)

Run Survey Analysis

Intel Advisor Vectorization Workflow Tab: Survey Target

Under Survey Target in the Vectorization Workflow, click the Run analysis control to collect Survey data while your application executes. Upon completion, the Intel Advisor displays a Survey Report similar to the following.

Note:

If the Workflow is not displayed in the Visual Studio IDE: Click the Intel Advisor icon on the Intel Advisor toolbar.


Intel Advisor: Survey Report
There are many controls available to help you focus on the data most important to you, including the following:

1

Click the button to save a read-only result snapshot you can view any time.

Intel Advisor stores only the most recent analysis result. Visually comparing one or more snapshots to each other or to the most recent analysis result can be an effective way to judge performance improvement progress.

To open a snapshot, choose File > Open > Result...

2

Click the various Filter buttons and drop-down lists to temporarily limit displayed data based on your criteria.

3

Click the button to view loops in non-executed code paths for various instruction set architectures (ISA). Prerequisites:

  • Compile the target application for multiple code paths using the Intel compiler.

  • Enable the Analyze loops in not executed code path checkbox in Project Properties > Analysis Target > Survey Hotspots Analysis.

4

Click the toggle to simplify data representation and automatically select suitable and/or high-impact loops from a SIMD vector performance perspective.

Smart mode uses loop call tree nesting (Loop Height column), fraction of Total CPU Time (which you can adjust using the Loops Above control), and other criteria to automatically filter and sort loops of interest.

5

Click the button to search for specific data.

6

Click the tab to open various Intel Advisor reports or views.

7

Click the toggle to show/hide sets of columns.

8

Click the control to show/hide a chart that helps you visualize actual performance against hardware-imposed performance ceilings, as well as determine the main limiting factor (memory bandwidth or compute capacity), thereby providing an ideal roadmap of potential optimization steps.

9

Click a data row in the top of the Survey Report to display more data specific to that row in the bottom of the Survey Report. Double-click a loop data row to display a Survey Source window.

10

Click a checkbox to mark a loop for deeper analysis.

11

If present, click the image to display code-specific how-can-I-fix-this-issue? information in the Recommendations pane.

12

If present, click the image to view the reason automatic vectorization failed in the Why No Vectorization? pane.

13

Click the control to show/hide the Workflow pane.

Investigate Loops

If all loops are vectorizing properly and performance is satisfactory, you are done! Congratulations!

If one or more loops are not vectorizing properly and performance is unsatisfactory:

  1. Improve application performance using various Intel Advisor features to guide your efforts, such as:

    • Information in the Performance Issues column and associated Recommendations tab
      Intel Advisor: Recommendations

      Table of contents on right, showing recommendations for each issue relevant to the loop. Expandable/collapsible recommendations on left (some reference details specific to the analyzed loop, such as vector length or trip count). Number of bars on recommendation icon shows confidence this recommendation is the appropriate fix.
    • Suggestions in What Do I Do After Running a Survey Analysis? in the Intel Advisor User Guide

    • Optional Dependencies and Memory Access Patterns (MAP) analyses to help you dig deeper

  2. Rebuild your modified code.

  3. Run another Survey analysis to verify all loops are vectorizing properly and performance is satisfactory.

Identify Performance Bottlenecks Using Roofline

This section shows how to get started using all Vectorization Advisor analyses, starting with the Roofline analysis. The main advantage of using this multi-analysis Vectorization Advisor workflow is the potential to generate an ideal roadmap of optimization steps. The main disadvantage is high runtime overhead. For example:

Intel Advisor Typical Workflow: Identify Performance Bottlenecks Using Roofline

Roofline analysis - Helps visualize actual performance against hardware-imposed performance ceilings, as well as determine the main limiting factor (memory bandwidth or compute capacity). When you run a Roofline analysis, the Intel Advisor:

Dependencies analysis - Checks for real data dependencies in loops the compiler did not vectorize because of assumed dependencies.

Memory Access Patterns (MAP) analysis - Checks for various memory issues, such as non-contiguous memory accesses and unit stride vs. non-unit stride accesses.
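To make the unit stride vs. non-unit stride distinction concrete, the following Python sketch labels the access pattern of a single memory instruction from the element indices it touches on consecutive loop iterations. It is only a simplified illustration of the idea; the function name and the category labels are assumptions for this example and do not reproduce how the MAP analysis collects or reports its data.

```python
def classify_stride(indices):
    """Label the access pattern of one memory instruction.

    `indices` holds the element indices the instruction touches on
    consecutive loop iterations.
    """
    if len(indices) < 2:
        return "unknown"  # a single access has no stride

    # Distance, in elements, between each pair of consecutive accesses.
    strides = {later - earlier for earlier, later in zip(indices, indices[1:])}

    if strides == {0}:
        return "uniform"   # the same element every iteration
    if strides == {1}:
        return "unit"      # contiguous accesses
    if len(strides) == 1:
        return "constant"  # regular but non-contiguous, for example every 4th element
    return "variable"      # irregular, for example indirect indexing


print(classify_stride([0, 1, 2, 3]))
print(classify_stride([0, 4, 8, 12]))
print(classify_stride([0, 3, 4, 9]))
```

Contiguous (unit stride) accesses are generally the friendliest to vectorization, which is why non-unit and irregular patterns are worth a closer look.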

Set Up Environment

Environment

Set-Up Tasks

Intel® Parallel Studio XE/Linux* OS

  • Do one of the following:

    • Run one of the following source commands:

      • For csh/tcsh users: source <advisor-install-dir>/advixe-vars.csh

      • For bash users: source <advisor-install-dir>/advixe-vars.sh

      The default installation path, <advisor-install-dir>, is below:

      • /opt/intel/ for root users

      • $HOME/intel/ for non-root users

    • Add <advisor-install-dir>/bin32 or <advisor-install-dir>/bin64 to your path.

    • Run the <parallel-studio-install-dir>/psxevars.csh or <parallel-studio-install-dir>/psxevars.sh command. The default installation path, <parallel-studio-install-dir>, is below:

      • /opt/intel/ for root users

      • $HOME/intel/ for non-root users

  • Set the VISUAL or EDITOR environment variable to identify the external editor to launch when you double-click a line in an Intel Advisor source window. (VISUAL takes precedence over EDITOR.)

  • Set the BROWSER environment variable to identify the installed browser to display Intel Advisor documentation.

  • If you are using Intel® Threading Building Blocks (Intel® TBB), set the TBBROOT environment variable so your compiler can locate the installed Intel TBB include directory.

  • Make sure you run your application in the same Linux* OS environment as the Intel Advisor.

Intel Parallel Studio XE/Windows* OS

Note:

Setting up the Windows* OS environment is necessary only if you plan to use the advixe-cl command to run the command line interface, or choose to use the advixe-gui command to launch the Intel Advisor standalone GUI instead of using available GUI or IDE launch options.

Do one of the following:

  • Run the <advisor-install-dir>\advixe-vars.bat command.

    The default installation path, <advisor-install-dir>, is below C:\Program Files (x86)\IntelSWTools\ (on certain systems, instead of Program Files (x86), the directory name is Program Files).

  • Run the <parallel-studio-install-dir>\psxevars.bat command.

    The default installation path, <parallel-studio-install-dir>, is below C:\Program Files (x86)\IntelSWTools\.

Intel® System Studio

Note:

Setting up the environment is necessary only if you plan to use the advixe-cl command to run the command line interface, or choose to use the advixe-gui command to launch the Intel Advisor standalone GUI instead of using available GUI or IDE launch options.

Run the <advisor-install-dir>\advixe-vars.bat command to set up your environment. The default installation path, <advisor-install-dir>, is below C:\Program Files (x86)\IntelSWTools\ (on certain systems, instead of Program Files (x86), the directory name is Program Files).

Launch Intel Advisor and Create a Project

To launch the:

  • Intel Parallel Studio XE/Intel Advisor standalone GUI:

    • In the Linux* OS: Run the advixe-gui command.

    • In the Windows* OS: From the Microsoft Windows* All Apps screen, select Intel Parallel Studio XE 201n > Intel Advisor 201n

  • Intel System Studio/Intel Advisor standalone GUI: Choose Tools > Intel Advisor > Launch Intel Advisor from the IDE menu.

  • Intel Advisor plug-in to the Visual Studio* IDE: Open your solution in the Visual Studio* IDE.

To create an Intel Advisor project:

  1. Do one of the following:

    • In the standalone GUI: Choose File > New > Project… to open the Create a Project dialog box. Supply a name and location for your project, then click the Create Project button to open the Project Properties dialog box.

    • In the Visual Studio* IDE: Choose Project > Intel Advisor 201n Project Properties... to open the Project Properties dialog box.

  2. On the left side of the Analysis Target tab, ensure the Survey Hotspots Analysis type is selected and set appropriate parameters.

  3. Set appropriate parameters for other analysis types and tabs. (Setting the binary/symbol search and source search directories is optional for the Vectorization Advisor.)

Tip:
  • If possible, use the Inherit settings from Survey Hotspots Analysis Type checkbox for other analysis types.

  • The Trip Counts and FLOP Analysis type has similar parameters to the Survey Hotspots Analysis type.

  • The Dependencies Analysis and Memory Access Patterns Analysis types consume more resources than the Survey Hotspots Analysis type. If these Refinement analyses take too long, consider decreasing the workload.

  • Select Track stack variables in the Dependencies Analysis type to detect all possible dependencies.

Run Roofline Analysis

Intel Advisor Vectorization Workflow Tab: Run Roofline

Under Run Roofline in the Vectorization Workflow, click the Run analysis control to execute your target application. Upon completion, the Intel Advisor displays a Roofline chart.

Note:

If the Workflow is not displayed in the Visual Studio IDE: Click the Intel Advisor icon on the Intel Advisor toolbar.

The Roofline chart plots an application's achieved performance and arithmetic intensity against the machine's maximum achievable performance:

  • Arithmetic intensity (x axis) - measured in floating-point operations (FLOPs) or integer operations (INTOPs) per byte transferred between the CPU/VPU and memory, based on the loop/function algorithm

  • Performance (y axis) - measured in billions of floating-point operations per second (GFLOPS) or billions of integer operations per second (GINTOPS)
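As a worked illustration of the two axes, the Python sketch below computes the chart coordinates of a single loop from three raw counts. The counts are invented for the example, and the function names are hypothetical; the Intel Advisor measures and plots these values for you.

```python
def arithmetic_intensity(flops, bytes_transferred):
    """X axis: floating-point operations per byte moved to or from memory."""
    return flops / bytes_transferred


def gflops(flops, seconds):
    """Y axis: billions of floating-point operations per second."""
    return flops / seconds / 1e9


# Hypothetical loop: 2 billion FLOPs, 8 billion bytes transferred, 0.5 seconds.
loop_flops = 2e9
loop_bytes = 8e9
loop_seconds = 0.5

x = arithmetic_intensity(loop_flops, loop_bytes)
y = gflops(loop_flops, loop_seconds)
print(f"arithmetic intensity = {x} FLOP/byte, performance = {y} GFLOPS")
```

The same arithmetic applies to integer operations, with INTOPs in place of FLOPs.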

In general:

  • The size and color of each Roofline chart dot represent relative execution time for each loop/function. Large red dots take the most time, so are the best candidates for optimization. Small green dots take less time, so may not be worth optimizing.

  • Roofline chart diagonal lines indicate memory bandwidth limitations preventing loops/functions from achieving better performance without some form of optimization. For example: The L1 Bandwidth roofline represents the maximum amount of work that can get done at a given arithmetic intensity if the loop always hits L1 cache. A loop does not benefit from L1 cache speed if a dataset causes it to miss L1 cache too often, and instead is subject to the limitations of the lower-speed L2 cache it is hitting. So the dot representing the loop is positioned somewhere below the L2 Bandwidth roofline.

  • Roofline chart horizontal lines indicate compute capacity limitations preventing loops/functions from achieving better performance without some form of optimization. For example: The Scalar Add Peak represents the peak number of add instructions that can be performed by the scalar loop under these circumstances. The Vector Add Peak represents the peak number of add instructions that can be performed by the vectorized loop under these circumstances. If a loop is not vectorized, the dot representing the loop is positioned somewhere below the Scalar Add Peak roofline.

  • A dot cannot exceed the topmost rooflines, as these represent the maximum capabilities of the machine; however, not all loops can utilize maximum machine capabilities.

  • The greater the distance between a dot and the highest achievable roofline, the more opportunity exists for performance improvement.
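The relationship between a dot and the rooflines can be sketched numerically. In the Python example below, each diagonal (memory) roof limits performance to arithmetic intensity times bandwidth, each horizontal (compute) roof limits it to a fixed peak, and the topmost limit is the lower of the best memory roof and the best compute roof. All bandwidth and peak values are hypothetical placeholders, not measurements of any real processor, and the function names are assumptions for this example.

```python
# Hypothetical machine roofs; real values come from the Roofline chart itself.
MEMORY_ROOFS_GB_S = {
    "L1 Bandwidth": 400.0,
    "L2 Bandwidth": 200.0,
    "DRAM Bandwidth": 50.0,
}
COMPUTE_ROOFS_GFLOPS = {
    "Scalar Add Peak": 20.0,
    "Vector Add Peak": 160.0,
}


def roof_ceilings(ai):
    """Return the GFLOPS ceiling each roof imposes at arithmetic intensity `ai`."""
    ceilings = {name: ai * bw for name, bw in MEMORY_ROOFS_GB_S.items()}
    ceilings.update(COMPUTE_ROOFS_GFLOPS)
    return ceilings


def max_attainable(ai):
    """Topmost limit: the lower of the best memory roof and the best compute roof."""
    best_memory = ai * max(MEMORY_ROOFS_GB_S.values())
    best_compute = max(COMPUTE_ROOFS_GFLOPS.values())
    return min(best_memory, best_compute)


def headroom(ai, measured_gflops):
    """How many times faster the loop could run before hitting the topmost limit."""
    return max_attainable(ai) / measured_gflops


# A dot at 0.25 FLOP/byte and 4 GFLOPS on the hypothetical machine above.
for name, ceiling in sorted(roof_ceilings(0.25).items(), key=lambda item: item[1]):
    print(f"{name}: {ceiling} GFLOPS")
print("headroom:", headroom(0.25, 4.0))
```

A large headroom value corresponds to a dot far below the highest achievable roofline, which the list above identifies as the greater opportunity for improvement.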

In the following Roofline chart representation, loops A and G (large red dots), and to a lesser extent B (yellow dot far below the roofs), are the best candidates for optimization. Loops C, D, and E (small green dots) and H (yellow dot) are poor candidates because they do not have much room to improve.
This is a visual model, not an actual screenshot, of the Roofline Chart

There are several controls to help you show/hide the Roofline chart:
Intel Advisor: Roofline Chart & Survey Report

1

Click to toggle between Roofline chart view and Survey Report view.

2

Click to toggle to and from side-by-side Roofline chart and Survey Report view.

3

Drag to adjust the dimensions of the Roofline chart and Survey Report.

There are several controls to help you focus on the data most important to you, including the following.
Intel Advisor: Roofline controls

1

  • Zoom in and out by tracing a rectangle with your mouse. You can also zoom in and out using your mouse wheel.

  • Move the chart left, right, up, and down.

  • Undo or redo the previous zoom action.

  • Reset to the default zoom level.

  • Copy the chart to the clipboard, save it to a file, export it as an interactive HTML file that does not require the Intel Advisor viewer to display, or save it for comparison. Use the arrow to toggle among the options.

2

  • Adjust rooflines to see practical performance limits if an application uses fewer threads than available cores.

  • Build roofs for single-threaded applications (or for multi-threaded applications configured to run single-threaded, such as one thread per rank for MPI applications). (You can use Intel Advisor filters to control the loops displayed in the Roofline chart; however, the Roofline chart does not support the Threads filter.)

3

  • Toggle the display between floating-point, integer operations, and mixed operations (floating-point and integer).

  • Display data for whole program stacks and different code paths leading to different representations of the same loops. (By default, the Roofline chart displays data only for loops and functions in the Loop Information pane of the Survey Report.) Note: Requires enabling the Enable Roofline with Callstacks checkbox before running the Roofline analysis.

4

Choose a result or snapshot file to load and compare with the current result on the same Roofline chart (with the same peaks). Use the accompanying drop-down to select and distinguish among different result and snapshot files.

5

  • Change the visibility and appearance of roofline representations (lines).

  • Change the appearance of loop weight representations (dots).

  • Manually fine-tune roof values to set hardware limits specific to your code.

6

Zoom in and out using numerical values.

7

Hover your mouse over an item to display metrics for that item.

If you hover your mouse over a dot, the Roofline chart displays two blue dots with metrics that show potential performance if you optimize the loop to reach the next roofline and the maximum achievable roofline. (If the next roofline and maximum achievable roofline are the same, the Roofline chart displays only one blue dot.)

Click a dot to outline it in black and display corresponding code and metrics in other window tabs.

After clicking a dot, right-click a blank area in the Roofline chart and choose:

  • Filter Out to temporarily hide the dot

  • Filter In to temporarily hide all other dots

  • Clear Filters to show all originally displayed dots

8

Display the number and percentage of loops in each loop weight representation category.

Investigate Loops

If all loops are vectorizing properly and performance is satisfactory, you are done! Congratulations!

If one or more loops are not vectorizing properly and performance is unsatisfactory:

  1. Check data in associated Intel Advisor views to support your Roofline chart interpretation. For example: Check the Vectorized Loops/Efficiency values in the Survey Report or the data in the Code Analytics tab.

  2. Improve application performance using various Intel Advisor features to guide your efforts, such as:

    • Information in the Performance Issues column and associated Recommendations tab
      Intel Advisor Recommendations

      Table of contents on right, showing recommendations for each issue relevant to the loop. Expandable/collapsible recommendations on left (some reference details specific to the analyzed loop, such as vector length or trip count). Number of bars on recommendation icon shows confidence this recommendation is the appropriate fix.

    • Information in the Why No Vectorization? column and associated Why No Vectorization? tab

    • Suggestions in What Do I Do After Running a Survey Analysis? in the Intel Advisor User Guide

If you need more information, continue your investigation by:

  1. Marking one or more loops/functions for deeper analysis in the checkbox column of the Survey Report AND

  2. Running a Dependencies analysis to discover why the compiler assumed a dependency and did not vectorize a loop/function, and/or running a Memory Access Patterns (MAP) analysis to identify expensive memory instructions

Run Dependencies Analysis

To run a Dependencies analysis:

  1. Mark one or more un-vectorized loops for deeper analysis in the checkbox column in the Survey Report.

  2. Under Check Dependencies in the Vectorization Workflow, click the Run analysis control to collect Dependencies data while your application executes.

After the Intel Advisor collects the data, it displays a Dependencies-focused Refinement Report similar to the following:


Intel Advisor: Dependencies Report
There are many controls available to help you focus on the data most important to you, including the following:

1

To display more information in the Dependencies Report about a loop you selected for deeper analysis: Click the associated data row.

2

To display instruction addresses and code snippets for associated code locations in the Code Locations pane: Click a data row.

To choose a problem of interest to display in the Dependencies Source window: Right-click a data row, then choose View Source.

To open your default editor in another tab/window: Right-click a data row, then choose Edit Source to open an editor tab.

3

To choose a code location of interest to display in the Dependencies Source window: Right-click a data row, then choose View Source.

To open your default editor in another tab/window: Right-click a data row, then choose Edit Source to open an editor tab.

4

Use the Filter pane to:

  • Temporarily limit the items displayed in the Problems and Messages pane by clicking filter criteria in one or more filter categories.

  • Deselect filter criteria in one filter category, or deselect filter criteria in all filter categories.

  • Sort all filter criteria by name in ascending alphabetical order or by count in descending numerical order. (You cannot change the order in which filter categories are presented.)

5

To populate these columns and the Memory Access Patterns Report with data, run a Memory Access Patterns analysis.

If the Dependencies Report shows:

  • There is no real dependency in the loop for the given workload, follow Intel Advisor guidance to tell the compiler it is safe to vectorize.

  • There is an anti-dependency (often called a Write after read dependency or WAR), follow Intel Advisor guidance to enable vectorization.
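To see why an anti-dependency can often be vectorized while a true (read-after-write) dependency cannot, the Python sketch below runs the same loop body two ways: one iteration at a time, and with all loads performed before any store, which is a simplified stand-in for one wide SIMD step. It is a toy model under that assumption (real vector code works on fixed-length chunks), and the function names are hypothetical.

```python
def run_scalar(values, offset):
    """Execute a[i] = a[i + offset] + 1 one iteration at a time, in order."""
    a = list(values)
    start = max(0, -offset)
    stop = len(a) - max(0, offset)
    for i in range(start, stop):
        a[i] = a[i + offset] + 1
    return a


def run_vector(values, offset):
    """Same loop body, but every load happens before any store."""
    a = list(values)
    start = max(0, -offset)
    stop = len(a) - max(0, offset)
    loaded = [a[i + offset] for i in range(start, stop)]  # all reads first
    for k, i in enumerate(range(start, stop)):
        a[i] = loaded[k] + 1                              # then all writes
    return a


data = [0, 10, 20, 30]

# offset = +1: each iteration reads an element that a later iteration writes
# (write after read). Reading everything first preserves the result.
print(run_scalar(data, 1), run_vector(data, 1))

# offset = -1: each iteration reads an element the previous iteration wrote
# (read after write). Reading everything first changes the result.
print(run_scalar(data, -1), run_vector(data, -1))
```

In the first case both versions agree, so vectorizing is safe; in the second they differ, which is the kind of real dependency that blocks vectorization.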

Intel Advisor code improvement guidance is available in the Recommendations tab and in What Do I Do After Running a Survey Analysis? in the Intel Advisor User Guide. After you finish improving your code:

  1. Run a Memory Access Patterns (MAP) analysis if desired.

  2. Rebuild your modified code.

  3. Run another Roofline analysis to verify your application still runs correctly and all test cases pass, all loops are vectorizing properly, and performance is satisfactory.

Run Memory Access Patterns (MAP) Analysis

To run a Memory Access Patterns (MAP) analysis:

  1. Mark one or more un-vectorized loops for deeper analysis in the checkbox column in the Survey Report.

  2. Under Check Memory Access Patterns in the Vectorization Workflow, click the Run analysis control to collect MAP data while your application executes.

After the Intel Advisor collects the data, it displays a MAP-focused Refinement Report similar to the following:

Intel Advisor: Memory Access Patterns (MAP) Report

Intel Advisor code improvement guidance is available in the Recommendations tab and in What Do I Do After Running a Survey Analysis? in the Intel Advisor User Guide. After you finish improving your code:

  1. Rebuild your modified code.

  2. Run another Roofline analysis to verify your application still runs correctly and all test cases pass, all loops are vectorizing properly, and performance is satisfactory.

Prototype Threading Designs

This section shows how to get started using the Threading Advisor. The main advantage of using this multi-analysis Threading Advisor workflow is the potential for what-if modeling with corresponding prediction of data sharing issues. The main disadvantage is medium to high runtime overhead.

Intel Advisor Typical Workflow: Prototype Threading Designs

Survey analysis - Shows the loops and functions where your application spends the most time. Use this information to discover candidates for parallelization with threads.

Annotations - Annotations are subroutine calls or macros (depending on the programming language) you insert to mark places in your application that are good candidates for later replacement with parallel framework code that enables threading parallel execution. Annotations can be processed by your current compiler but do not change the computations of your application.

Suitability analysis - Predicts the maximum speedup of your application based on the inserted annotations and a variety of what-if modeling parameters with which you can experiment. Use this information to choose the best candidates for parallelization with threads.

Dependencies analysis - Predicts parallel data sharing problems based on the inserted annotations. Use this information to fix the data sharing problems if the predicted maximum speedup benefit justifies the effort.

Set Up Environment

Environment

Set-Up Tasks

Intel® Parallel Studio XE/Linux* OS

  • Do one of the following:

    • Run one of the following source commands:

      • For csh/tcsh users: source <advisor-install-dir>/advixe-vars.csh

      • For bash users: source <advisor-install-dir>/advixe-vars.sh

      The default installation path, <advisor-install-dir>, is below:

      • /opt/intel/ for root users

      • $HOME/intel/ for non-root users

    • Add <advisor-install-dir>/bin32 or <advisor-install-dir>/bin64 to your path.

    • Run the <parallel-studio-install-dir>/psxevars.csh or <parallel-studio-install-dir>/psxevars.sh command. The default installation path, <parallel-studio-install-dir>, is below:

      • /opt/intel/ for root users

      • $HOME/intel/ for non-root users

  • Set the VISUAL or EDITOR environment variable to identify the external editor to launch when you double-click a line in an Intel Advisor source window. (VISUAL takes precedence over EDITOR.)

  • Set the BROWSER environment variable to identify the installed browser to display Intel Advisor documentation.

  • If you are using Intel® Threading Building Blocks (Intel® TBB), set the TBBROOT environment variable so your compiler can locate the installed Intel TBB include directory.

  • Make sure you run your application in the same Linux* OS environment as the Intel Advisor.

Intel Parallel Studio XE/Windows* OS

Note:

Setting up the Windows* OS environment is necessary only if you plan to use the advixe-cl command to run the command line interface, or choose to use the advixe-gui command to launch the Intel Advisor standalone GUI instead of using available GUI or IDE launch options.

Do one of the following:

  • Run the <advisor-install-dir>\advixe-vars.bat command.

    The default installation path, <advisor-install-dir>, is below C:\Program Files (x86)\IntelSWTools\ (on certain systems, instead of Program Files (x86), the directory name is Program Files).

  • Run the <parallel-studio-install-dir>\psxevars.bat command.

    The default installation path, <parallel-studio-install-dir>, is below C:\Program Files (x86)\IntelSWTools\.

Intel® System Studio

Note:

Setting up the environment is necessary only if you plan to use the advixe-cl command to run the command line interface, or choose to use the advixe-gui command to launch the Intel Advisor standalone GUI instead of using available GUI or IDE launch options.

Run the <advisor-install-dir>\advixe-vars.bat command to set up your environment. The default installation path, <advisor-install-dir>, is below C:\Program Files (x86)\IntelSWTools\ (on certain systems, instead of Program Files (x86), the directory name is Program Files).

Launch Intel Advisor and Create a Project

To launch the:

  • Intel Parallel Studio XE/Intel Advisor standalone GUI:

    • In the Linux* OS: Run the advixe-gui command.

    • In the Windows* OS: From the Microsoft Windows* All Apps screen, select Intel Parallel Studio XE 201n > Intel Advisor 201n

  • Intel System Studio/Intel Advisor standalone GUI: Choose Tools > Intel Advisor > Launch Intel Advisor from the IDE menu.

  • Intel Advisor plug-in to the Visual Studio* IDE: Open your solution in the Visual Studio* IDE.

To create an Intel Advisor project:

  1. Do one of the following:

    • In the standalone GUI: Choose File > New > Project… to open the Create a Project dialog box. Supply a name and location for your project, then click the Create Project button to open the Project Properties dialog box.

    • In the Visual Studio* IDE: Choose Project > Intel Advisor 201n Project Properties... to open the Project Properties dialog box.

  2. On the left side of the Analysis Target tab, ensure the Survey Hotspots Analysis type is selected and set appropriate parameters.

  3. Set appropriate parameters for other analysis types and tabs. (Setting the binary/symbol search and source search directories is required for the Threading Advisor.)

Tip:
  • If possible, use the Inherit settings from Survey Hotspots Analysis Type checkbox for other analysis types.

  • The Dependencies Analysis type consumes more resources than the Survey Hotspots Analysis type. If this analysis takes too long, consider decreasing the workload.

  • Select Track stack variables in the Dependencies Analysis type to detect all possible dependencies.

Run Survey Analysis

Intel Advisor Threading Workflow Tab: Survey Target

Under Survey Target in the Threading Workflow, click the Run analysis control to collect Survey data while your application executes. Use the resulting information to discover candidates for parallelization with threads.

Note:

If the Workflow is not displayed in the Visual Studio IDE: Click the Intel Advisor icon on the Intel Advisor toolbar.

Investigate Loops

Pay particular attention to the hottest loops in terms of Self Time and Total Time. Optimizing these loops provides the most benefit. Outermost loops with significant Total Time are often good candidates for parallelization with threads. Innermost loops and loops near innermost loops are often good candidates for vectorization.
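To make the outermost-versus-innermost distinction concrete, here is a minimal sketch of a nested loop where the outer loop is the threading candidate and the inner loop is the vectorization candidate. The function `mat_vec` and its row-major layout are hypothetical, chosen only for illustration; the OpenMP pragmas are one possible way to express the parallelism and are ignored when OpenMP support is not enabled at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical hotspot: y = A * x for a dense, row-major matrix A.
std::vector<double> mat_vec(const std::vector<double>& a,
                            const std::vector<double>& x,
                            std::size_t rows, std::size_t cols) {
    assert(a.size() == rows * cols && x.size() == cols);
    std::vector<double> y(rows, 0.0);

    // Outermost loop: each row is independent and carries significant
    // total time, so it is the candidate for parallelization with threads.
    #pragma omp parallel for
    for (std::size_t r = 0; r < rows; ++r) {
        double sum = 0.0;

        // Innermost loop: unit-stride accesses and a simple reduction,
        // so it is the candidate for vectorization.
        #pragma omp simd reduction(+:sum)
        for (std::size_t c = 0; c < cols; ++c) {
            sum += a[r * cols + c] * x[c];
        }
        y[r] = sum;
    }
    return y;
}
```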

Annotate Sources

Insert annotations to mark places in parts of your application that are good candidates for later replacement with parallel framework code that enables parallel execution.

The main types of Intel Advisor annotations mark the location of:

  • A parallel site. A parallel site is a region of code that contains one or more tasks that may execute in one or more parallel threads to distribute work. An effective parallel site typically contains a hotspot that consumes application execution time. To distribute these frequently executed instructions to different tasks that can run at the same time, the best parallel site is not usually located at the hotspot, but higher in the call tree.

  • One or more parallel tasks within a parallel site. A task is a portion of time-consuming code with data that can be executed in one or more parallel threads to distribute work.

  • Locking synchronization, where mutual exclusion of data access must occur in the parallel application.
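The three annotation types above can be sketched in C++ as follows. The macro names come from the `advisor-annotate.h` header that ships with Intel Advisor; everything else here (the `histogram` function, the site and task names, and the no-op fallback macros used when the header is not on the include path) is a hypothetical illustration rather than code from the User Guide.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Use the real annotation macros when the Intel Advisor header is found;
// otherwise fall back to no-op stand-ins so the file still builds.
#if defined(__has_include)
#  if __has_include("advisor-annotate.h")
#    include "advisor-annotate.h"
#    define HAVE_ADVISOR_ANNOTATE 1
#  endif
#endif
#ifndef HAVE_ADVISOR_ANNOTATE
#  define ANNOTATE_SITE_BEGIN(site)     ((void)0)
#  define ANNOTATE_SITE_END(...)        ((void)0)
#  define ANNOTATE_ITERATION_TASK(task) ((void)0)
#  define ANNOTATE_LOCK_ACQUIRE(addr)   ((void)0)
#  define ANNOTATE_LOCK_RELEASE(addr)   ((void)0)
#endif

// Hypothetical hotspot: count how many samples fall into each bucket.
std::vector<int> histogram(const std::vector<int>& samples, int buckets) {
    assert(buckets > 0);
    std::vector<int> counts(static_cast<std::size_t>(buckets), 0);

    ANNOTATE_SITE_BEGIN(histogram_site);            // parallel site
    for (std::size_t i = 0; i < samples.size(); ++i) {
        ANNOTATE_ITERATION_TASK(histogram_task);    // one task per iteration

        // Wrap negative values into the range [0, buckets).
        const int wrapped = ((samples[i] % buckets) + buckets) % buckets;

        ANNOTATE_LOCK_ACQUIRE(0);                   // shared write: needs mutual exclusion
        ++counts[static_cast<std::size_t>(wrapped)];
        ANNOTATE_LOCK_RELEASE(0);
    }
    ANNOTATE_SITE_END();
    return counts;
}
```

The annotations do not change what the function computes; they only mark where a parallel framework could later distribute the work.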

The Intel Advisor User Guide offers sample annotated source code snippets you can copy into your editor, including:

Annotation Code Snippet

Purpose

Iteration Loop, Single Task

Create a simple loop structure, where the task code includes the entire loop body. This common task structure is useful when only a single task is needed within a parallel site.

Loop, One or More Tasks

Create loops where the task code does not include all of the loop body, or complex loops or code that requires specific task begin-end boundaries, including multiple task end annotations. This structure is also useful when multiple tasks are needed within a parallel site.

Function, One or More Tasks

Create code that calls multiple tasks within a parallel site.

Pause/Resume Collection

Temporarily pause data collection and later resume it, so you can skip uninteresting parts of application execution to minimize collected data and speed up analysis of large applications. Add these annotations outside a parallel site.

Build Settings

Set build (compiler and linker) settings specific to the language in use.
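As a rough illustration of two of the snippet types listed above (multiple tasks within one site, and pause/resume collection), consider the sketch below. The macro names are those provided by `advisor-annotate.h`; the functions `make_input` and `analyze`, the `Stats` type, the site and task names, and the no-op fallback macros are hypothetical and exist only to keep the example self-contained.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Use the real annotation macros when the Intel Advisor header is found;
// otherwise fall back to no-op stand-ins so the file still builds.
#if defined(__has_include)
#  if __has_include("advisor-annotate.h")
#    include "advisor-annotate.h"
#    define HAVE_ADVISOR_ANNOTATE 1
#  endif
#endif
#ifndef HAVE_ADVISOR_ANNOTATE
#  define ANNOTATE_SITE_BEGIN(site)        ((void)0)
#  define ANNOTATE_SITE_END(...)           ((void)0)
#  define ANNOTATE_TASK_BEGIN(task)        ((void)0)
#  define ANNOTATE_TASK_END(...)           ((void)0)
#  define ANNOTATE_DISABLE_COLLECTION_PUSH ((void)0)
#  define ANNOTATE_DISABLE_COLLECTION_POP  ((void)0)
#endif

struct Stats {
    double sum;
    double max;
};

// Uninteresting setup work: fills the vector with 1, 2, ..., n.
std::vector<double> make_input(std::size_t n) {
    std::vector<double> data(n);
    std::iota(data.begin(), data.end(), 1.0);
    return data;
}

Stats analyze(std::size_t n) {
    assert(n > 0);

    // Pause collection while the input is built (outside any parallel site).
    ANNOTATE_DISABLE_COLLECTION_PUSH;
    const std::vector<double> data = make_input(n);
    ANNOTATE_DISABLE_COLLECTION_POP;   // resume collection

    Stats s{0.0, 0.0};
    ANNOTATE_SITE_BEGIN(stats_site);   // one site containing two tasks
    ANNOTATE_TASK_BEGIN(sum_task);
    s.sum = std::accumulate(data.begin(), data.end(), 0.0);
    ANNOTATE_TASK_END();

    ANNOTATE_TASK_BEGIN(max_task);
    s.max = *std::max_element(data.begin(), data.end());
    ANNOTATE_TASK_END();
    ANNOTATE_SITE_END();
    return s;
}
```

The two tasks write to different members of `s`, so no lock annotations are needed in this sketch.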

After inserting annotations, rebuild your application in release mode.

Tip:

Choosing where to add task annotations may require some experimentation. If your parallel site has nested loops and the computation time used by the innermost loop is small, consider adding task annotations around the next outermost loop.

Run Suitability Analysis

Under Check Suitability in the Threading Workflow, click the Run analysis control to collect Suitability data while your application executes.

The Suitability Report predicts maximum speedup based on the inserted annotations and what-if modeling parameters with which you can experiment, such as:

  • Different hardware configurations and parallel frameworks

  • Different trip counts and instance durations

  • Any plans to address parallel overhead, lock contention, or task chunking when you implement your parallel framework code

Use the resulting information to choose the best candidates for parallelization with threads.

Run Dependencies Analysis

Under Check Dependencies in the Threading Workflow, click the Run analysis control to collect Dependencies data while your application executes. Use the resulting information to fix the data sharing problems if the predicted maximum speedup benefit justifies the effort.

Improve App Performance

If you decide the predicted maximum speedup benefit is worth the effort to add threading parallelism to your application:

  1. Complete developer/architect design and code reviews about the proposed parallel changes.

  2. Choose one parallel programming framework (threading model) for your application, such as Intel® Threading Building Blocks (Intel® TBB), OpenMP*, Microsoft Task Parallel Library* (TPL), or some other parallel framework.

  3. Add the parallel framework to your build environment.

  4. Add parallel framework code to synchronize access to the shared data resources, such as Intel TBB or OpenMP* locks.

  5. Add parallel framework code to create parallel tasks.

As you add the appropriate parallel code from the chosen parallel framework, you can keep, comment out, or replace the Intel Advisor annotations.
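As a minimal sketch of steps 4 and 5, assume OpenMP* is the chosen framework and the annotated code was a loop that counts samples per bucket, with a lock annotation around the shared update. The function `histogram_parallel` is hypothetical. The pragmas take effect only when OpenMP support is enabled at compile time (for example, with the -qopenmp or /Qopenmp options listed in Before You Begin); otherwise the loop runs serially with the same result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical hotspot after replacing annotations with OpenMP code.
std::vector<int> histogram_parallel(const std::vector<int>& samples, int buckets) {
    assert(buckets > 0);
    std::vector<int> counts(static_cast<std::size_t>(buckets), 0);
    const std::size_t n = samples.size();

    // Step 5: create parallel tasks. This takes the place of the parallel
    // site and iteration task annotations.
    #pragma omp parallel for
    for (std::size_t i = 0; i < n; ++i) {
        // Wrap negative values into the range [0, buckets).
        const int wrapped = ((samples[i] % buckets) + buckets) % buckets;

        // Step 4: synchronize access to shared data. This takes the place
        // of the lock acquire/release annotations.
        #pragma omp critical
        {
            ++counts[static_cast<std::size_t>(wrapped)];
        }
    }
    return counts;
}
```

A critical section is the simplest stand-in for a lock; if the Suitability Report predicts heavy lock contention, per-thread partial counts merged at the end may scale better.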

Visualize Intel Advisor Results on macOS* Machines

This section shows how to get started with:

  1. Collecting data on a Windows* OS or Linux* OS machine

  2. Viewing the resulting data on a macOS* machine

The main advantage of this Vectorization Advisor and Threading Advisor workflow is you can reap the benefits of the Intel Advisor GUI during the investigatory process even if you must run your code on a dedicated system with limited capabilities for visualization and data manipulation, such as clusters. The main disadvantage is you cannot collect data on a macOS* machine; you can only view data collected on a Windows* OS or Linux* OS machine.

No Shared Drive

Follow these steps if you cannot put the target application (binaries, symbol information, source code, etc.) on a shared drive visible to both the remote and the macOS* machines:

  1. Install only the Intel Advisor GUI on the macOS* machine.

  2. Install the Intel Advisor on the remote machine. You may install the complete tool or a portion of the tool, such as only the CLI if you plan to collect data using only the CLI.

  3. On the remote machine:

    • Build an optimized binary of your application in release mode using settings designed to produce the most accurate and complete analysis results. See the Before You Begin section in this document for detailed settings.

    • Set up the environment, launch the Intel Advisor, create a project, and collect the desired data. Check the other workflow sections in this document for detailed instructions.

    • Do one of the following to create an Intel Advisor snapshot of the collected data:

      Make sure you pack both sources and binaries in a zip archive file.

    • Copy the resulting .advixeexpz file to the macOS* machine.

  4. On the macOS* machine:

    • Set up the environment by running one of the following source commands: source <advisor-install-dir>/advixe-vars.csh or source <advisor-install-dir>/advixe-vars.sh.

    • Launch the Intel Advisor.

    • Open the .advixeexpz file result using the File > Open > Result menu option.

Shared Drive

Follow these steps if you can put the target application (binaries, symbol information, and source code) on a shared drive visible to both the remote and the macOS* machines:

  1. Install only the Intel Advisor GUI on the macOS* machine.

  2. Install the Intel Advisor on the remote machine. You may install the complete tool or a portion of the tool, such as only the CLI if you plan to collect data using only the CLI.

  3. On the shared drive: Build an optimized binary of your application in release mode using settings designed to produce the most accurate and complete analysis results. See the Before You Begin section in this document for detailed settings.

  4. On the remote machine:

    • Set up the environment and launch the Intel Advisor. Check the other workflows in this document for detailed instructions.

    • Set the result location to the shared drive using File > Options > Result Location.

    • Create a project that points to the shared drive and collect the desired data. Check the other workflows in this document for detailed instructions.

  5. On the macOS* machine:

    • Set up the environment by running one of the following source commands: source <advisor-install-dir>/advixe-vars.csh or source <advisor-install-dir>/advixe-vars.sh.

    • Launch the Intel Advisor.

    • Open the .advixeexp file on the shared drive using the File > Open > Result menu option.

Automate Intel Advisor Workflows

Set Up Environment

Environment

Set-Up Tasks

Intel® Parallel Studio XE/Linux* OS

  • Do one of the following:

    • Run one of the following source commands:

      • For csh/tcsh users: source <advisor-install-dir>/advixe-vars.csh

      • For bash users: source <advisor-install-dir>/advixe-vars.sh

      The default installation path, <advisor-install-dir>, is below:

      • /opt/intel/ for root users

      • $HOME/intel/ for non-root users

    • Add <advisor-install-dir>/bin32 or <advisor-install-dir>/bin64 to your path.

    • Run the <parallel-studio-install-dir>/psxevars.csh or <parallel-studio-install-dir>/psxevars.sh command. The default installation path, <parallel-studio-install-dir>, is below:

      • /opt/intel/ for root users

      • $HOME/intel/ for non-root users

  • Make sure you run your application in the same Linux* OS environment as the Intel Advisor.

Intel Parallel Studio XE/Windows* OS

Do one of the following:

  • Run the <advisor-install-dir>\advixe-vars.bat command.

    The default installation path, <advisor-install-dir>, is below C:\Program Files (x86)\IntelSWTools\ (on certain systems, instead of Program Files (x86), the directory name is Program Files).

  • Run the <parallel-studio-install-dir>\psxevars.bat command.

    The default installation path, <parallel-studio-install-dir>, is below C:\Program Files (x86)\IntelSWTools\.

Intel® System Studio

Run the <advisor-install-dir>\advixe-vars.bat command to set up your environment. The default installation path, <advisor-install-dir>, is below C:\Program Files (x86)\IntelSWTools\ (on certain systems, instead of Program Files (x86), the directory name is Program Files).

Command Line

This section shows how to get started using the Intel Advisor CLI. The main advantage of using the Intel Advisor CLI instead of the GUI is you can perform analysis and collect data as part of an automated or background task, and then view the result in a CLI report (or in the GUI) at your convenience.

Below are examples of typical Intel Advisor CLI tasks.

To Do This

Use This CLI Model

View a full list of command line options.

(Applies to Vectorization Advisor & Threading Advisor.)

advixe-cl -help

Note:

You can also check the Intel Advisor User Guide.

Run a Survey analysis.

(Applies to Vectorization Advisor & Threading Advisor.)

Linux* OS: advixe-cl -collect survey -project-dir ./myAdvisorProj -- myTargetApplication

Windows* OS: advixe-cl -collect survey -project-dir myAdvisorProj -- myTargetApplication

After running a Survey analysis, run a Trip Counts and FLOP analysis.

(Trip Counts applies to Vectorization Advisor & Threading Advisor, but FLOP is most useful in Vectorization Advisor.)

Linux* OS: advixe-cl -collect tripcounts -flop -project-dir ./myAdvisorProj -- myTargetApplication

Windows* OS: advixe-cl -collect tripcounts -flop -project-dir myAdvisorProj -- myTargetApplication

Tip:
  • Make sure you use the same project directory for both the Survey analysis and Trip Counts and FLOP analysis.

  • If you run the Trip Counts and FLOP analysis before running a Survey analysis, use the -no-auto-finalize option.

Run a Roofline analysis.

(Roofline applies to Vectorization Advisor & Threading Advisor, but is most useful in Vectorization Advisor.)

Linux* OS: advixe-cl -collect roofline -project-dir ./myAdvisorProj -- myTargetApplication

Windows* OS: advixe-cl -collect roofline -project-dir myAdvisorProj -- myTargetApplication

Tip:

Make sure you use the same project directory for both the Survey analysis and Trip Counts and FLOP analysis.

Print a Survey Report to identify loop IDs for Refinement analyses.

(Applies to Vectorization Advisor.)

Linux* OS: advixe-cl -report survey -project-dir ./myAdvisorProj

Windows* OS: advixe-cl -report survey -project-dir myAdvisorProj

Run a Refinement analysis.

(Applies to Vectorization Advisor.)

Linux* OS: advixe-cl -collect [dependencies | map] -mark-up-list=[loopID],[loopID] -project-dir ./myAdvisorProj -- myTargetApplication

Windows* OS: advixe-cl -collect [dependencies | map] -mark-up-list=[loopID],[loopID] -project-dir myAdvisorProj -- myTargetApplication

Run a Dependencies analysis.

(Applies to Threading Advisor.)

Linux* OS: advixe-cl -collect dependencies -project-dir ./myAdvisorProj -- myTargetApplication

Windows* OS: advixe-cl -collect dependencies -project-dir myAdvisorProj -- myTargetApplication

Create a snapshot, put sources and binaries in it, and pack into an archive.

(Applies to Vectorization Advisor & Threading Advisor.)

Linux* OS: advixe-cl -snapshot -project-dir ./user/test/vec_project -pack -cache-sources -cache-binaries -- ./tmp/myAdvisorProj_snapshot

Windows* OS: advixe-cl -snapshot -project-dir /user/test/vec_project -pack -cache-sources -cache-binaries -- /tmp/myAdvisorProj_snapshot

Report a top-down functions list instead of a loop list.

(Applies to Vectorization Advisor & Threading Advisor.)

advixe-cl -report survey -top-down -display-callstack

Report all compiler opt-report and vec-report metrics.

(Applies to Vectorization Advisor.)

advixe-cl -report survey -show-all-columns

Report the top five self-time hotspots that were not vectorized because of the "not inner loop" message.

(Applies to Vectorization Advisor.)

advixe-cl -report survey -limit 5 -filter "Vectorization Message(s)"="loop was not vectorized: not inner loop"

Tip:

Click the appropriate Get command line control in the Workflow to generate the corresponding collection command line.

MPI

This section shows how to get started using the Intel Advisor in an MPI environment.

You can perform an MPI analysis only through the Intel Advisor CLI; however, there are several ways to view an Intel Advisor result:

  • If you have an Intel Advisor GUI in your cluster environment, open a result in the GUI.

  • If you do not have an Intel Advisor GUI on your cluster node, copy the result directory to another machine with the Intel Advisor GUI and open the result there.

  • Use the Intel Advisor command line reports to browse results on a cluster node.

Use mpirun, mpiexec, or your preferred MPI batch job manager with the advixe-cl command to start an analysis. You may also use the -gtool option of mpirun. See the Intel® MPI Library Reference Manual (available in the Intel® Software Documentation Library) for more information.

Below are examples of typical Intel Advisor MPI tasks.

To Do This

Use This Command Line Model

Run 10 MPI ranks (processes), and start an Intel Advisor analysis on each rank.

Linux* OS: $ mpirun -n 10 advixe-cl -collect survey --project-dir ./my_proj ./your_app

Windows* OS: $ mpirun -n 10 advixe-cl -collect survey --project-dir my_proj your_app

Intel Advisor creates a number of result directories in the current directory, named as rank.0, rank.1, ... rank.n, where n is the MPI process rank.

Intel Advisor does not combine results from different ranks, so you must explore each rank result independently.

Run 10 MPI ranks, and start an Intel Advisor analysis only on rank #1.

Linux* OS: $ mpirun -n 1 advixe-cl -collect survey --project-dir ./my_proj ./your_app : -np 9 ./your_app

Windows* OS: $ mpirun -n 10 advixe-cl -collect survey --project-dir my_proj your_app :0

Run 10 MPI ranks, and build a Roofline chart on rank #1.

Linux* OS:

  1. $ mpirun -n 10 advixe-cl -collect survey --project-dir ./my_proj ./your_app :1 -np 9 ./your_app

  2. $ mpirun -n 10 advixe-cl -collect tripcounts -flop --project-dir ./my_proj ./your_app :1 -np 9 ./your_app

Windows* OS:

  1. $ mpirun -n 10 advixe-cl -collect survey --project-dir my_proj your_app :1

  2. $ mpirun -n 10 advixe-cl -collect tripcounts -flop --project-dir my_proj your_app :1

Tip:

Make sure you use the same project directory for both the Survey analysis and Trip Counts and FLOP analysis.

Next Steps

For More Information

Resource

Description

Online Resources

Explore Intel Documentation

Intel Advisor User Guide

Intel Advisor: Optimize Code for Modern Hardware

Vectorization Advisor Glossary

Vectorization Resources for Intel® Advisor Users

Intel Advisor Release Notes and New Features

Flow Graph Analyzer User Guide - Flow Graph Analyzer, which ships with the Intel Advisor, is a graphical tool for the construction, analysis, and visualization of applications that use Intel Threading Building Blocks (Intel TBB) flow graph interfaces.

Offline Resources

One of the key Vectorization Advisor features is GUI-embedded advice on how to fix vectorization issues specific to your code. To help you quickly locate information that augments that GUI-embedded advice, the Intel Advisor provides offline Intel compiler mini-guides. You can also find offline Recommendations and Compiler Diagnostic Details advice libraries in the same location as the mini-guides. Each issue and recommendation in these HTML files is collapsible/expandable.

Linux* OS: Available offline documentation is installed below <advisor-install-dir>/documentation/<locale>/. The default installation path, <advisor-install-dir>, is below:

  • /opt/intel/ for root users

  • $HOME/intel/ for non-root users

Windows* OS: Available offline documentation is installed below <advisor-install-dir>\documentation\<locale>\. The default installation path, <advisor-install-dir>, is below C:\Program Files (x86)\IntelSWTools\ (on certain systems, instead of Program Files (x86), the directory name is Program Files).

Note:

You may encounter these known issues when viewing documentation on the following systems or browsers:

  • Microsoft Windows Server* 2012 system: Trusted site prompt appears. Solution: Add about:internet to the list of trusted sites in the Tools > Internet Options > Security tab. You can remove it after you finish viewing the documentation.

  • Microsoft Internet Explorer* 11 browser: Topics do not appear when you select them in the TOC pane. Solution: Add http://localhost to the list of trusted sites in the Tools > Internet Options > Security tab. You can remove it after you finish viewing the documentation.

  • Microsoft Edge browser:

    • Context-sensitive (also known as F1) calls to a specific topic open the title page of the corresponding document instead. Solution: Use a different default browser.

    • Panes are truncated and a proper style sheet is not applied. Solution: Use a different default browser.