Intel® VTune™ Amplifier can be installed on Windows*, macOS*, and Linux* platforms and used to analyze local and remote target systems. Use this tool to analyze algorithm choices, find serial and parallel code bottlenecks, understand where and how your application can benefit from available hardware resources, and speed up execution. The Find Your Analysis guide, available from the VTune Amplifier Welcome page, is a good place to find the best analysis type for your use case.

Before running a detailed analysis with Intel VTune Amplifier, consider using one of the Intel Performance Snapshot tools as a quick way to discover untapped performance and to decide which analysis to run next. The tools are available bundled with Intel Parallel Studio or as free downloads from the Intel Developer Zone website.

Note:

Starting with the 2018 version of Intel VTune Amplifier, product help, tutorials, and Release Notes are available online only, from the Intel Software Documentation Library in the Intel Developer Zone (IDZ). You can also download an offline version of the product help from either IDZ or the Intel® Software Development Products Registration Center.

Prerequisites

Step 1: Start the VTune Amplifier

Select the appropriate product or use the instructions for the standalone version of VTune Amplifier to set appropriate environment variables and launch the tool.

Standalone VTune Amplifier, Installed with Intel Parallel Studio, or Installed with Intel Media Studio

  1. Set up the environment variables:

    • csh/tcsh users: source <install_dir>/amplxe-vars.csh

    • bash users: source <install_dir>/amplxe-vars.sh

    By default, the <install_dir> is:

    • For root users: /opt/intel/vtune_amplifier_2018

    • For non-root users: $HOME/intel/vtune_amplifier_2018

  2. Launch the VTune Amplifier:

    • For the standalone graphical interface, run the amplxe-gui command.

    • For the command line interface, run the amplxe-cl command.
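The environment setup above can be scripted. The following is a minimal sketch for bash or POSIX sh users, assuming the default 2018 install locations listed above. The function names vtune_default_dir and vtune_setup_env are hypothetical helpers and are not part of the product.

```shell
# Print the default install directory for a given user id (0 means root).
# With no argument, the current user's id is used.
vtune_default_dir() {
    uid="${1:-$(id -u)}"
    if [ "$uid" -eq 0 ]; then
        printf '%s\n' "/opt/intel/vtune_amplifier_2018"
    else
        printf '%s\n' "$HOME/intel/vtune_amplifier_2018"
    fi
}

# Source amplxe-vars.sh from the given (or default) directory.
# Returns 1 and prints a message if the script is not there.
vtune_setup_env() {
    dir="${1:-$(vtune_default_dir)}"
    if [ -f "$dir/amplxe-vars.sh" ]; then
        . "$dir/amplxe-vars.sh"
    else
        echo "amplxe-vars.sh not found in $dir" >&2
        return 1
    fi
}
```

After calling vtune_setup_env in the current shell, amplxe-gui and amplxe-cl should be available on PATH if the product is installed in that directory. Pass a custom directory as the first argument for non-default installations.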

Installed with Intel System Studio

From within the Intel System Studio IDE, select Intel System Studio > VTune Amplifier > Launch VTune Amplifier. Launching from within Intel System Studio sets all appropriate environment variables before opening the tool.

Step 2: Set Up the Analysis Target

  1. Build your target application in the Release mode with all optimizations enabled.

  2. Create a VTune Amplifier project:

    1. Click the menu button in the right corner and go to New > Project...

    2. Specify the project name and location in the Create Project dialog box.

    3. Click Create Project.

  3. In the Analysis Target tab, select a target system and an analysis target type.

  4. Configure your target: application location, parameters, and search directories (if required).

Step 3: Configure Analysis

  1. Switch to the Analysis Type tab.

  2. From the left pane, select an analysis type applicable to your platform and configure analysis options in the right pane.

    Tip:

    Use the Find Your Analysis guide, available from the VTune Amplifier Welcome screen, for help picking your starting point for analysis based on your use case.

  3. Click the Start button on the right to launch the analysis.
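Steps 2 and 3 also have a command line counterpart in amplxe-cl. The sketch below assumes a hypothetical application ./my_app and a hypothetical result directory r000hs. The run helper and the DRY_RUN variable are illustrative only; by default the command is printed rather than executed, so the snippet can be tried without a target. The analysis type name hotspots is assumed to correspond to Basic Hotspots in this release; confirm it with amplxe-cl -help collect.

```shell
# Print the command when DRY_RUN is 1 (the default here); otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

APP=./my_app          # hypothetical target application
RESULT_DIR=r000hs     # hypothetical result directory

# Collect Basic Hotspots data; everything after "--" is the target command line.
run amplxe-cl -collect hotspots -result-dir "$RESULT_DIR" -- "$APP"
```

Set DRY_RUN=0 to run the collection for real once the environment variables from Step 1 are in place.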

Step 4: View and Analyze Performance Data

When data collection completes, the VTune Amplifier opens the result in the default viewpoint, which is a preset configuration of windows for an analysis result. You can switch between viewpoints to analyze the data from different perspectives using different sets of performance metrics.

Start your analysis with the Summary window to get an overview of the application performance, and then switch to other windows to explore performance in more depth at the granularity of functions, source lines, and so on.
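The same summary-first workflow can be followed from the command line with the amplxe-cl report action. This is a sketch only: vtune_report_cmd is a hypothetical helper that composes the command text, and r000hs is a hypothetical result directory from an earlier collection.

```shell
# Compose an amplxe-cl report command for a result directory.
# Arguments: <report-type> <result-dir>
vtune_report_cmd() {
    printf 'amplxe-cl -report %s -result-dir %s\n' "$1" "$2"
}

# Overview first, then the per-function breakdown.
vtune_report_cmd summary r000hs
vtune_report_cmd hotspots r000hs
```

The helper only prints the commands; pipe its output to sh, or copy the printed line, to generate the report.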

Key Features

ALGORITHM ANALYSIS

  • Run the Basic Hotspots analysis type to understand application flow and identify sections of code that take the most execution time (hotspots).

  • Use the Advanced Hotspots analysis to extend the Basic Hotspots analysis by collecting call stacks and analyzing the CPI (Cycles Per Instruction) metric. NEW: You can also use this analysis type to profile native or Java* applications running in a Docker* or Mesos* container on a Linux system.

  • NEW: Use Memory Consumption analysis for your native Linux* or Python* targets to explore RAM usage over time and identify memory objects allocated and released during the analysis run.

  • Run Concurrency analysis to estimate parallelization in your code and understand how effectively your application uses available cores.

  • Run Locks and Waits analysis to identify synchronization objects preventing effective utilization of processor resources.
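When scripting these analyses, the names shown in the GUI need to be translated to amplxe-cl analysis type names. The lookup below is a sketch: vtune_analysis_type is a hypothetical helper, and the command line names are the ones assumed for the 2018 release, so verify them against the output of amplxe-cl -help collect on your installation.

```shell
# Map a GUI analysis name to its assumed amplxe-cl analysis type name.
# Prints the name, or returns 1 for an unrecognized analysis.
vtune_analysis_type() {
    case "$1" in
        "Basic Hotspots")     echo "hotspots" ;;
        "Advanced Hotspots")  echo "advanced-hotspots" ;;
        "Memory Consumption") echo "memory-consumption" ;;
        "Concurrency")        echo "concurrency" ;;
        "Locks and Waits")    echo "locksandwaits" ;;
        *) echo "unknown analysis: $1" >&2; return 1 ;;
    esac
}

vtune_analysis_type "Locks and Waits"
```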

MICROARCHITECTURE ANALYSIS

  • Run General Exploration analysis to triage hardware issues in your application. This type collects a complete list of events for analyzing a typical client application.

  • Use Memory Access analysis to identify memory-related issues, such as NUMA problems and bandwidth-limited accesses, and to attribute performance events to memory objects (data structures). This attribution relies on instrumenting memory allocations and deallocations and on reading static and global variables from symbol information.

PLATFORM ANALYSIS

  • Run System Overview analysis to review the general behavior of a target Linux* or Android* system and correlate power and performance metrics with interrupt requests (IRQs).

  • Run CPU/GPU Concurrency analysis to identify code regions where your application is CPU or GPU bound.

  • Use GPU Hotspots analysis to identify GPU tasks with high GPU utilization and estimate the effectiveness of this utilization.

  • For GPU-bound applications running on Intel HD Graphics, collect GPU hardware events to estimate how effectively the Processor Graphics are used.

  • Collect data on Ftrace* events on Android and Linux targets and Atrace* events on Android targets.

  • Analyze hot Intel® Media SDK programs and OpenCL™ kernels running on a GPU. For OpenCL application analysis, use the Architecture Diagram to explore GPU hardware metrics per GPU architecture blocks.

  • Run Disk Input and Output analysis to monitor utilization of the disk subsystem, CPU and processor buses. This analysis type provides a consistent view of the storage sub-system combined with hardware events and an easy-to-use method to match user-level source code with I/O packets executed by the hardware.

GPU Analysis

COMPUTE-INTENSIVE APPLICATION ANALYSIS

  • Run HPC Performance Characterization analysis to identify how effectively your high-performance computing application uses CPU, memory, and floating-point operation hardware resources. This analysis type provides additional scalability metrics for applications that use OpenMP or Intel MPI runtime libraries.

  • Run an Algorithm analysis type with the Analyze OpenMP regions option enabled to collect OpenMP or MPI data for applications using OpenMP or MPI runtime libraries. Note that HPC Performance Characterization analysis has the option enabled by default.

  • For OpenMP applications, analyze the collected performance data to identify inefficiencies in parallelization. Review the Potential Gain metric values per OpenMP region to understand the maximum time that could be saved if the OpenMP region is optimized to have no load imbalance assuming no runtime overhead.

  • For hybrid OpenMP and MPI applications, explore OpenMP efficiency metrics by MPI processes lying on the critical path.
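To make the Potential Gain idea concrete, the sketch below works through a simplified model. It is not the exact formula VTune Amplifier uses: it assumes the region's elapsed time equals the busy time of its slowest thread and that a perfectly balanced region would take the mean busy time across threads. The potential_gain helper and the per-thread times are hypothetical.

```shell
# Estimate the potential gain (in seconds) for one OpenMP region from
# per-thread busy times. Simplified model: elapsed = slowest thread,
# ideal = mean over all threads, gain = elapsed - ideal.
potential_gain() {
    printf '%s\n' "$@" | LC_ALL=C awk '
        NR == 1 || $1 + 0 > max { max = $1 + 0 }
        { sum += $1 }
        END { printf "%.2f\n", max - sum / NR }
    '
}

# Hypothetical region with four threads busy for 4, 2, 3, and 3 seconds.
potential_gain 4.0 2.0 3.0 3.0
```

In this example the slowest thread takes 4 seconds and the mean is 3 seconds, so the model reports a gain of 1.00 second. A perfectly balanced region reports 0.00.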

SOURCE ANALYSIS

  • Double-click a hotspot function to drill down to the source code and analyze performance per source line or assembly instruction. By default, the hottest line is highlighted.

  • For help on an assembly instruction, right-click the instruction in the Assembly pane and select Instruction Reference from the context menu.

MANAGED CODE ANALYSIS

Configure target options for managed code analysis in the native, managed, or mixed mode:

  • Event-based sampling (EBS) or user-mode sampling and tracing analysis for Java* applications running in the Launch Application or Attach to Process mode.

  • Basic Hotspots and Locks and Waits analysis for Python* applications running in the Launch Application or Attach to Process mode.
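On the command line, the two modes differ only in how the target is specified. The sketch below assumes a hypothetical script my_script.py and a hypothetical process ID 12345; vtune_target_args is an illustrative helper that prints the target portion of an amplxe-cl command line.

```shell
# Build the target part of an amplxe-cl command line.
#   launch <command...>  profiles a newly started process
#   attach <pid>         profiles an already running process
vtune_target_args() {
    mode="$1"
    shift
    case "$mode" in
        launch) echo "-- $*" ;;
        attach) echo "-target-pid $1" ;;
        *) echo "usage: vtune_target_args launch <command...> | attach <pid>" >&2
           return 1 ;;
    esac
}

echo "amplxe-cl -collect hotspots $(vtune_target_args launch python my_script.py)"
echo "amplxe-cl -collect hotspots $(vtune_target_args attach 12345)"
```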

CUSTOM ANALYSIS

  • Create a copy of a current analysis type and modify the collection options to create your own analysis configurations.

  • Run your own custom collector from the VTune Amplifier to get performance data from both your custom collection and the VTune Amplifier analysis aggregated in the same result.

  • Import performance data collected by your own or a third-party collector into a VTune Amplifier result that was collected in parallel with your external collection. Use the Import from CSV button to integrate the external data into the result.

  • Collect data from a remote virtual machine by configuring KVM guest OS profiling, which makes use of the Linux Perf KVM feature. Select Analyze KVM guest OS from the Advanced options on your Linux host system.

Training and Documentation

Document

Description

Online Training

The online training site is an excellent resource for learning VTune Amplifier basics with Getting Started guides, videos, tutorials, webinars and technical articles.

Intel VTune Amplifier Tutorials

Tutorials guide a new user through basic VTune Amplifier features and operations using a short sample. They provide an excellent foundation before you read the VTune Amplifier help.

Sample code is typically installed to <install_dir>/samples/<locale>/<programming_language>.

VTune Amplifier sample code and corresponding tutorials are also available at https://software.intel.com/en-us/product-code-samples

Intel VTune Amplifier Cookbook

A performance analysis cookbook with recipes for identifying and solving the most common performance problems with the help of VTune Amplifier analysis types.

Release Notes

The Release Notes document contains the most up-to-date information about the product, including a product description, technical support, and known limitations and issues.

This document also contains system requirements for installing the product. Before installation, the Release Notes document is located at the root level (same level as the installation script/executable) of the installation download package.

Installation Guide

The Installation Guide contains basic installation instructions for VTune Amplifier and post-installation configuration instructions for the various drivers and collectors.

The latest Installation Guide can be found on the Intel® Developer Zone (Intel® DZ) website.

Intel VTune Amplifier Help

The help is the primary documentation for the VTune Amplifier.

Note:

You can also download an offline version of all product documentation for Intel® Parallel Studio XE or Intel® System Studio.

SEP User's Guide

This document provides instructions on using the VTune Amplifier sampling collector (SEP) targeted for hardware event-based sampling analysis on resource-restricted systems.

EMON User's Guide

This document provides instructions on using the EMON command line tool, which is used to specify performance events, allocate resources, and retrieve event counts in processors and chipsets.

Intel Processor Event Reference

This help provides reference information for Intel processor events used by the VTune Amplifier for hardware event-based sampling analysis. Most of this information is drawn from Intel processor information sources on the web.

Command Line Help

You can access general help for the VTune Amplifier command line interface by entering one of the following commands:

  • amplxe-cl -help for help on basic action options

  • amplxe-cl -help <action-option> for help on a particular action option and its knobs

Web Resources

Legal Information

Intel, the Intel logo, and VTune are trademarks of Intel Corporation in the U.S. and/or other countries.

* Other names and brands may be claimed as the property of others.

Microsoft, Windows, and the Windows logo are trademarks, or registered trademarks of Microsoft Corporation in the United States and/or other countries.

OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission from Khronos.

Copyright 2015-2018 Intel Corporation

This software and the related documents are Intel copyrighted materials, and your use of them is governed by the express license under which they were provided to you (License). Unless the License provides otherwise, you may not use, modify, copy, publish, distribute, disclose or transmit this software or the related documents without Intel's prior written permission.

This software and the related documents are provided as is, with no express or implied warranties, other than those that are expressly stated in the License.