NAME
libpfm_itanium - support for Itanium specific PMU features
SYNOPSIS
#include <perfmon/pfmlib.h>
#include <perfmon/pfmlib_itanium.h>
int pfm_ita_is_ear(unsigned int i);
int pfm_ita_is_dear(unsigned int i);
int pfm_ita_is_dear_tlb(unsigned int i);
int pfm_ita_is_dear_cache(unsigned int i);
int pfm_ita_is_iear(unsigned int i);
int pfm_ita_is_iear_tlb(unsigned int i);
int pfm_ita_is_iear_cache(unsigned int i);
int pfm_ita_is_btb(unsigned int i);
int pfm_ita_support_opcm(unsigned int i);
int pfm_ita_support_iarr(unsigned int i);
int pfm_ita_support_darr(unsigned int i);
int pfm_ita_get_event_maxincr(unsigned int i, unsigned int *maxincr);
int pfm_ita_get_event_umask(unsigned int i, unsigned long *umask);
DESCRIPTION
The libpfm library provides full support for all the Itanium specific
features of the PMU. The interface is defined in pfmlib_itanium.h. It
consists of a set of functions and structures which describe and allow
access to the Itanium specific PMU features.
The Itanium specific functions presented here are mostly used to
retrieve the characteristics of an event. Given an opaque event
descriptor, obtained by pfm_find_event or its derivatives, they return
a boolean value indicating whether the event supports a given feature
or is of a particular kind.
The pfm_ita_is_ear() function returns 1 if the event designated by i
corresponds to an EAR event, i.e., an Event Address Register type of
event. Otherwise 0 is returned. For instance, DATA_EAR_CACHE_LAT4 is
an EAR event, but CPU_CYCLES is not. An EAR event can be a data or
instruction EAR event.
The pfm_ita_is_dear() function returns 1 if the event designated by i
corresponds to a Data EAR event. Otherwise 0 is returned. It can be a
cache or TLB EAR event.
The pfm_ita_is_dear_tlb() function returns 1 if the event designated by
i corresponds to a Data EAR TLB event. Otherwise 0 is returned.
The pfm_ita_is_dear_cache() function returns 1 if the event designated
by i corresponds to a Data EAR cache event. Otherwise 0 is returned.
The pfm_ita_is_iear() function returns 1 if the event designated by i
corresponds to an instruction EAR event. Otherwise 0 is returned. It
can be a cache or TLB instruction EAR event.
The pfm_ita_is_iear_tlb() function returns 1 if the event designated by
i corresponds to an instruction EAR TLB event. Otherwise 0 is returned.
The pfm_ita_is_iear_cache() function returns 1 if the event designated
by i corresponds to an instruction EAR cache event. Otherwise 0 is
returned.
The pfm_ita_support_opcm() function returns 1 if the event designated
by i supports opcode matching, i.e., if the event can be measured
accurately when opcode matching via PMC8/PMC9 is active. Otherwise 0
is returned. Not all events support this feature.
The pfm_ita_support_iarr() function returns 1 if the event designated
by i supports code address range restrictions, i.e., if the event can
be measured accurately when code range restriction is active. Otherwise
0 is returned. Not all events support this feature.
The pfm_ita_support_darr() function returns 1 if the event designated
by i supports data address range restrictions, i.e., if the event can
be measured accurately when data range restriction is active. Otherwise
0 is returned. Not all events support this feature.
The pfm_ita_get_event_maxincr() function returns in maxincr the maximum
number of occurrences per cycle for the event designated by i. Certain
Itanium events can occur more than once per cycle. When an event occurs
more than once per cycle, the PMD counter is incremented accordingly.
It is possible to restrict the measurement to cycles in which the event
occurs more than a given number of times, using a threshold. For
instance, NOPS_RETIRED can happen up to 6 times/cycle, which means that
the threshold can be adjusted between 0 and 5, where 5 means that the
PMD counter is incremented by 1 only when the nop instruction is
executed more than 5 times/cycle. The value returned in maxincr is
therefore the non-inclusive upper bound for the threshold to program in
the PMC register.
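The threshold rule can be sketched in a few lines of C. This is only an illustration, not part of libpfm: both helper names are hypothetical, the model covers non-zero thresholds only, and in real code maxincr would come from pfm_ita_get_event_maxincr().

```c
#include <assert.h>

/* Hypothetical helper, not part of libpfm: a threshold is acceptable
 * when it is strictly below maxincr (the non-inclusive upper bound). */
static int thres_is_valid(unsigned int thres, unsigned int maxincr)
{
    return thres < maxincr;
}

/* Hypothetical model of a single cycle for a non-zero threshold: the
 * counter advances by 1 only if the event occurred more than thres
 * times in that cycle. */
static unsigned int thresholded_increment(unsigned int occurrences,
                                          unsigned int thres)
{
    assert(thres > 0);
    return occurrences > thres ? 1u : 0u;
}
```

With the NOPS_RETIRED figures quoted above (maxincr of 6), thres_is_valid(5, 6) holds while thres_is_valid(6, 6) does not.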
The pfm_ita_get_event_umask() function returns in umask the umask for
the event designated by i.
When the Itanium specific features are needed to support a measurement
their descriptions must be passed as model-specific input arguments to
the pfm_dispatch_events call. The Itanium specific input arguments are
described in the pfmlib_ita_input_param_t structure and the output
parameters in pfmlib_ita_output_param_t. They are defined as follows:
typedef enum {
PFMLIB_ITA_ISM_BOTH=0,
PFMLIB_ITA_ISM_IA32=1,
PFMLIB_ITA_ISM_IA64=2
} pfmlib_ita_ism_t;
typedef struct {
unsigned int flags;
unsigned int thres;
pfmlib_ita_ism_t ism;
} pfmlib_ita_counter_t;
typedef struct {
unsigned char opcm_used;
unsigned long pmc_val;
} pfmlib_ita_opcm_t;
typedef struct {
unsigned char btb_used;
unsigned char btb_tar;
unsigned char btb_tac;
unsigned char btb_bac;
unsigned char btb_tm;
unsigned char btb_ptm;
unsigned char btb_ppm;
unsigned int btb_plm;
} pfmlib_ita_btb_t;
typedef enum {
PFMLIB_ITA_EAR_CACHE_MODE= 0,
PFMLIB_ITA_EAR_TLB_MODE = 1,
} pfmlib_ita_ear_mode_t;
typedef struct {
unsigned char ear_used;
pfmlib_ita_ear_mode_t ear_mode;
pfmlib_ita_ism_t ear_ism;
unsigned int ear_plm;
unsigned long ear_umask;
} pfmlib_ita_ear_t;
typedef struct {
unsigned int rr_plm;
unsigned long rr_start;
unsigned long rr_end;
} pfmlib_ita_input_rr_desc_t;
typedef struct {
unsigned long rr_soff;
unsigned long rr_eoff;
} pfmlib_ita_output_rr_desc_t;
typedef struct {
unsigned int rr_flags;
pfmlib_ita_input_rr_desc_t rr_limits[4];
unsigned char rr_used;
} pfmlib_ita_input_rr_t;
typedef struct {
unsigned int rr_nbr_used;
pfmlib_ita_output_rr_desc_t rr_infos[4];
pfmlib_reg_t rr_br[8];
} pfmlib_ita_output_rr_t;
typedef struct {
pfmlib_ita_counter_t pfp_ita_counters[PMU_ITA_NUM_COUNTERS];
unsigned long pfp_ita_flags;
pfmlib_ita_opcm_t pfp_ita_pmc8;
pfmlib_ita_opcm_t pfp_ita_pmc9;
pfmlib_ita_ear_t pfp_ita_iear;
pfmlib_ita_ear_t pfp_ita_dear;
pfmlib_ita_btb_t pfp_ita_btb;
pfmlib_ita_input_rr_t pfp_ita_drange;
pfmlib_ita_input_rr_t pfp_ita_irange;
} pfmlib_ita_input_param_t;
typedef struct {
pfmlib_ita_output_rr_t pfp_ita_drange;
pfmlib_ita_output_rr_t pfp_ita_irange;
} pfmlib_ita_output_param_t;
INSTRUCTION SET
The Itanium processor provides two additional per-event features for
counters: thresholding and instruction set selection. They can be set
using the pfp_ita_counters data structure for each event. The ism
field can be initialized as follows:
PFMLIB_ITA_ISM_BOTH
The event will be monitored during IA-64 and IA-32 execution
PFMLIB_ITA_ISM_IA32
The event will only be monitored during IA-32 execution
PFMLIB_ITA_ISM_IA64
The event will only be monitored during IA-64 execution
If ism has a value of zero, it will default to PFMLIB_ITA_ISM_BOTH.
The thres field indicates the threshold for the event. A threshold of n means
that the counter will be incremented by one only when the event occurs
more than n times per cycle.
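As a minimal sketch of how one pfp_ita_counters entry might be filled in, the fragment below repeats the two declarations from this page so that it compiles without the library headers. The helper name setup_ia64_counter is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Local copies of the declarations shown in this page, so that the
 * sketch compiles without <perfmon/pfmlib_itanium.h>. */
typedef enum {
    PFMLIB_ITA_ISM_BOTH = 0,
    PFMLIB_ITA_ISM_IA32 = 1,
    PFMLIB_ITA_ISM_IA64 = 2
} pfmlib_ita_ism_t;

typedef struct {
    unsigned int     flags;
    unsigned int     thres;
    pfmlib_ita_ism_t ism;
} pfmlib_ita_counter_t;

/* Hypothetical helper: monitor an event during IA-64 execution only,
 * with the given threshold and no event-specific flags. */
static void setup_ia64_counter(pfmlib_ita_counter_t *c, unsigned int thres)
{
    assert(c != NULL);
    memset(c, 0, sizeof(*c));   /* zeroed ism means PFMLIB_ITA_ISM_BOTH */
    c->ism   = PFMLIB_ITA_ISM_IA64;
    c->thres = thres;
}
```

Zeroing the entry first gives the documented defaults (both instruction sets, no flags), so only the fields that differ need to be assigned.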
The flags field contains event-specific flags. The currently defined
flags are:
PFMLIB_ITA_FL_EVT_NO_QUALCHECK
When this flag is set, the library ignores the qualifier
constraints for this event. Qualifiers include opcode matching
and code and data range restrictions. When an event is marked
as not supporting a particular qualifier, it usually means that
the qualifier is ignored for that event, i.e., the extra level
of filtering is not applied. For instance, the CPU_CYCLES event
does not support code range restrictions and by default the
library will refuse to program it if range restriction is also
requested. Using the flag overrides the check and the call to
pfm_dispatch_events will succeed. In this case, CPU_CYCLES
will be measured for the entire program and not just for the
code range requested. For certain measurements this is
perfectly acceptable as the range restriction will only be
applied to the events which support it. Make sure you
understand which events do not support certain qualifiers
before using this flag.
OPCODE MATCHING
The pfp_ita_pmc8 and pfp_ita_pmc9 fields of type pfmlib_ita_opcm_t
contain the description of what to do with the opcode matchers. Itanium
supports opcode matching via PMC8 and PMC9. When this feature is used
the opcm_used field must be set to 1, otherwise it is ignored by the
library. The pmc_val field simply contains the raw value to store in
PMC8 or PMC9. The library does not modify the values for PMC8 and PMC9;
they will be stored in the pfp_pmcs table of the generic output
parameters.
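A small sketch of filling in one opcode matcher description follows. The structure is copied from this page; the helper name enable_opcm is hypothetical and the raw value passed to it is whatever encoding the caller has prepared for PMC8 or PMC9.

```c
#include <assert.h>
#include <string.h>

/* Local copy of the declaration shown in this page. */
typedef struct {
    unsigned char opcm_used;
    unsigned long pmc_val;
} pfmlib_ita_opcm_t;

/* Hypothetical helper: mark the matcher as used and record the raw
 * register value, which the library is documented to pass through
 * unmodified. */
static void enable_opcm(pfmlib_ita_opcm_t *m, unsigned long raw)
{
    assert(m != NULL);
    memset(m, 0, sizeof(*m));
    m->opcm_used = 1;
    m->pmc_val   = raw;
}
```

The same helper would apply to either pfp_ita_pmc8 or pfp_ita_pmc9, since both fields share the pfmlib_ita_opcm_t type.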
EVENT ADDRESS REGISTERS
The pfp_ita_iear field of type pfmlib_ita_ear_t describes what to do
with instruction Event Address Registers (I-EARs). Again, if this
feature is used the ear_used field must be set to 1, otherwise it will
be ignored by the library. The ear_mode field must be set to either
PFMLIB_ITA_EAR_TLB_MODE or PFMLIB_ITA_EAR_CACHE_MODE to indicate the
type of EAR to program. The umask to store into PMC10 must be in
ear_umask.
The privilege level mask at which the I-EAR will be monitored must be
set in ear_plm which can be any combination of PFM_PLM0, PFM_PLM1,
PFM_PLM2, PFM_PLM3. If ear_plm is 0 then the default privilege level
mask in pfp_dfl_plm is used. Finally the instruction set for which to
monitor is in ear_ism and can be any one of PFMLIB_ITA_ISM_BOTH,
PFMLIB_ITA_ISM_IA32, or PFMLIB_ITA_ISM_IA64.
The pfp_ita_dear field of type pfmlib_ita_ear_t describes what to do
with data Event Address Registers (D-EARs). The description is
identical to the I-EARs except that it applies to PMC11.
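The EAR fields described above might be filled in as in the following sketch. The declarations are copied from this page; setup_tlb_ear and effective_plm are hypothetical helpers, and effective_plm only models the documented fallback to pfp_dfl_plm rather than reproducing library code.

```c
#include <assert.h>
#include <string.h>

/* Local copies of the declarations shown in this page. */
typedef enum {
    PFMLIB_ITA_ISM_BOTH = 0,
    PFMLIB_ITA_ISM_IA32 = 1,
    PFMLIB_ITA_ISM_IA64 = 2
} pfmlib_ita_ism_t;

typedef enum {
    PFMLIB_ITA_EAR_CACHE_MODE = 0,
    PFMLIB_ITA_EAR_TLB_MODE   = 1
} pfmlib_ita_ear_mode_t;

typedef struct {
    unsigned char         ear_used;
    pfmlib_ita_ear_mode_t ear_mode;
    pfmlib_ita_ism_t      ear_ism;
    unsigned int          ear_plm;
    unsigned long         ear_umask;
} pfmlib_ita_ear_t;

/* Hypothetical helper: describe a TLB-mode EAR for both instruction
 * sets. A plm of 0 requests the default privilege level mask. */
static void setup_tlb_ear(pfmlib_ita_ear_t *ear, unsigned long umask,
                          unsigned int plm)
{
    assert(ear != NULL);
    memset(ear, 0, sizeof(*ear));
    ear->ear_used  = 1;
    ear->ear_mode  = PFMLIB_ITA_EAR_TLB_MODE;
    ear->ear_ism   = PFMLIB_ITA_ISM_BOTH;
    ear->ear_plm   = plm;
    ear->ear_umask = umask;
}

/* Hypothetical model of the fallback rule: ear_plm == 0 selects the
 * default mask (pfp_dfl_plm in the generic parameters). */
static unsigned int effective_plm(const pfmlib_ita_ear_t *ear,
                                  unsigned int dfl_plm)
{
    return ear->ear_plm != 0 ? ear->ear_plm : dfl_plm;
}
```

The helper works for both pfp_ita_iear and pfp_ita_dear because the two fields have the same type; only the target register (PMC10 or PMC11) differs.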
In general, there are four different methods to program the EAR (data
or instruction):
Method 1
There is an EAR event in the list of events to monitor and
ear_used is cleared. In this case the EAR will be programmed
(PMC10 or PMC11) based on the information encoded in the event.
A counting monitor (PMC4/PMD4-PMC7/PMD7) will be programmed to
count DATA_EAR_EVENT or INSTRUCTION_EAR_EVENTS depending on the
type of EAR.
Method 2
There is an EAR event in the list of events to monitor and
ear_used is set. In this case the EAR will be programmed (PMC10
or PMC11) using the information in the pfp_ita_iear or
pfp_ita_dear structure because it contains more detailed
information, such as privilege level and instruction set. A
counting monitor (PMC4/PMD4-PMC7/PMD7) will be programmed to
count DATA_EAR_EVENT or INSTRUCTION_EAR_EVENTS depending on the
type of EAR.
Method 3
There is no EAR event in the list of events to monitor and
ear_used is cleared. In this case no EAR is programmed.
Method 4
There is no EAR event in the list of events to monitor and
ear_used is set. In this case the EAR will be programmed
(PMC10 or PMC11) using the information in the pfp_ita_iear or
pfp_ita_dear structure. This is the free running mode for the
EAR.
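The four methods amount to a two-input decision table. The sketch below encodes that table as read from the text above; the enum and function names are hypothetical and the code is a summary of the description, not the library's implementation.

```c
#include <assert.h>

/* Hypothetical labels for the four methods described above. */
typedef enum {
    EAR_FROM_EVENT,       /* Method 1: settings encoded in the event  */
    EAR_FROM_STRUCT,      /* Method 2: settings from pfp_ita_[id]ear  */
    EAR_NOT_PROGRAMMED,   /* Method 3: no EAR is programmed           */
    EAR_FREE_RUNNING      /* Method 4: EAR programmed, no counter     */
} ear_method_t;

/* Select the method from the two inputs: whether an EAR event is in
 * the event list, and whether ear_used is set. */
static ear_method_t ear_method(int ear_event_listed, int ear_used)
{
    assert(ear_event_listed == 0 || ear_event_listed == 1);
    assert(ear_used == 0 || ear_used == 1);
    if (ear_event_listed)
        return ear_used ? EAR_FROM_STRUCT : EAR_FROM_EVENT;
    return ear_used ? EAR_FREE_RUNNING : EAR_NOT_PROGRAMMED;
}

/* A counting monitor is programmed only for methods 1 and 2. */
static int ear_uses_counter(ear_method_t m)
{
    return m == EAR_FROM_EVENT || m == EAR_FROM_STRUCT;
}
```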
BRANCH TRACE BUFFER
The pfp_ita_btb field of type pfmlib_ita_btb_t is used to configure the
Branch Trace Buffer (BTB). If btb_used is set, then the library
will take the configuration into account, otherwise any BTB
configuration will be ignored. The various fields in this structure
provide means to filter the kind of branches that get recorded in
the BTB. Each one represents an element of the branch architecture of
the Itanium processor. Refer to the Itanium specific documentation for
more details on the branch architecture. The fields are as follows:
btb_tar
If the value of this field is 1, then branches predicted by the
Target Address Register (TAR) are captured. If 0 no branch
predicted by the TAR is included.
btb_tac
If this field is 1, then branches predicted by the Target
Address Cache (TAC) are captured. If 0 no branch predicted by
the TAC is included.
btb_bac
If this field is 1, then branches predicted by the Branch
Address Corrector (BAC) are captured. If 0 no branch predicted
by the BAC is included.
btb_tm If this field is 0, then no branch is captured. If this field is
1, then non taken branches are captured. If this field is 2,
then taken branches are captured. Finally if this field is 3
then all branches are captured.
btb_ptm
If this field is 0, then no branch is captured. If this field is
1, then branches with a mispredicted target address are
captured. If this field is 2, then branches with correctly
predicted target address are captured. Finally if this field is
3 then all branches are captured regardless of target address
prediction.
btb_ppm
If this field is 0, then no branch is captured. If this field is
1, then branches with a mispredicted path (taken/non taken) are
captured. If this field is 2, then branches with correctly
predicted path are captured. Finally if this field is 3 then all
branches are captured regardless of their path prediction.
btb_plm
This is the privilege level mask at which the BTB captures
branches. It can be any combination of PFM_PLM0, PFM_PLM1,
PFM_PLM2, PFM_PLM3. If btb_plm is 0 then the default privilege
level mask in pfp_dfl_plm is used.
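The btb_tm, btb_ptm and btb_ppm fields each behave like a two-bit mask: bit 0 selects one outcome and bit 1 the other. The sketch below models that reading. The function names are hypothetical, and combining the three filters with a logical AND is an assumption, chosen because a value of 0 in any of them is documented to capture no branch.

```c
#include <assert.h>

/* Hypothetical helper: bit 0 of the mask selects outcome 0, bit 1
 * selects outcome 1. Outcome encodings used in this sketch:
 *   btb_tm : 0 = not taken,            1 = taken
 *   btb_ptm: 0 = target mispredicted,  1 = target correctly predicted
 *   btb_ppm: 0 = path mispredicted,    1 = path correctly predicted */
static int btb_mask_allows(unsigned char mask, int outcome)
{
    assert(outcome == 0 || outcome == 1);
    return (mask >> outcome) & 1;
}

/* Assumed combination rule: a branch is recorded only if all three
 * filters accept it. */
static int btb_branch_passes(unsigned char tm, unsigned char ptm,
                             unsigned char ppm, int taken,
                             int target_ok, int path_ok)
{
    return btb_mask_allows(tm, taken)
        && btb_mask_allows(ptm, target_ok)
        && btb_mask_allows(ppm, path_ok);
}
```

For example, tm = 3, ptm = 1, ppm = 3 would select every branch whose target address was mispredicted, taken or not.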
There are 4 methods to program the BTB and they are as follows:
Method 1
The BRANCH_EVENT is in the list of events to monitor and
btb_used is cleared. In this case, the BTB will be configured
(PMC12) to record ALL branches. A counting monitor
(PMC4/PMD4-PMC7/PMD7) will be programmed to count BRANCH_EVENT.
Method 2
The BRANCH_EVENT is in the list of events to monitor and
btb_used is set. In this case, the BTB will be configured
(PMC12) using the information in the pfp_ita_btb structure. A
counting monitor (PMC4/PMD4-PMC7/PMD7) will be programmed to
count BRANCH_EVENT.
Method 3
The BRANCH_EVENT is not in the list of events to monitor and
btb_used is set. In this case, the BTB will be configured
(PMC12) using the information in the pfp_ita_btb structure. This
is the free running mode for the BTB.
Method 4
The BRANCH_EVENT is not in the list of events to monitor and
btb_used is cleared. In this case, the BTB is not programmed.
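The BTB methods can be summarized by three independent questions: is PMC12 programmed, does its configuration come from pfp_ita_btb, and is a counting monitor set up. The sketch below restates the four methods in that form; btb_plan_t and btb_plan are hypothetical names and the code only mirrors the text above.

```c
#include <assert.h>

/* Hypothetical summary of what gets programmed for the BTB. */
typedef struct {
    int program_pmc12;      /* the BTB is configured at all            */
    int use_btb_struct;     /* configuration comes from pfp_ita_btb    */
    int counting_monitor;   /* a counter is set to count BRANCH_EVENT  */
} btb_plan_t;

/* Derive the plan from whether BRANCH_EVENT is in the event list and
 * whether btb_used is set. */
static btb_plan_t btb_plan(int branch_event_listed, int btb_used)
{
    btb_plan_t p;

    assert(branch_event_listed == 0 || branch_event_listed == 1);
    assert(btb_used == 0 || btb_used == 1);
    p.program_pmc12    = branch_event_listed || btb_used;
    p.use_btb_struct   = btb_used;
    p.counting_monitor = branch_event_listed;
    return p;
}
```

Method 1 is the only case where PMC12 is programmed without pfp_ita_btb, which is why it records all branches.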
DATA AND CODE RANGE RESTRICTIONS
The pfp_ita_drange and pfp_ita_irange fields control the range
restrictions for the data and code respectively. The idea is that the
application passes a set of ranges, each designated by a start and end
address. Upon return from pfm_dispatch_events(), the application gets
back the set of registers and their values that needs to be programmed
via a kernel interface.
Range restriction is implemented using the debug registers. There is a
limited number of debug registers and they go in pairs. With 8 data
debug registers, a maximum of 4 distinct ranges can be specified. The
same applies to code range restrictions. Moreover, there are some
severe constraints on the alignment and size of the range. Given that
the size of a range is specified using a bitmask, there can be
situations where the actual range is larger than the requested range.
The library will make the best effort to cover only what is requested.
It will never cover less than what is requested. The algorithm uses
more than one pair of debug registers to get a more precise range if
necessary. Hence, up to 4 pairs can be used to describe a single range. The
library returns the start and end offsets of the actual range compared
to the requested range.
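To see why a mask-based range can be larger than requested, consider covering a range with a single power-of-two block that is aligned on its own size. The sketch below computes the smallest such block. It is an illustration only, assuming an exclusive end address; it is not the library's algorithm, which may split a range over several register pairs to reduce the excess.

```c
#include <assert.h>

typedef struct {
    unsigned long base;   /* first byte of the aligned block           */
    unsigned long size;   /* block size in bytes, a power of two       */
} aligned_block_t;

/* Hypothetical helper: smallest naturally aligned power-of-two block
 * that contains [start, end). End is exclusive in this sketch. */
static aligned_block_t cover_with_one_block(unsigned long start,
                                            unsigned long end)
{
    aligned_block_t b;
    unsigned long size = 1;

    assert(start < end);
    /* Double the block size until the first and last byte of the
     * range fall in the same aligned block. If size wraps to 0 the
     * loop also stops; that case would mean the whole address space. */
    while ((start & ~(size - 1)) != ((end - 1) & ~(size - 1)))
        size <<= 1;

    b.base = start & ~(size - 1);
    b.size = size;
    return b;
}
```

For the request [0x1010, 0x1030) the single covering block is [0x1000, 0x1040), so it starts 0x10 bytes early and ends 0x10 bytes late. Those two differences are the kind of information reported back as start and end offsets.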
If range restriction is to be used, the rr_used field must be set to
one, otherwise the settings will be ignored. The ranges are described
by the pfmlib_ita_input_rr_t structure. Up to 4 ranges can be defined.
Each range is described by an entry in rr_limits.
The pfmlib_ita_input_rr_desc_t structure is defined as follows:
rr_plm The privilege level at which the range is active. It can be any
combination of PFM_PLM0, PFM_PLM1, PFM_PLM2, PFM_PLM3. If
rr_plm is 0 then the default privilege level mask in pfp_dfl_plm
is used. The privilege level is only relevant for code ranges;
data ranges ignore the setting.
rr_start
This is the start address of the range. Any address is supported
but for code range it must be bundle aligned, i.e., 16-byte
aligned.
rr_end This is the end address of the range. Any address is supported
but for code range it must be bundle aligned, i.e., 16-byte
aligned.
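The 16-byte bundle alignment required for code ranges can be checked or enforced with simple mask arithmetic. The helpers below are hypothetical and not provided by the library; rounding an address outward is one possible policy for a caller, shown here only as a sketch.

```c
#include <assert.h>

#define BUNDLE_SIZE 16UL   /* bundle size in bytes, as stated above */

/* True when addr lies on a 16-byte bundle boundary. */
static int is_bundle_aligned(unsigned long addr)
{
    return (addr & (BUNDLE_SIZE - 1)) == 0;
}

/* Round addr down to the start of its bundle. */
static unsigned long bundle_align_down(unsigned long addr)
{
    return addr & ~(BUNDLE_SIZE - 1);
}

/* Round addr up to the next bundle boundary (unchanged if aligned). */
static unsigned long bundle_align_up(unsigned long addr)
{
    assert(addr <= ~0UL - (BUNDLE_SIZE - 1));   /* avoid wrap-around */
    return (addr + BUNDLE_SIZE - 1) & ~(BUNDLE_SIZE - 1);
}
```

A caller holding an arbitrary code address could pass bundle_align_down() of the first address as rr_start and bundle_align_up() of the last as rr_end, so that the requested code range is never narrowed.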
The library will provide the values for the debug registers as well as
some information about the actual ranges in the output parameters and
more precisely in the pfmlib_ita_output_rr_t structure for each range.
The structure is defined as follows:
rr_nbr_used
Contains the number of debug registers used to cover the range.
This is necessarily an even number as debug registers always go
in pairs. The value of this field is between 0 and 8.
rr_br This table contains the list of debug registers necessary to
cover the ranges. Each element is of type pfmlib_reg_t. The
reg_num field contains the debug register index while reg_value
contains the debug register value. Both the index and value must
be copied into the kernel specific argument to program the debug
registers. The library never programs them.
rr_infos
Contains information about the ranges defined. Because of
alignment restrictions, the actual range covered by the debug
registers may be larger than the requested range. This table
describes the differences between the requested and actual ranges
expressed as offsets:
rr_soff
Contains the start offset of the actual range described by the
debug registers. If zero, it means the library was able to match
exactly the beginning of the range. Otherwise it represents the
number of bytes by which the actual range precedes the requested
range.
rr_eoff
Contains the end offset of the actual range described by the
debug registers. If zero, it means the library was able to match
exactly the end of the range. Otherwise it represents the number
of bytes by which the actual range exceeds the requested range.
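Given a requested range and the offsets returned for it, the range actually covered by the debug registers follows by simple arithmetic. The sketch below uses local copies of the two descriptor structures from this page; actual_start and actual_end are hypothetical helper names.

```c
#include <assert.h>

/* Local copies of the declarations shown in this page. */
typedef struct {
    unsigned int  rr_plm;
    unsigned long rr_start;
    unsigned long rr_end;
} pfmlib_ita_input_rr_desc_t;

typedef struct {
    unsigned long rr_soff;
    unsigned long rr_eoff;
} pfmlib_ita_output_rr_desc_t;

/* The actual range begins rr_soff bytes before the requested start. */
static unsigned long actual_start(const pfmlib_ita_input_rr_desc_t *in,
                                  const pfmlib_ita_output_rr_desc_t *out)
{
    assert(out->rr_soff <= in->rr_start);
    return in->rr_start - out->rr_soff;
}

/* The actual range ends rr_eoff bytes after the requested end. */
static unsigned long actual_end(const pfmlib_ita_input_rr_desc_t *in,
                                const pfmlib_ita_output_rr_desc_t *out)
{
    return in->rr_end + out->rr_eoff;
}
```

When both offsets are zero the two helpers return the requested addresses unchanged, which corresponds to an exact match.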
ERRORS
Refer to the description of pfm_dispatch_events() for errors when using
the Itanium specific input and output arguments.
SEE ALSO
pfm_dispatch_events(3) and the set of examples shipped with the library
AUTHOR
Stephane Eranian <eranian@hpl.hp.com>
November, 2003 LIBPFM(3)