NAME
libpfm_montecito - support for Itanium 2 9000 (Montecito) processor
specific PMU features
SYNOPSIS
#include <perfmon/pfmlib.h>
#include <perfmon/pfmlib_montecito.h>
int pfm_mont_is_ear(unsigned int i);
int pfm_mont_is_dear(unsigned int i);
int pfm_mont_is_dear_tlb(unsigned int i);
int pfm_mont_is_dear_cache(unsigned int i);
int pfm_mont_is_dear_alat(unsigned int i);
int pfm_mont_is_iear(unsigned int i);
int pfm_mont_is_iear_tlb(unsigned int i);
int pfm_mont_is_iear_cache(unsigned int i);
int pfm_mont_is_etb(unsigned int i);
int pfm_mont_support_opcm(unsigned int i);
int pfm_mont_support_iarr(unsigned int i);
int pfm_mont_support_darr(unsigned int i);
int pfm_mont_get_event_maxincr(unsigned int i, unsigned int *maxincr);
int pfm_mont_get_event_umask(unsigned int i, unsigned long *umask);
int pfm_mont_get_event_group(unsigned int i, int *grp);
int pfm_mont_get_event_set(unsigned int i, int *set);
int pfm_mont_get_event_type(unsigned int i, int *type);
int pfm_mont_get_ear_mode(unsigned int i, pfmlib_mont_ear_mode_t *mode);
int pfm_mont_irange_is_fine(pfmlib_output_param_t *outp, pfmlib_mont_output_param_t *mod_out);
DESCRIPTION
The libpfm library provides full support for all the Itanium 2 9000
(Montecito) processor specific features of the PMU. The interface is
defined in pfmlib_montecito.h. It consists of a set of functions and
structures which describe and allow access to the model specific PMU
features.
The Itanium 2 9000 (Montecito) processor specific functions presented
here are mostly used to retrieve the characteristics of an event. Given
an opaque event descriptor, obtained by pfm_find_event or its
derivatives, they return a boolean value indicating whether this event
supports a given feature or is of a particular kind.
The pfm_mont_is_ear() function returns 1 if the event designated by i
corresponds to an EAR event, i.e., an Event Address Register type of
event. Otherwise 0 is returned. For instance, DATA_EAR_CACHE_LAT4 is
an EAR event, but CPU_OP_CYCLES_ALL is not. It can be a data or
instruction EAR event.
The pfm_mont_is_dear() function returns 1 if the event designated by i
corresponds to a Data EAR event. Otherwise 0 is returned. It can be a
cache or TLB EAR event.
The pfm_mont_is_dear_tlb() function returns 1 if the event designated
by i corresponds to a Data EAR TLB event. Otherwise 0 is returned.
The pfm_mont_is_dear_cache() function returns 1 if the event designated
by i corresponds to a Data EAR cache event. Otherwise 0 is returned.
The pfm_mont_is_dear_alat() function returns 1 if the event designated
by i corresponds to a Data EAR ALAT event. Otherwise 0 is returned.
The pfm_mont_is_iear() function returns 1 if the event designated by i
corresponds to an instruction EAR event. Otherwise 0 is returned. It
can be a cache or TLB instruction EAR event.
The pfm_mont_is_iear_tlb() function returns 1 if the event designated
by i corresponds to an instruction EAR TLB event. Otherwise 0 is
returned.
The pfm_mont_is_iear_cache() function returns 1 if the event designated
by i corresponds to an instruction EAR cache event. Otherwise 0 is
returned.
The pfm_mont_support_opcm() function returns 1 if the event designated
by i supports opcode matching, i.e., if this event can be measured
accurately when opcode matching via PMC32/PMC34 is active. Otherwise 0
is returned. Not all events support this feature.
The pfm_mont_support_iarr() function returns 1 if the event designated
by i supports code address range restrictions, i.e., if this event can
be measured accurately when code range restriction is active. Otherwise
0 is returned. Not all events support this feature.
The pfm_mont_support_darr() function returns 1 if the event designated
by i supports data address range restrictions, i.e., if this event can
be measured accurately when data range restriction is active. Otherwise
0 is returned. Not all events support this feature.
The pfm_mont_get_event_maxincr() function returns in maxincr the
maximum number of occurrences per cycle for the event designated by i.
Certain Itanium 2 9000 (Montecito) events can occur more than once per
cycle. When an event occurs more than once per cycle, the PMD counter
will be incremented accordingly. It is possible to restrict
measurement when an event occurs more than once per cycle. For instance,
NOPS_RETIRED can happen up to 6 times/cycle which means that the
threshold can be adjusted between 0 and 5, where 5 would mean that the
PMD counter would be incremented by 1 only when the nop instruction is
executed more than 5 times/cycle. This function returns the maximum
number of occurrences of the event per cycle, and is the non-inclusive
upper bound for the threshold to program in the PMC register.
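As a minimal sketch of the bound described above, the following helper
checks a candidate threshold against the value obtained from
pfm_mont_get_event_maxincr(). The name ex_thres_is_valid is
hypothetical and is not part of libpfm; it only restates the rule that
maxincr is a non-inclusive upper bound.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not part of libpfm.
 * maxincr: value returned by pfm_mont_get_event_maxincr() for the event.
 * thres:   threshold the caller would like to program.
 * Valid thresholds are 0 .. maxincr-1, since maxincr is a non-inclusive
 * upper bound.
 */
bool ex_thres_is_valid(unsigned int maxincr, unsigned int thres)
{
    return thres < maxincr;
}
```

With the NOPS_RETIRED figures quoted above (maxincr of 6), thresholds 0
through 5 pass this check and 6 does not.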
The pfm_mont_get_event_umask() function returns in umask the umask for
the event designated by i.
The pfm_mont_get_event_group() function returns in grp the group to which
the event designated by i belongs. The notion of group is used for L1D
and L2D cache events only. For all other events, a group is irrelevant
and can be ignored. If the event is an L2D cache event then the value
of grp will be PFMLIB_MONT_EVT_L2D_CACHE_GRP. Similarly, if the event
is an L1D cache event, the value of grp will be
PFMLIB_MONT_EVT_L1D_CACHE_GRP. In all other cases, the value of grp
will be PFMLIB_MONT_EVT_NO_GRP.
The pfm_mont_get_event_set() function returns in set the set to which
the event designated by i belongs. A set is a subdivision of a group
and is therefore only relevant for L1 and L2 cache events. An event can
only belong to one group and one set. This partitioning of the cache
events is due to some hardware limitations which impose some
restrictions on events. For a given group, events from different sets
cannot be measured at the same time. If the event does not belong to a
group then the value of set is PFMLIB_MONT_EVT_NO_SET.
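The group and set rule above can be sketched as a small compatibility
check. This is an illustration only: struct ex_evt_class and
ex_events_conflict are hypothetical names, and the no_grp argument
stands for whatever value the header defines for
PFMLIB_MONT_EVT_NO_GRP, which is passed in rather than assumed.

```c
#include <stdbool.h>

/* Hypothetical holder for the values returned by
 * pfm_mont_get_event_group() and pfm_mont_get_event_set(). */
struct ex_evt_class {
    int grp;
    int set;
};

/*
 * Returns true when two events cannot be measured at the same time
 * under the rule stated above: same group, different sets.
 * Events without a group never conflict on these grounds.
 */
bool ex_events_conflict(struct ex_evt_class a, struct ex_evt_class b,
                        int no_grp)
{
    if (a.grp == no_grp || b.grp == no_grp)
        return false;
    return a.grp == b.grp && a.set != b.set;
}
```

A caller would fill one ex_evt_class per event and test every pair
before calling pfm_dispatch_events().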
The pfm_mont_get_event_type() function returns in type the type of the
event designated by i. The Itanium 2 9000 (Montecito) events can have
any one of the following types:
PFMLIB_MONT_EVT_ACTIVE
The event can only occur when the processor thread that
generated it is currently active
PFMLIB_MONT_EVT_FLOATING
The event can be generated when the processor thread is inactive
PFMLIB_MONT_EVT_CAUSAL
The event does not belong to a processor thread
PFMLIB_MONT_EVT_SELF_FLOATING
Hybrid event. It is floating if measured with .me. It is causal
otherwise.
The pfm_mont_irange_is_fine() function returns 1 if the configuration
described by outp, the generic output parameters, and mod_out, the
Itanium 2 9000 (Montecito) specific output parameters, uses code range
restriction in fine mode. Otherwise the function returns 0. This
function can only be called after a call to pfm_dispatch_events() which
returned successfully and had the data structures pointed to by outp
and mod_out as output parameters.
The pfm_mont_get_ear_mode() function returns in mode the EAR mode of
the event designated by i. If the event is not an EAR event, then
PFMLIB_ERR_INVAL is returned and mode is not updated. Otherwise mode
can have the following values:
PFMLIB_MONT_EAR_TLB_MODE
The event is a TLB EAR. It can be either a data or
instruction TLB EAR.
PFMLIB_MONT_EAR_CACHE_MODE
The event is a cache EAR. It can be either data or instruction
cache EAR.
PFMLIB_MONT_EAR_ALAT_MODE
The event is an ALAT EAR. It can only be a data EAR event.
When the Itanium 2 9000 (Montecito) specific features are needed to
support a measurement their descriptions must be passed as model-
specific input arguments to the pfm_dispatch_events call. The Itanium
2 9000 (Montecito) specific input arguments are described in the
pfmlib_mont_input_param_t structure and the output parameters in
pfmlib_mont_output_param_t. They are defined as follows:
typedef struct {
unsigned int flags;
unsigned int thres;
} pfmlib_mont_counter_t;
typedef struct {
unsigned char opcm_used;
unsigned char opcm_m;
unsigned char opcm_i;
unsigned char opcm_f;
unsigned char opcm_b;
unsigned long opcm_match;
unsigned long opcm_mask;
} pfmlib_mont_opcm_t;
typedef struct {
unsigned char etb_used;
unsigned int etb_plm;
unsigned char etb_ds;
unsigned char etb_tm;
unsigned char etb_ptm;
unsigned char etb_ppm;
unsigned char etb_brt;
} pfmlib_mont_etb_t;
typedef struct {
unsigned char ipear_used;
unsigned int ipear_plm;
unsigned short ipear_delay;
} pfmlib_mont_ipear_t;
typedef enum {
PFMLIB_MONT_EAR_CACHE_MODE = 0,
PFMLIB_MONT_EAR_TLB_MODE   = 1,
PFMLIB_MONT_EAR_ALAT_MODE  = 2
} pfmlib_mont_ear_mode_t;
typedef struct {
unsigned char ear_used;
pfmlib_mont_ear_mode_t ear_mode;
unsigned int ear_plm;
unsigned long ear_umask;
} pfmlib_mont_ear_t;
typedef struct {
unsigned int rr_plm;
unsigned long rr_start;
unsigned long rr_end;
} pfmlib_mont_input_rr_desc_t;
typedef struct {
unsigned long rr_soff;
unsigned long rr_eoff;
} pfmlib_mont_output_rr_desc_t;
typedef struct {
unsigned int rr_flags;
pfmlib_mont_input_rr_desc_t rr_limits[4];
unsigned char rr_used;
} pfmlib_mont_input_rr_t;
typedef struct {
unsigned int rr_nbr_used;
pfmlib_mont_output_rr_desc_t rr_infos[4];
pfmlib_reg_t rr_br[8];
} pfmlib_mont_output_rr_t;
typedef struct {
pfmlib_mont_counter_t pfp_mont_counters[PMU_MONT_NUM_COUNTERS];
unsigned long pfp_mont_flags;
pfmlib_mont_opcm_t pfp_mont_opcm1;
pfmlib_mont_opcm_t pfp_mont_opcm2;
pfmlib_mont_ear_t pfp_mont_iear;
pfmlib_mont_ear_t pfp_mont_dear;
pfmlib_mont_ipear_t pfp_mont_ipear;
pfmlib_mont_etb_t pfp_mont_etb;
pfmlib_mont_input_rr_t pfp_mont_drange;
pfmlib_mont_input_rr_t pfp_mont_irange;
} pfmlib_mont_input_param_t;
typedef struct {
pfmlib_mont_output_rr_t pfp_mont_drange;
pfmlib_mont_output_rr_t pfp_mont_irange;
} pfmlib_mont_output_param_t;
PER-EVENT OPTIONS
The Itanium 2 9000 (Montecito) processor provides one per-event feature
for counters: thresholding. It can be set using the pfp_mont_counters
data structure for each event.
The thres field indicates the threshold for the event. A threshold of n means
that the counter will be incremented by one only when the event occurs
more than n times per cycle.
The flags field contains event-specific flags. The currently defined
flags are:
PFMLIB_MONT_FL_EVT_NO_QUALCHECK
When this flag is set, the library ignores the qualifier
constraints for this event. Qualifiers include opcode matching
and code and data range restrictions. When an event is marked as
not supporting a particular qualifier, it usually means that the
qualifier is ignored for that event, i.e., the extra level of
filtering is not applied. For instance, the FE_BUBBLE_ALL event
does not support code range restrictions and by default the
library will refuse to program it if range restriction is also
requested. Using the flag will override the check and the call
to pfm_dispatch_events will succeed. In this case,
FE_BUBBLE_ALL will be measured for the entire program and not
just for the code range requested. For certain measurements
this is perfectly acceptable as the range restriction will only
be applied to events which support it. Make sure you
understand which events do not support certain qualifiers
before using this flag.
OPCODE MATCHING
The pfp_mont_opcm1 and pfp_mont_opcm2 fields of type pfmlib_mont_opcm_t
contain the description of what to do with the opcode matchers. The
Itanium 2 9000 (Montecito) processor supports opcode matching via PMC32
and PMC34. When this feature is used the opcm_used field must be set to
1, otherwise it is ignored by the library. The Itanium 2 9000
(Montecito) processor implements two full 41-bit opcode matchers. As
such, it is possible to match all instructions individually. It is
possible to match a single instruction or an instruction pattern based
on opcode or slot type. The slots are specified in:
opcm_m Match when the instruction is in an M-slot (memory)
opcm_i Match when the instruction is in an I-slot (ALU)
opcm_f Match when the instruction is in an F-slot (FPU)
opcm_b Match when the instruction is in a B-slot (Branch)
Any combination of slot settings is supported. To match all slot
types, simply set all fields to 1.
The 41-bit opcode is specified in opcm_match and a 41-bit mask is
passed in opcm_mask. When a bit is set in opcm_mask the corresponding
bit is ignored in opcm_match.
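The match and mask semantics above can be modeled in a few lines. This
is a software sketch of the comparison as the text describes it, not
the hardware or library implementation; ex_opcode_matches and the
EX_ macros are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define EX_OPCODE_BITS 41
#define EX_OPCODE_MASK ((UINT64_C(1) << EX_OPCODE_BITS) - 1)

/*
 * Model of the comparison described above: a bit set in mask means the
 * corresponding bit of match is ignored. Only the low 41 bits take part.
 */
bool ex_opcode_matches(uint64_t opcode, uint64_t match, uint64_t mask)
{
    uint64_t care = ~mask & EX_OPCODE_MASK;   /* bits that must agree */

    return ((opcode ^ match) & care) == 0;
}
```

Under this model a mask with all 41 bits set matches every opcode, and
a mask of 0 requires an exact 41-bit match.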
EVENT ADDRESS REGISTERS
The pfp_mont_iear field of type pfmlib_mont_ear_t describes what to do
with instruction Event Address Registers (I-EARs). Again, if this
feature is used the ear_used field must be set to 1, otherwise it will
be ignored by the library. The ear_mode field must be set to either
PFMLIB_MONT_EAR_TLB_MODE or PFMLIB_MONT_EAR_CACHE_MODE to indicate the
type of EAR to program. The umask to store into PMC10 must be in
ear_umask. The privilege level mask at which the I-EAR will be
monitored must be set in ear_plm which can be any combination of
PFM_PLM0, PFM_PLM1, PFM_PLM2, PFM_PLM3. If ear_plm is 0 then the
default privilege level mask in pfp_dfl_plm is used.
The pfp_mont_dear field of type pfmlib_mont_ear_t describes what to do
with data Event Address Registers (D-EARs). The description is
identical to the I-EARs except that it applies to PMC11 and that an
ear_mode of PFMLIB_MONT_EAR_ALAT_MODE is possible.
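The mode rules for the two EAR kinds can be summarized by a small
check. The enum below is a local stand-in that mirrors the numeric
values shown in the pfmlib_mont_ear_mode_t definition above; the
function name ex_ear_mode_ok is hypothetical.

```c
#include <stdbool.h>

/* Local stand-in mirroring pfmlib_mont_ear_mode_t as listed above. */
typedef enum {
    EX_EAR_CACHE_MODE = 0,
    EX_EAR_TLB_MODE   = 1,
    EX_EAR_ALAT_MODE  = 2
} ex_ear_mode_t;

/*
 * Returns true if mode is acceptable for the given EAR kind:
 * cache and TLB modes apply to both I-EAR and D-EAR, ALAT mode
 * applies to D-EAR only.
 */
bool ex_ear_mode_ok(bool is_data_ear, ex_ear_mode_t mode)
{
    switch (mode) {
    case EX_EAR_CACHE_MODE:
    case EX_EAR_TLB_MODE:
        return true;
    case EX_EAR_ALAT_MODE:
        return is_data_ear;
    }
    return false;   /* unknown mode value */
}
```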
In general, there are four different methods to program the EAR (data
or instruction):
Method 1
There is an EAR event in the list of events to monitor and
ear_used is cleared. In this case the EAR will be programmed
(PMC10 or PMC11) based on the information encoded in the event.
A counting monitor (PMC4/PMD4-PMC7/PMD7) will be programmed to
count DATA_EAR_EVENT or L1I_EAR_EVENTS depending on the type of
EAR.
Method 2
There is an EAR event in the list of events to monitor and
ear_used is set. In this case the EAR will be programmed (PMC10
or PMC11) using the information in the pfp_mont_iear or
pfp_mont_dear structure because it contains more detailed
information, such as privilege level and instruction set. A
counting monitor (PMC4/PMD4-PMC7/PMD7) will be programmed to
count DATA_EAR_EVENT or L1I_EAR_EVENTS depending on the type of
EAR.
Method 3
There is no EAR event in the list of events to monitor and
ear_used is cleared. In this case no EAR is programmed.
Method 4
There is no EAR event in the list of events to monitor and
ear_used is set. In this case the EAR will be programmed
(PMC10 or PMC11) using the information in the pfp_mont_iear or
pfp_mont_dear structure. This is the free running mode for the
EAR.
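The four methods above reduce to a decision on two inputs: whether an
EAR event is in the event list, and whether ear_used is set. The
following sketch tabulates that decision. struct ex_ear_plan and
ex_plan_ear are hypothetical names used for illustration; they describe
what the text says the library does and are not library code.

```c
#include <stdbool.h>

/* Hypothetical summary of the outcome for one EAR (instruction or data). */
struct ex_ear_plan {
    int  method;       /* 1..4, using the numbering of the text above */
    bool program_ear;  /* PMC10 or PMC11 gets programmed */
    bool from_struct;  /* settings come from pfp_mont_iear/pfp_mont_dear */
    bool count_event;  /* a counting monitor is programmed as well */
};

struct ex_ear_plan ex_plan_ear(bool ear_event_listed, bool ear_used)
{
    struct ex_ear_plan p;

    if (ear_event_listed) {
        /* Methods 1 and 2: the EAR is programmed and also counted. */
        p.method = ear_used ? 2 : 1;
        p.program_ear = true;
        p.from_struct = ear_used;
        p.count_event = true;
    } else {
        /* Methods 3 and 4: no counter; free running only if ear_used. */
        p.method = ear_used ? 4 : 3;
        p.program_ear = ear_used;
        p.from_struct = ear_used;
        p.count_event = false;
    }
    return p;
}
```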
EXECUTION TRACE BUFFER
The pfp_mont_etb field of type pfmlib_mont_etb_t is used to configure
the Execution Trace Buffer (ETB). If etb_used is set, then the
library will take the configuration into account, otherwise any ETB
configuration will be ignored. The various fields in this structure
provide means to filter out the kind of changes in the control flow
(branches, traps, rfi, ...) that get recorded in the ETB. Each one
represents an element of the branch architecture of the Itanium 2 9000
(Montecito) processor. Refer to the Itanium 2 9000 (Montecito)
specific documentation for more details on the branch architecture. The
fields are as follows:
etb_tm If this field is 0, then no branch is captured. If this field is
1, then non taken branches are captured. If this field is 2,
then taken branches are captured. Finally if this field is 3
then all branches are captured.
etb_ptm
If this field is 0, then no branch is captured. If this field is
1, then branches with a mispredicted target address are
captured. If this field is 2, then branches with correctly
predicted target address are captured. Finally if this field is
3 then all branches are captured regardless of target address
prediction.
etb_ppm
If this field is 0, then no branch is captured. If this field is
1, then branches with a mispredicted path (taken/non taken) are
captured. If this field is 2, then branches with correctly
predicted path are captured. Finally if this field is 3 then all
branches are captured regardless of their path prediction.
etb_brt
If this field is 0, then no branch is captured. If this field is
1, then only IP-relative branches are captured. If this field is
2, then only return branches are captured. Finally if this field
is 3 then only non-return indirect branches are captured.
etb_plm
This is the privilege level mask at which the ETB captures
branches. It can be any combination of PFM_PLM0, PFM_PLM1,
PFM_PLM2, PFM_PLM3. If etb_plm is 0 then the default privilege
level mask in pfp_dfl_plm is used.
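The fallback to the default privilege level mask, stated here for
etb_plm and earlier for ear_plm, can be written as a one-line helper.
This is a sketch of the rule only; ex_effective_plm is a hypothetical
name and the arguments are plain bitmasks built from the PFM_PLM
constants by the caller.

```c
/*
 * Hypothetical helper restating the rule above: a feature-specific
 * privilege level mask of 0 means "use the default mask".
 * field_plm: e.g. etb_plm or ear_plm
 * dfl_plm:   the default mask (pfp_dfl_plm)
 */
unsigned int ex_effective_plm(unsigned int field_plm, unsigned int dfl_plm)
{
    return field_plm != 0 ? field_plm : dfl_plm;
}
```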
There are 4 methods to program the ETB and they are as follows:
Method 1
The ETB_EVENT is in the list of events to monitor and etb_used is
cleared. In this case, the ETB will be configured (PMC39) to
record ALL branches. A counting monitor will be programmed to
count ETB_EVENT.
Method 2
The ETB_EVENT is in the list of events to monitor and etb_used
is set. In this case, the ETB will be configured (PMC39) using
the information in the pfp_mont_etb structure. A counting
monitor (PMC4/PMD4-PMC7/PMD7) will be programmed to count
ETB_EVENT.
Method 3
The ETB_EVENT is not in the list of events to monitor and
etb_used is set. In this case, the ETB will be configured
(PMC39) using the information in the pfp_mont_etb structure.
This is the free running mode for the ETB.
Method 4
The ETB_EVENT is not in the list of events to monitor and
etb_used is cleared. In this case, the ETB is not programmed.
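Note that the numbering of the ETB methods differs from the EAR case:
here Method 3 is the free running mode and Method 4 leaves the ETB
unprogrammed. A compact way to express the selection, using
hypothetical names (ex_etb_setup_t, ex_etb_setup) that are not part of
the library:

```c
#include <stdbool.h>

/* Hypothetical labels for the four outcomes described above. */
typedef enum {
    EX_ETB_ALL_BRANCHES = 1, /* Method 1: record all branches, counted */
    EX_ETB_FROM_STRUCT  = 2, /* Method 2: pfp_mont_etb settings, counted */
    EX_ETB_FREE_RUNNING = 3, /* Method 3: pfp_mont_etb settings, no counter */
    EX_ETB_OFF          = 4  /* Method 4: ETB not programmed */
} ex_etb_setup_t;

ex_etb_setup_t ex_etb_setup(bool etb_event_listed, bool etb_used)
{
    if (etb_event_listed)
        return etb_used ? EX_ETB_FROM_STRUCT : EX_ETB_ALL_BRANCHES;
    return etb_used ? EX_ETB_FREE_RUNNING : EX_ETB_OFF;
}
```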
DATA AND CODE RANGE RESTRICTIONS
The pfp_mont_drange and pfp_mont_irange fields control the range
restrictions for the data and code respectively. The idea is that the
application passes a set of ranges, each designated by a start and end
address. Upon return from pfm_dispatch_events(), the application gets
back the set of registers and their values that needs to be programmed
via a kernel interface.
Range restriction is implemented using the debug registers. There is a
limited number of debug registers and they come in pairs. With 8 data
debug registers, a maximum of 4 distinct ranges can be specified. The
same applies to code range restrictions. Moreover, there are some
severe constraints on the alignment and size of the ranges. Given that
the size of a range is specified using a bitmask, there can be
situations where the actual range is larger than the requested range.
For code ranges, Itanium 2 9000 (Montecito) processor can use what is
called a fine mode, where a range is designated using two pairs of code
debug registers. In this mode, the bitmask is not used, the start and
end addresses are directly specified. Not all code ranges qualify for
fine mode: the size of the range must be 64KB or less and the range
cannot cross a 64KB page boundary. The library will make a best effort
in choosing the right mode for each range. For code ranges, it will try
the fine mode first and will default to using the bitmask mode
otherwise. Fine mode applies to all code debug registers or none, i.e.,
you cannot have a range using fine mode and another using the bitmask.
The Itanium 2 9000 (Montecito) processor limits the use of
multiple pairs to accurately cover a code range. This can only be done
for IA64_INST_RETIRED and even then, you need several events to collect
the counts. For all other events, only one pair can be used, which
leads to more inaccuracy due to approximation. Data ranges can use
multiple debug register pairs to gain more accuracy. The library will
never cover less than what is requested. The algorithm will use more
than one pair of debug registers whenever possible to get a more
precise range. Hence, up to 4 pairs can be used to describe a
single range.
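The fine mode conditions above can be approximated by the following
check. It is a sketch under stated assumptions, not the library's
algorithm: it assumes that both the start and end addresses must fall
in the same 64KB page (one reading of "cannot cross a 64KB page
boundary") and that code addresses are 16-byte aligned. The names
struct ex_range, ex_range_fits_fine_mode and ex_use_fine_mode are
hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct ex_range {
    uint64_t start;
    uint64_t end;
};

/* Assumption: start and end must share one 64KB page (bits 16 and up). */
bool ex_range_fits_fine_mode(struct ex_range r)
{
    if ((r.start & 0xF) != 0 || (r.end & 0xF) != 0)
        return false;               /* code ranges are bundle aligned */
    if (r.end < r.start)
        return false;
    return (r.start >> 16) == (r.end >> 16);
}

/* Fine mode is all or nothing: every code range must qualify. */
bool ex_use_fine_mode(const struct ex_range *ranges, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!ex_range_fits_fine_mode(ranges[i]))
            return false;
    }
    return n > 0;
}
```

After a successful pfm_dispatch_events() call, pfm_mont_irange_is_fine()
is the way to learn which mode the library actually selected.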
If range restriction is to be used, the rr_used field must be set to
one, otherwise settings will be ignored. The ranges are described by
the pfmlib_mont_input_rr_t structure. Up to 4 ranges can be defined.
Each range is described by an entry in rr_limits. Some flags for all
ranges can be defined in rr_flags. Currently defined flags are:
PFMLIB_MONT_RR_INV
Invert the code ranges. The qualifying events will be
measured when executing outside the specified ranges.
PFMLIB_MONT_RR_NO_FINE_MODE
Force non fine mode for all code ranges (mostly for debug)
The pfmlib_mont_input_rr_desc_t structure is defined as follows:
rr_plm The privilege level at which the range is active. It can be any
combination of PFM_PLM0, PFM_PLM1, PFM_PLM2, PFM_PLM3. If
rr_plm is 0 then the default privilege level mask in
pfp_dfl_plm is used. The privilege level is only relevant for
code ranges; data ranges ignore the setting.
rr_start
This is the start address of the range. Any address is supported
but for code range it must be bundle aligned, i.e., 16-byte
aligned.
rr_end This is the end address of the range. Any address is supported
but for code range it must be bundle aligned, i.e., 16-byte
aligned.
The library will provide the values for the debug registers as well as
some information about the actual ranges in the output parameters and
more precisely in the pfmlib_mont_output_rr_t structure for each range.
The structure is defined as follows:
rr_nbr_used
Contains the number of debug registers used to cover the range.
This is necessarily an even number as debug registers always come
in pairs. The value of this field is between 0 and 8.
rr_br This table contains the list of debug registers necessary to
cover the ranges. Each element is of type pfmlib_reg_t. The
reg_num field contains the debug register index while reg_value
contains the debug register value. Both the index and value must
be copied into the kernel specific argument to program the debug
registers. The library never programs them.
rr_infos
Contains information about the ranges defined. Because of
alignment restrictions, the actual range covered by the debug
registers may be larger than the requested range. This table
describes the differences between the requested and actual ranges
expressed as offsets:
rr_soff
Contains the start offset of the actual range described by the
debug registers. If zero, it means the library was able to match
exactly the beginning of the range. Otherwise it represents the
number of bytes by which the actual range precedes the requested
range.
rr_eoff
Contains the end offset of the actual range described by the
debug registers. If zero, it means the library was able to match
exactly the end of the range. Otherwise it represents the number
of bytes by which the actual range exceeds the requested range.
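To make the two offsets concrete, the sketch below computes them for
the simplest case of a single bitmask-based debug register pair. It
assumes that such a pair describes a naturally aligned block whose size
is a power of two, and that the requested range is the half-open
interval [start, end). Both points are assumptions of this example, and
the library may combine several pairs to get closer to the requested
range. struct ex_cover and ex_cover_range are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

struct ex_cover {
    uint64_t base;  /* start of the covering block */
    uint64_t size;  /* block size, a power of two */
    uint64_t soff;  /* bytes by which the block precedes the request */
    uint64_t eoff;  /* bytes by which the block exceeds the request */
};

/*
 * Smallest naturally aligned power-of-two block covering [start, end).
 * Assumes start < end and that the range does not need a block larger
 * than 2^63 bytes.
 */
struct ex_cover ex_cover_range(uint64_t start, uint64_t end)
{
    struct ex_cover c;
    uint64_t size = 1;

    assert(start < end);

    /* Grow the block until start and the last byte share one block. */
    while (((start ^ (end - 1)) & ~(size - 1)) != 0)
        size <<= 1;

    c.size = size;
    c.base = start & ~(size - 1);
    c.soff = start - c.base;
    c.eoff = (c.base + size) - end;
    return c;
}
```

In this model a request for 0x1010 to 0x1030 is covered by the 64-byte
block at 0x1000, giving a start offset and an end offset of 16 bytes
each.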
IP EVENT CAPTURE (IP-EAR)
The Execution Trace Buffer (ETB) can be configured to record the
addresses of consecutive retiring instructions. In this case the ETB
contains IP addresses and not branch-related information. This
feature cannot be used in conjunction with regular branch captures as
described above. To activate this feature the ipear_used field of the
pfmlib_mont_ipear_t structure must be set to 1. The other fields in this
structure are used as follows:
ipear_plm
The privilege level of the instructions to capture. It can be
any combination of PFM_PLM0, PFM_PLM1, PFM_PLM2, PFM_PLM3. If
ipear_plm is 0 then the default privilege level mask in
pfp_dfl_plm is used.
ipear_delay
The number of cycles by which to delay the freeze of the ETB
after a PMU interrupt (which freezes the rest of the counters).
ERRORS
Refer to the description of pfm_dispatch_events() for errors when using
the Itanium 2 9000 (Montecito) specific input and output arguments.
SEE ALSO
pfm_dispatch_events(3) and the set of examples shipped with the library
AUTHOR
Stephane Eranian <eranian@hpl.hp.com>
November, 2003 LIBPFM(3)