Profiling
GMS includes in-code profilers, which enable users to instrument their code to gather performance data. Profiling is embedded directly in the build pipeline and can be enabled or disabled with flags in the CMake
files. When disabled, all calls into the profiling namespace are turned into no-ops.
At the moment only PAPI (hardware performance counters) is included in the release code. More will follow in the future.
PAPI
We include PAPI through our own PAPIW (PAPI wrapper), a header-only library that simplifies the use of PAPI, especially with OpenMP.
Include
Add

```cpp
#include <gms/common/papi/papiw.h>
```

to your GMS subproject file to inject the PAPIW namespace.
Then activate PAPI measurement by providing PAPIW to the gms_benchmark CMake function, for example:
```cmake
gms_benchmark(maximal_clique_enum_bron_kerbosch.cc PAPIW)
```
If the flag is not provided at compilation, then any call to PAPIW in code is turned into a no-op.
Usage
Initialization (supports variadic PAPI event code arguments):

```cpp
// Use either
PAPIW::INIT_SINGLE(PAPI_L2_TCA, PAPI_L3_TCA);   // Init PAPIW for sequential use only
// Or
PAPIW::INIT_PARALLEL(PAPI_L2_TCA, PAPI_L3_TCA); // Init PAPIW for parallel use
```
Benchmarking:

```cpp
PAPIW::START();
doSomethingInteresting();
PAPIW::STOP();

doSomethingUnimportant(); // Do not measure

PAPIW::START();
doSomethingInterestingAgain();
PAPIW::STOP();
```
Benchmarking parallel regions:

```cpp
// This variant should be preferred
#pragma omp parallel
{
    PAPIW::START();
    doSomethingInteresting();
    PAPIW::STOP();
}

// But this also works
PAPIW::START();
#pragma omp parallel
{
    doSomethingInteresting();
}
PAPIW::STOP();
```
Resetting:

```cpp
PAPIW::RESET(); // Set the intermediate counter values to zero
```
Printing:

```cpp
PAPIW::PRINT();
```
The output could look something like:

```text
PAPIW Parallel PapiWrapper instance report:
PAPI_L2_TCA (L2 total cache accesses): 68743998
PAPI_TOT_CYC (Total cycles executed): 9800773029
PAPI_L3_TCA (L3 total cache accesses): 32864360
PAPI_L3_TCM (Level 3 total cache misses): 17234237
@%% PAPI_L2_TCA PAPI_TOT_CYC PAPI_L3_TCA PAPI_L3_TCM
@%@ 68743998 9800773029 32864360 17234237
```
Info
- If PAPI is not available on the system, most of the code is not compiled and any call to PAPIW is turned into a no-op. The same effect can be achieved by setting NOPAPIW when building.
- Since PAPIW needs thread-private state and PAPI itself needs to be refreshed whenever an underlying kernel LWP was killed, one should stop and restart between different parallel regions whenever possible. PAPIW::START() assigns and starts the counters on the threads. The number of threads is the current OpenMP team size.
- If omp_set_num_threads is used, PAPIW::STOP() has to be called right before it. PAPIW::START() may be called immediately afterwards.
- Assuming PAPIW was initialized using INIT_PARALLEL, it can be started and stopped inside or outside a parallel region. It will always use the OpenMP team size, based on a call to omp_get_num_threads in a parallel region.
- Whenever possible, PAPIW::START() and PAPIW::STOP() should be called directly inside one parallel region.
- PAPIW::INIT_SINGLE and PAPIW::INIT_PARALLEL may not be called inside a parallel region.
- PAPIW::RESET and PAPIW::PRINT may not be called while the counters are still running.
- If an event that is not available on the system is passed to PAPIW::INIT, only a warning is displayed and the program continues. Of course, no data can be gathered for it, so no output for that specific event is printed.
- PAPIW performs many internal state checks. In the event of an invalid state, the program aborts and prints a human-readable error message.
- The output is optimized for easy extraction, e.g. for plotting programs:
  - First, all observed counters are listed with a short description and their measured values
  - Then @%% indicates the header line (PAPI counter names)
  - And @%@ indicates the values
Dos and Don'ts
Stop before changing the current team size:

```cpp
PAPIW::STOP();
omp_set_num_threads(4);
PAPIW::START();
```
Although it should work, don't use the following:

```cpp
#pragma omp parallel
{
    PAPIW::START();
    doSomethingInteresting();
}
PAPIW::STOP();
```
The following is nonsense, since it starts and stops the counters in every single loop iteration:

```cpp
#pragma omp parallel for
for (int i = 0; i < n; i++) {
    PAPIW::START();
    doSomethingInteresting();
    PAPIW::STOP();
}
```
Limitations
- Was created with OpenMP in mind
- omp_set_dynamic should be set to false. Otherwise, make sure that PAPIW::START() and PAPIW::STOP() are only used inside parallel regions.