Profiling
Profiling
GMS includes in-code profilers, which enables the user to augment his code for gathering performance data. Profiling is directly embedded in the build pipeline and can be enabled/disabled using flags in the cmake files. All calls to to the profiling namespace are then turned into no-ops.
At the moment only PAPI (hardware performance counters) is included in the release code. More will follow in the future.
PAPI
We include PAPI using our own PAPIW (PAPI wrapper), a Header-only library that simplifies the use of PAPI, especially when using Openmp.
Include
include <gms/common/papi/papiw.h> in your GMS subproject file to inject the PAPIW namespace.
Then activate PAPI measurement by providing PAPIW to the gms_benchmark cmake function, something like:
gms_benchmark(maximal_clique_enum_bron_kerbosch.cc PAPIW)If the flag is not provided for compilation, than any call to PAPIW in code is turned into a No-op.
Usage
Initialization (Supports variadic Papi eventcode arguments):
// Use eitherPAPIW::INIT_SINGLE(PAPI_L2_TCA, PAPI_L3_TCA); // Init PAPIW for sequential use only// OrPAPIW::INIT_PARALLEL(PAPI_L2_TCA, PAPI_L3_TCA); // Init PAPIW for parallel useBenchmarking:
PAPIW::START();doSomethingInteressting();PAPIW::STOP();
doSomethingUnimportant(); // Do not measure
PAPIW::START();doSomethingInteresstingAgain();PAPIW::STOP();Benchmarking parallel regions:
// This setting should be preferred#pragma omp parallel{ PAPIW::START(); doSomethingInteressting(); PAPIW::STOP();}
// But this also worksPAPIW::START();#pragma omp parallel{ doSomethingInteressting();}PAPIW::STOP();Resetting:
PAPIW::RESET(); // Set the intermediate counter values to zeroPrinting:
PAPIW::PRINT();The output could look something like:
PAPIW Parallel PapiWrapper instance report:PAPI_L2_TCA (L2 total cache accesses): 68743998PAPI_TOT_CYC (Total cycles executed): 9800773029PAPI_L3_TCA (L3 total cache accesses): 32864360PAPI_L3_TCM (Level 3 total cache misses): 17234237@%% PAPI_L2_TCA PAPI_TOT_CYC PAPI_L3_TCA PAPI_L3_TCM@%@ 68743998 9800773029 32864360 17234237Info
- If Papi is not available on the system, most code will not get compiled and any call to
PAPIWis turned into a No-op. The same effect can be achieved by settingNOPAPIWfor building - Since
PAPIWneeds threadprivate states and Papi itself needs to be refreshed whenever an underlying kernel LWP was killed, one should stop and start between different parallel regions, whenever possible. PAPIW::START()assigns and starts the counter to the threads. The number of threads is the current opm team size- If
omp_set_num_threadsis used,PAPIW::STOP()has to be called right before. Certainly,PAPIW::START()may be called immediately afterwards. - Assuming
PAPIWwas initialized usingINIT_PARALLEL, it can be started and stopped inside a parallel region or outside. It will always use the omp team size based on a call toomp_get_num_threadsin a parallel region. - Whenever possible,
PAPIW:START()andPAPIW::STOP()should be called directly inside one parallel region PAPIW::INIT_SINGLEandPAPIW::INIT_PARALLELmay not be called inside a parallel regionPAPIW::RESETandPAPIW::PRINTmay not be called while the counters are still running- If an event, which is not available on the system, is added in
PAPIW::INIT, then only a warning is displayed and the program continues. Of course no data can be gathered and hence, no output for that specific event is printed out - A lot of state checks are used for
PAPIW. In the event of an invalid state, the program aborts and a human-readable error message is printed out - The output is optimized for easy extraction, e.g. for some plotting programs:
- First all observed counters are displayed with a short description and their measured values
- Then
@%%indicates the header (Papi Counter name) - And
@%@indicates the values
Dos and Don'ts
Stop before changing the current teamsize
PAPIW::STOP();omp_set_num_threads(4);PAPIW::START();Although it should work, don't use the following
#pragma omp parallel{ PAPIW::START(); doSomethingInteressting();}PAPIW::STOP();Following is nonsense:
#pragma omp parallel forfor(int i = 0; i<n; i++){PAPIW::START();doSomethingInteressting();PAPIW::STOP();}Limitations
- Was created with OpenMp in mind
omp_set_dynamicshould be false. Otherwise make sure thatPAPIW:START()andPAPIW:STOP()are only used inside Parallel regions