GraphMineSuite

Changing Set Operations

In this example we show how the set abstraction allows to change details of the set representation and operations, without changing the actual algorithm implementation.

Introduction

A set-based implementation of total triangle counting is given by the following code:

```.css-1yeqbym{position:absolute;top:0;right:0;z-index:1;border-radius:0 5px 0 5px;padding:0.25rem 0.6rem;border:none;cursor:pointer;background:#44475a;color:rgb(248, 248, 242);-webkit-transition:all 200ms ease;transition:all 200ms ease;font-size:12px;}.css-1yeqbym:disabled{cursor:not-allowed;}.css-1yeqbym:not(:disabled):hover,.css-1yeqbym:not(:disabled):focus{background:#10b981;}```template<class SGraph>size_t count_total(const SGraph &graph) {    size_t n = graph.num_nodes();    size_t total = 0;
for (NodeId u = 0; u < n; ++u) {        const auto &neigh_u = graph.neigh(u);        for (NodeId v : neigh_u) {            if (u < v) {                total += neigh_u.intersect_count(graph.neigh(v));            }        }    }

Here, SGraph refers to a class implementing a SetGraph interface, in general this is `SetGraph` instantiated with a particular set class. For example one can use `RoaringGraph` for this code.

As we can see the only set operation which is used by this algorithm is `intersect_count` and iterator access methods `begin()`/`end()` are used implicitly by the inner for loop.

Simple set class

The full interface of set classes can be found in the definitions of `RoaringSet`, `SortedSet`, etc.

Here, we're concerned with a minimal example and provide the following set class which is implemented in terms of unsorted vectors:

``````class SimpleSet {    using SetElement = NodeId;    std::vector<SetElement> data;public:    SimpleSet(const SetElement *data, size_t count) : data(data, data + count) {}
auto begin() const { return data.begin(); }    auto end() const { return data.end(); }
bool contains(SetElement element) const {        for (NodeId u : *this) {            if (u == element) {                return true;            }        }        return false;    }
size_t intersect_count(const SimpleSet &set) const {        size_t count = 0;        for (NodeId u : *this) {            count += set.contains(u);        }        return count;    }};``````

As we can see, this implements a relatively inefficient linear search over the other set. The graph class we use now would be `SetGraph<SimpleSet>`.

A note on parallelization

A parallelized version of the above algorithm also exists and can be found in `gms/algorithms/set_based/triangle_count/parallel/total.h`. Parallelization is done by adding a `#pragma omp parallel for` to the outermost loop of the algorithm. At this time, we don't usually parallelize individual set operations, but rather the algorithm invoking individual set operations, however in future versions of GMS this might change and you are free to experiment with different parallelization schemes.

Optimizing the intersect_count operation

As mentioned before, this is a rather simplistic and inefficient implementation of `intersect_count`. Another version, in-line with our `SortedSet` implementation first sorts the neighborhood information to allow for a faster `intersect_count` implementation.

``````class BetterSet : public SimpleSet {public:    BetterSet(const SetElement *data, size_t count) : SimpleSet(data, count) {        std::sort(this->data.begin(), this->data.end());    }
size_t intersect_count(const SimpleSet &set) const {        size_t count = 0;        auto it0 = begin();        auto it1 = set.begin();        while (it0 != end() && it1 != set.end()) {            if (*it0 == *it1) {                ++count;                ++it0;                ++it1;            } else if (*it0 > *it1) {                ++it1;            } else {                ++it0;            }        }        return count;    }};``````

Checking for correctness

While we could now directly run the benchmark with these two different set implementations, we can first run a unittest to check whether our implementation is indeed correct. For this open `gms/testing/sets.cpp`, include your new files and add it to the `Implementations` definition:

``using Implementations =    testing::Types<        RoaringSet,        SortedSetBase<std::int32_t>,        ...,        SimpleSet,        BetterSet    >;``

For the sake of this example we provide this together in the examples folder.

Setting up the benchmark

To compare the performance of these two different implementation, we add the following main function which instantiates the benchmark code:

``````using namespace GMS;
using SimpleGraph = SetNeighborhoodGraph<SimpleSet>;using BetterGraph = SetNeighborhoodGraph<BetterSet>;
int main(int argc, char *argv[]) {    auto [args, g] = CLI::Parser().parse_and_load(argc, argv);
BenchmarkKernelBk<SimpleGraph>(args, g, TriangleCount::Seq::count_total<SimpleGraph>, TriangleCount::Verify::total_count, "SimpleGraph");
BenchmarkKernelBk<BetterGraph>(args, g, TriangleCount::Seq::count_total<BetterGraph>, TriangleCount::Verify::total_count, "BetterGraph");}``````

To compile a benchmark, you need add the directory containing the benchmark with an `add_subdirectory(dir)` call in the right `CMakeLists.txt` above in the hierarchy. In the directory where the benchmark code is, you can add a `CMakeLists.txt` file with the content:

``gms_benchmark(mybench.cpp)``

which will define a target `mybench`.

Run the benchmarks

Now we can run the benchmarks.

To run it five times each for a graph of size `2^10` with uniformly distributed degree, invoke

``./bin/mybench -g uniform 10 -n 5``