GraphMineSuite

Set representations and operations

Set representations and operations

GMS follows a modular approach to graph representations and set operations, so that these can be experimented with easily without modifying the target algorithm implementation itself.

Available implementations

We provide several set representations already:

  • RoaringSet: Wrapper for the Roaring bitmap library, which implements compressed bit sets.
  • SortedSet: Implementation of a set as a sorted vector.
  • RobinHoodSet: Wrapper for the robin_hood library, which implements a hash table.

RoaringSet is considered the default set type and best choice in many situations. These classes encapsulate the set operations, while templates are used to make algorithm implementations generic over the actual set implementation.

Set interface

The set interface consists of several constructors, basic set-algebra operations, and a couple of general functions. This interface is one of the main building blocks for set-based GMS algorithms.

In the following sections we will describe the set interface for a class Set where SetElement is usually GMS::NodeId, where some set classes also support different types.

Constructors

The main constructor which should always be present, since it's required during construction of SetGraph instances:

  • Set(const SetElement *start, size_t count)

A number of constructors which may be used depending on the algorithms used:

  • Set(): default constructor for an empty set
  • Set(Set &&): move constructor
  • Set(std::vector<SetElement> &vector): from vector constructor

Convenience constructors:

  • Set(SetElement): set with only a single element
  • Set(std::initializer_list<SetElement> &data): set initialization with initializer list of elements
  • static Set Range(int bound): create set {0, 1, …, bound - 1}.

Set algebra methods

The following functions provide set algebra operations:

  • Set difference(const Set &) const
  • Set difference(SetElement) const
  • void difference_inplace(const Set &) const
  • void difference_inplace(SetElement) const
  • Set intersect(const Set &) const
  • size_t intersect_count(const Set &) const
  • void intersect_inplace(const Set &)
  • Set union_with(const Set &) const
  • Set union_with(SetElement) const
  • Set union_count(const Set &) const
  • void union_inplace(const Set &)
  • void union_inplace(SetElement)
  • bool contains(SetElement) const
  • void add(SetElement)
  • void remove(SetElement)

These function names are fully descriptive of the set algebra operations, _inplace functions modify the target object, while those without return a new instance. The methods add and remove are supposed to be the same functions as difference_inplace and union_inplace for single SetElement.

General methods

  • begin() const, end() const: return iterator to neighborhood indices
  • Set clone() const: return a copy of the set, this method exists, since the copy constructor is disabled for sets to avoid accidental copying
  • void toArray(int32_t *array) const: write the set indices to the array, useful for parallel iteration
  • operator==, operator!=: set equality/inequality comparison
  • size_t cardinality() const: returns the number of elements in the set

Writing set-generic code

GMS uses templates extensively to make code generic over the actual set type. Some functions are simply generic over a Set type. However, the main way how it is used in practice is by code that is generic over an instance of the SetGraph class template, which represents a graph as a vector of sets where every set represents the neighborhood vertex IDs of a particular vertex:

template <class TSet>
class SetGraph {
public:
using Set = TSet;
const Set &out_neigh(NodeId node) const;
int64_t out_degree(NodeId node) const;
int64_t num_nodes() const;
// Some functions omitted.
};

The following example from algorithms/set_based/maximal_clique_enum/parallel/eppsteinPAR.h shows how a function signature would look like in practice:

template <class SGraph, class Set = typename SGraph::Set>
std::vector<Set> mceBench(const SGraph &rgraph, const pvector<NodeId> &ordering)
{
// implementation omitted
}

This function is generic over SGraph and could be used for example with SetGraph<RoaringSet>. The second template parameter Set is set to default to the Set type associated with the SGraph. Inside of the function, new temporary sets are created by using this Set type. Also, in debug builds this function will return a vector of the found maximal cliques which also be represented by Set each.

For convenience we provide several default aliases for SetGraph with the default set implementations:

using RoaringGraph = SetGraph<RoaringSet>;
using SortedSetGraph = SetGraph<SortedSet>;
using RobinHoodGraph = SetGraph<RobinHoodSet>;

Creating a SetGraph instance

To create a SetGraph instance you first need a CSRGraph or a compatible graph, see Graph representations for more information. Then you can simply call the factory function FromCGraph like this:

auto graph = SetGraph<RoaringSet>::FromCGraph(csr_graph);

It also provides a second template parameter, RemoveIsolated which is set to false by default but can be enabled to detect empty neighborhoods and relabel the graph with these nodes removed.