What is Pregelix?

Pregelix is an open-source implementation of the bulk-synchronous vertex-oriented programming model (Pregel API) for large-scale graph analytics. Pregelix is based on an iterative dataflow design that is tuned to handle both in-memory and out-of-core workloads efficiently. Pregelix supports a variety of physical runtime choices that can fit different sorts of graph algorithms, datasets, and clusters. Our motto is "Big Graph Analytics Anywhere!", from a single laptop to large enterprise clusters.

Quick example:

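  public class PageRankVertex extends Vertex<VLongWritable, DoubleWritable, FloatWritable,
        DoubleWritable> {
    // Simplified for illustration: superstep bookkeeping and halting logic are omitted.
    public void compute(Iterator<DoubleWritable> msgIterator) {
        // Sum the rank contributions received from in-neighbors.
        double sum = 0;
        while (msgIterator.hasNext()) {
          sum += msgIterator.next().get();
        }
        // Apply the damping factor (0.85) to obtain the new rank.
        double rank = (0.15 / getNumVertices()) + 0.85 * sum;
        setVertexValue(new DoubleWritable(rank));
        // Split the new rank evenly across the outgoing edges.
        sendMsgToAllNeighbors(new DoubleWritable(rank / getEdges().size()));
    }
  }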

The code above is a simplified PageRank implementation on Pregelix.
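To make the bulk-synchronous model behind compute() more concrete, here is a minimal stand-alone sketch in plain Java. It does not use the Pregelix API: the class name PageRankSketch, the adjacency-array graph representation, and the fixed superstep count are all assumptions made for illustration. Each loop iteration plays the role of one superstep, in which every vertex first sends its rank share along its out-edges and then applies the same update rule as the example above.

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch; it does not use the Pregelix API.
final class PageRankSketch {

    /**
     * Runs a fixed number of synchronous supersteps of PageRank.
     * outEdges[v] lists the destinations of vertex v. This sketch assumes
     * every vertex has at least one out-edge.
     */
    static double[] run(int[][] outEdges, int supersteps) {
        int n = outEdges.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);

        for (int step = 0; step < supersteps; step++) {
            // Message phase: each vertex sends rank / outDegree to every neighbor.
            double[] inbox = new double[n];
            for (int v = 0; v < n; v++) {
                double share = rank[v] / outEdges[v].length;
                for (int dst : outEdges[v]) {
                    inbox[dst] += share;
                }
            }
            // Compute phase: the same update rule as in compute() above.
            for (int v = 0; v < n; v++) {
                rank[v] = 0.15 / n + 0.85 * inbox[v];
            }
        }
        return rank;
    }

    public static void main(String[] args) {
        // A three-vertex ring: 0 -> 1 -> 2 -> 0.
        int[][] ring = { {1}, {2}, {0} };
        System.out.println(Arrays.toString(run(ring, 10)));
    }
}
```

Because all messages are gathered before any vertex is updated, the result does not depend on the order in which vertices are visited, which is the property that the barrier between supersteps provides in a distributed setting.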


We have compared Pregelix with several popular Big Graph Analytics platforms, including Giraph, Hama, GraphLab, and GraphX. The figures below show the performance (i.e., end-to-end execution time and average iteration time) of the single source shortest paths algorithm (SSSP) on a 32-machine cluster using different platforms. More details are available here.

Pregelix can perform comparably to Giraph for memory-resident message-intensive workloads (like PageRank), can outperform Giraph by 15x for memory-resident message-sparse workloads (like the single source shortest paths algorithm), can scale to out-of-core workloads, can sustain multi-user workloads, and can outperform GraphLab, GraphX, and Hama by more than an order of magnitude for various workloads. Check out more details here.
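As an illustration of what "message-sparse" means for SSSP, the following stand-alone Java sketch simulates the usual vertex-centric formulation, in which a vertex sends messages only in supersteps where its tentative distance improves. It is not Pregelix code: the class name SsspSketch, the edge-array layout, and the per-superstep message counter are assumptions introduced for this example, and edge weights are assumed to be small non-negative integers.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone sketch; it does not use the Pregelix API.
final class SsspSketch {

    /**
     * edges[v] holds {destination, weight} pairs for vertex v.
     * The number of messages sent in each superstep is appended to
     * messagesPerSuperstep. Unreachable vertices keep Integer.MAX_VALUE.
     */
    static int[] run(int[][][] edges, int source, List<Integer> messagesPerSuperstep) {
        int n = edges.length;
        int[] dist = new int[n];
        Arrays.fill(dist, Integer.MAX_VALUE);

        // inbox[v] is the smallest candidate distance delivered to v.
        int[] inbox = new int[n];
        Arrays.fill(inbox, Integer.MAX_VALUE);
        inbox[source] = 0; // seed the source vertex

        boolean active = true;
        while (active) {
            active = false;
            int sent = 0;
            int[] next = new int[n];
            Arrays.fill(next, Integer.MAX_VALUE);
            for (int v = 0; v < n; v++) {
                // Only a vertex whose distance improves does any work.
                if (inbox[v] < dist[v]) {
                    dist[v] = inbox[v];
                    for (int[] e : edges[v]) {
                        // Keeping the minimum acts like a min-combiner.
                        next[e[0]] = Math.min(next[e[0]], dist[v] + e[1]);
                        sent++;
                        active = true;
                    }
                }
            }
            messagesPerSuperstep.add(sent);
            inbox = next; // barrier: messages become visible in the next superstep
        }
        return dist;
    }
}
```

In contrast to PageRank, where every vertex messages all of its neighbors in every superstep, most vertices here stay silent in most supersteps, so the per-superstep message counts are typically small relative to the number of edges.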