What is Pregelix?
Pregelix is an open-source implementation of the bulk-synchronous vertex-oriented programming model (Pregel API) for large-scale graph analytics. Pregelix is based on an iterative dataflow design that is tuned to handle both in-memory and out-of-core workloads efficiently. Pregelix supports a variety of physical runtime choices, which can fit different sorts of graph algorithms, datasets, and clusters. Our motto is "Big Graph Analytics Anywhere!", from a single laptop to large enterprise clusters.
Quick example:
    public class PageRankVertex extends Vertex<VLongWritable, DoubleWritable, FloatWritable, DoubleWritable> {
        ........
        @Override
        public void compute(Iterator<DoubleWritable> msgIterator) {
            .......
            sum = 0;
            while (msgIterator.hasNext()) {
                sum += msgIterator.next().get();
            }
            setVertexValue((0.15 / getNumVertices()) + 0.85 * sum);
            sendMsgToAllNeighbors(vertexValue / getEdges().size());
            ....
        }
    }
The code above is an abridged PageRank implementation on Pregelix.
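To make the update rule concrete without a cluster, here is a minimal standalone sketch. It is not Pregelix code and uses none of its API: it simulates the supersteps in a single Java process on an adjacency-array graph. The class name `PageRankSketch`, the method `pageRank`, the uniform `1/n` starting value, and the choice that vertices without out-edges send nothing are all assumptions made for illustration. Only the formula `(0.15 / n) + 0.85 * sum` and the equal per-edge share are taken from the example above. Each loop iteration sends first and updates second, which matches the sum-then-send order of `compute` shifted by one superstep.

```java
import java.util.Arrays;

/** Standalone simulation of the vertex-centric PageRank update (hypothetical helper, not Pregelix API). */
final class PageRankSketch {

    /**
     * Runs the given number of simulated supersteps.
     * adjacency[v] lists the out-neighbors of vertex v.
     */
    static double[] pageRank(int[][] adjacency, int supersteps) {
        int n = adjacency.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n); // assumed initial value: uniform

        for (int step = 0; step < supersteps; step++) {
            // "Send" phase: every vertex splits its rank evenly across its out-edges.
            double[] inbox = new double[n];
            for (int v = 0; v < n; v++) {
                int outDegree = adjacency[v].length;
                if (outDegree == 0) {
                    continue; // assumption: vertices without out-edges send nothing
                }
                double share = rank[v] / outDegree;
                for (int target : adjacency[v]) {
                    inbox[target] += share;
                }
            }
            // "Compute" phase: same formula as in the compute() method above.
            for (int v = 0; v < n; v++) {
                rank[v] = (0.15 / n) + 0.85 * inbox[v];
            }
        }
        return rank;
    }

    public static void main(String[] args) {
        // Vertex 0 links to 1 and 2; both link back to 0.
        int[][] graph = {{1, 2}, {0}, {0}};
        System.out.println(Arrays.toString(pageRank(graph, 20)));
    }
}
```

In this small graph, vertex 0 receives messages from both other vertices and ends up with the largest value. A real Pregelix job would distribute the same per-vertex logic across workers and exchange the messages between supersteps.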
Performance:
We have compared Pregelix with several popular Big Graph Analytics platforms, including Giraph, Hama, GraphLab, and GraphX. The figures below show the performance (i.e., end-to-end execution time and average iteration time) of the single source shortest paths (SSSP) algorithm on a 32-machine cluster across the different platforms. More details are available here.
Pregelix can perform comparably to Giraph for memory-resident message-intensive workloads (like PageRank), can outperform Giraph by 15x for memory-resident message-sparse workloads (like the single source shortest paths algorithm), can scale to out-of-core workloads, can sustain multi-user workloads, and can outperform GraphLab, GraphX, and Hama by more than an order of magnitude for various workloads. Check out more details here.