[News] EDBT 2016 Paper

The results of our NSF-funded collaboration with colleagues in Mechanical Engineering & Materials Science are reported in the paper titled "PARAGON: Parallel Architecture-Aware Graph Partition Refinement Algorithm," authored by Angen Zheng, Alexandros Labrinidis, Patrick H. Pisciuneri, Panos K. Chrysanthis, and Peyman Givi. The paper was published in the Proceedings of the 19th International Conference on Extending Database Technology (EDBT), March 15-18, 2016.

With the explosion of large, dynamic graph datasets from various fields, graph partitioning and repartitioning are becoming increasingly critical to the performance of many graph-based Big Data applications, such as social analysis, web search, and recommender systems. However, well-studied graph (re)partitioners usually assume a homogeneous and contention-free computing environment, which contradicts the increasing communication heterogeneity and shared resource contention in modern, multicore high performance computing clusters.

To bridge this gap, we introduce PARAGON, a parallel architecture-aware graph partition refinement algorithm, which mitigates the mismatch by modifying a given decomposition according to the nonuniform network communication costs and the contentiousness of the underlying hardware topology. To further reduce the overhead of the refinement, we also make PARAGON itself architecture-aware.

Our experiments with a diverse collection of datasets showed that, on average, PARAGON improved the quality of graph decompositions computed by the de facto standard (hash partitioning) and two state-of-the-art streaming graph partitioning heuristics (deterministic greedy and linear deterministic greedy) by 43%, 17%, and 36%, respectively. Furthermore, our experiments with an MPI implementation of Breadth First Search and Single Source Shortest Path showed that, in comparison to the state-of-the-art streaming and multi-level graph (re)partitioners, PARAGON achieved up to 5.9x speedups. Finally, we demonstrated the scalability of PARAGON by scaling it up to a graph with 3.6 billion edges using only 3 machines (60 physical cores).
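For readers unfamiliar with the streaming baselines mentioned above, here is a minimal sketch of the linear deterministic greedy (LDG) heuristic, one of the partitioners whose output PARAGON refines. It assigns each arriving vertex to the partition that already holds the most of its neighbors, discounted by how full that partition is. The function name, data layout, and tie-breaking rule are illustrative assumptions, not code from the paper:

```python
def ldg_partition(stream, num_parts, capacity):
    """Linear deterministic greedy streaming partitioner (sketch).

    stream:    iterable of (vertex, neighbor_list) pairs, one pass.
    num_parts: number of partitions.
    capacity:  target maximum vertices per partition.
    """
    parts = [set() for _ in range(num_parts)]
    assignment = {}
    for v, neighbors in stream:
        nbrs = set(neighbors)
        best, best_score = 0, float("-inf")
        for i, p in enumerate(parts):
            # Neighbors already placed in partition i, weighted by
            # the partition's remaining room (the LDG penalty term).
            score = len(p & nbrs) * (1 - len(p) / capacity)
            # Tie-break (an assumption here): prefer the lighter partition.
            if score > best_score or (
                score == best_score and len(p) < len(parts[best])
            ):
                best, best_score = i, score
        parts[best].add(v)
        assignment[v] = best
    return assignment
```

On a toy graph made of two disjoint triangles, this one-pass rule places each triangle in its own partition. PARAGON's contribution is what happens next: such a decomposition is computed assuming uniform communication costs, and PARAGON refines it against the actual hardware topology.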