Congratulations to Angen and Xiaoyu for their IEEE BigData 2016 papers: "ARGO: Architecture-Aware Graph Partitioning" and "REQUEST: A Scalable Framework for Interactive Construction of Exploratory Queries"
Title: ARGO: Architecture-Aware Graph Partitioning
Abstract: The increasing popularity and ubiquity of various large graph datasets have caused renewed interest in graph partitioning. Existing graph partitioners either scale poorly against large graphs or disregard the impact of the underlying hardware topology. A few solutions have shown that nonuniform network communication costs can greatly affect performance. However, none of them considers the impact of resource contention on the memory subsystems (e.g., LLC and Memory Controller) of modern multicore clusters. They all neglect the fact that the bandwidth of modern high-speed networks (e.g., InfiniBand) has become comparable to that of the memory subsystems. In this paper, we provide an in-depth analysis, both theoretical and experimental, of the contention issue for distributed workloads. We found that the slowdown caused by the contention can be as high as 11x. We then design an architecture-aware graph partitioner, ARGO, to allow the full use of all cores of multicore machines without suffering from either the contention or the communication heterogeneity issue. Our experimental study showed (1) the effectiveness of ARGO, achieving up to 12x speedups on three classic workloads: Breadth First Search, Single Source Shortest Path, and PageRank; and (2) the scalability of ARGO in terms of both graph size and the number of partitions on two billion-edge real-world graphs.
Title: REQUEST: A Scalable Framework for Interactive Construction of Exploratory Queries
Abstract: Exploration over large datasets is a key first step in data analysis, as users may be unfamiliar with the underlying database schema and unable to construct precise queries that represent their interests. Such a data exploration task usually involves executing numerous ad-hoc queries, which requires a considerable amount of time and human effort. In this paper, we present REQUEST, a novel framework that is designed to minimize the human effort and enable both effective and efficient data exploration. REQUEST supports the query-from-examples style of data exploration by integrating two key components: 1) Data Reduction, and 2) Query Selection. As instances of the REQUEST framework, we propose several highly scalable schemes, which employ active learning techniques and provide different levels of efficiency and effectiveness as guided by the user's preferences. Our results on real-world datasets from the Sloan Digital Sky Survey show that our schemes on average require 1-2 orders of magnitude fewer feedback questions than the random baseline, and 3-16x fewer questions than the state-of-the-art, while maintaining interactive response time. Moreover, our schemes are able to construct, with high accuracy, queries that are often undetectable by current techniques.