Concept-Driven Load Shedding: Reducing Size and Error of Voluminous and Variable Data Streams
DocUID: 2018-008
Full Text: PDF
Author: Nikos R. Katsipoulakis, Alexandros Labrinidis, Panos K. Chrysanthis
Abstract: Load shedding is a technique that aims to ameliorate the consequences of the Velocity and the Volume of Big Data stream processing. When temporal input spikes appear, tuples are shed until a Stream Processing Engine's (SPE) processing capacity is not overwhelmed and results are produced in a timely fashion. Existing load shedding techniques have become obsolete and are not applicable to modern use-cases which require the extraction of patterns from continuously evolving (i.e., Variable) voluminous streams. In this work, we identify the shortcomings of existing load shedding techniques when applied to streams with concept drift. We propose Concept-Driven load shedding (CoD), which aims at limiting the data volume imposed on the SPE while producing high accuracy results. On top of that, we designed CoD for modern SPEs and made its overhead negligible. Our experiments indicate that CoD can deliver more than 10× more accurate results compared to the state of the art in load shedding. Also, CoD can offer up to 2.25× better performance compared to normal processing and reduce the processed data volume significantly.
Keywords: Data Streams, Data Aggregation, Approximate Continuous Queries
Published In: IEEE BigData 2018
Pages: 418-427
Year Published: 2018
DOI: 10.1109/BigData.2018.8622265
Project: PittSmartLiving
Subject Area: Data Aggregation, Data Streams
Publication Type: Conference Paper
Sponsor: NSF CNS-1739413