[PhD Proposal] Thao Nguyen Pham: Exploiting the Synergy between Scheduling and Load Shedding to Facilitate Differentiated Levels of Service for Continuous Queries

Exploiting the Synergy between Scheduling and Load Shedding to Facilitate Differentiated Levels of Service for Continuous Queries
Thao Nguyen Pham
Monday December 10, 2012
1:30 pm - SENSQ 6106 Eli Lilly Room

ABSTRACT

Today, the ubiquity of sensing devices and of mobile and monitoring applications generates a huge amount of data in the form of streams. These high-volume, often high-speed and bursty data streams need to be analyzed continuously to meet the near-real-time requirements of monitoring applications and of emerging "Big Data" applications. Data stream management systems (DSMSs) have become a popular solution for handling data streams by efficiently executing continuous queries (CQs) over the incoming data. The efficiency with which a DSMS services a CQ is measured in terms of quality of service (QoS), i.e., the processing response time, and quality of data (QoD), i.e., the accuracy of the results.
In general, CQs within an application inherently have different levels of criticality and hence different levels of expected QoS and QoD. Adhering to such expected QoS/QoD metrics is also important in multi-tenant data stream management services, which provide different service level agreements to different client stream applications. Previous work has partially addressed these requirements through scheduling and through load shedding, but has considered the two in isolation. The challenge of how to integrate scheduling and load shedding in a way that consistently honors the priorities of different CQs remains open, and it is the target of this proposed thesis.
In this thesis, we propose a framework that allows seamless integration of a priority-based scheduler and a load shedder, and that can incorporate different scheduling and load shedding policies. Our preliminary implementation on AQSIOS, our real DSMS prototype, shows that the proposed framework supports consistent and practical policies to enforce CQs' priorities, while at the same time improving system utilization by increasing the amount of batch processing. We expect that the framework can be extended to fully support operator sharing, to be deployed on multiple nodes for a large-scale system, and to include priority-based memory management, all of which we plan to investigate to complete this thesis.

DISSERTATION ADVISERS

Dr. Panos Chrysanthis, Dr. Alexandros Labrinidis, Department of Computer Science

COMMITTEE MEMBERS

Dr. Adam Lee, Department of Computer Science
Dr. Alexandros Labrinidis, Department of Computer Science
Dr. Panos Chrysanthis, Department of Computer Science
Dr. Christos Faloutsos, Department of Computer Science, Carnegie Mellon University