
AIM & SCOPE

During the last forty years, data management systems have grown in scale, complexity, and number of installations. At the same time, administration of these systems has become very expensive with the human factor dominating the total cost of ownership. Current trends like cloud computing make this situation even more problematic for service providers who have to configure and manage thousands of database nodes.
There has been a significant amount of research addressing this problem by providing autonomic or self-* features in database systems to support complex administrative tasks like physical database design, problem diagnosis, and performance tuning. However, new challenges arise from trends like cloud and cluster computing, virtualization, and Software-as-a-Service (SaaS). A major challenge is the need to scale self-management capabilities to the level of hundreds to thousands of nodes while taking economic factors into account.
Autonomic, or self-managing, systems are a promising approach to achieve the goal of systems that are easier to use and maintain. A system is considered to be autonomic if it possesses the capabilities to be self-configuring, self-optimizing, self-healing and self-protecting. The aim of the SMDB workshop is to provide a forum for researchers from both industry and academia to present and discuss ideas related to self-management and self-organization in data management systems ranging from classical databases to data stream engines to large-scale cloud environments that utilize advanced AI, machine learning, and data mining and analysis.
We plan to follow the successful format of previous instances of this workshop: approximately 10 presentations of accepted papers, a keynote address by a well-known speaker and subject matter expert in self-managing database systems, as well as a panel discussion involving experts from industry and academia.

CALL FOR PAPERS



TOPICS OF INTEREST


Topics of interest include, but are not limited to:

* Principles and architecture of autonomic data management systems
* Retrofitting existing systems vs. designing for self-management
* Self-* capabilities in databases and storage systems
* Data management in cloud and multi-tenant databases
* Autonomic capabilities in database-as-a-service platforms
* Automated testing of data management systems
* Automated physical database design and adaptive query tuning
* Automated provisioning and integration
* Automatic enforcement of information quality
* Robust query processing techniques
* Self-managing data stream engines and adaptive event-based systems
* Self-managing distributed / decentralized / peer-to-peer information systems
* Self-management of internet-scale distributed systems
* Self-management for big data infrastructures
* Monitoring and diagnostics in data management systems
* Policy automation and visualization for datacenter administration
* User acceptance and trust of autonomic capabilities
* Evaluation criteria and benchmarks for self-managing systems
* Self-evaluation of data management services in the cloud
* Use cases and war stories on deploying autonomic capabilities


SUBMISSION GUIDELINES

Authors are invited to submit original research contributions in English of up to 6 pages in the IEEE camera-ready format (templates are available at the ICDE 2021 submission guidelines page) to the submission site https://cmt3.research.microsoft.com/SMDB2021. Authors of accepted papers will be encouraged to submit an extended paper of up to 8 pages for final publication. Authors are also invited to submit short papers of up to 4 pages. The page limit includes the bibliography and any appendix. All accepted papers will appear in the formal Proceedings of the Conference Workshops published by IEEE CS Press, and will be included in the IEEE digital library.

Authors of a selection of accepted papers will be invited to submit an extended version to the Distributed and Parallel Databases (DAPD) journal.


Paper submission deadline:

January 31, 2021, 5pm PST (abstract, optional; extended from January 11)
January 31, 2021, 5pm PST (extended from January 18)

Notification:

February 22, 2021

Camera-ready:

March 1, 2021

ORGANIZATION


Panos K. Chrysanthis
GENERAL CHAIR

Professor, Computer Science Department
University of Pittsburgh

Meichun Hsu
GENERAL CHAIR

Sr. Director of R&D Database Server Technology
Oracle Corporation

Herodotos Herodotou
PROGRAM CHAIR

Assistant Professor, Dept. of Electrical Eng., Computer Eng. and Informatics
Cyprus University of Technology

Yingjun Wu
PROGRAM CHAIR

Founder and CEO
Singularity Data Inc

Constantinos Costa
VICE PROGRAM CHAIR

Research Associate, Computer Science Department
University of Pittsburgh



PROGRAM COMMITTEE

  • Alkis Simitsis, Athena Research Center, Greece
  • Andreas Kipf, Massachusetts Institute of Technology, USA
  • Bailu Ding, Microsoft Research, USA
  • Deepak Majeti, Vertica/MicroFocus, USA
  • Eduardo Cunha de Almeida, Federal University of Paraná, Brazil
  • Evaggelia Pitoura, U. Ioannina, Greece
  • George Pallis, University of Cyprus, Cyprus
  • Guoliang Li, Tsinghua University, China
  • Jiaheng Lu, University of Helsinki, Finland
  • Kai-Uwe Sattler, TU Ilmenau, Germany
  • Ken Salem, University of Waterloo, Canada
  • Khuzaima Daudjee, University of Waterloo, Canada
  • Le Gruenwald, University of Oklahoma, USA
  • Matthias J Sax, Confluent Inc., USA
  • Mohamed A Sharaf, United Arab Emirates University, UAE
  • Nesime Tatbul, Intel Labs and MIT, USA
  • Nikos Katsipoulakis, Amazon Web Services, USA
  • Peter Triantafillou, University of Warwick, UK
  • Rebecca Taft, Cockroach Labs, USA
  • Ryan Marcus, MIT, USA
  • Uta Störl, Darmstadt University of Applied Sciences, Germany
  • Vivek Narasayya, Microsoft Research, USA
  • Yao Lu, Microsoft Research, USA



COMMUNICATION CHAIR / WEBMASTER

Constantinos Costa, University of Pittsburgh, USA



STUDENT VOLUNTEERS

Brian Nixon, University of Pittsburgh, USA


Rakan Alseghayer, University of Pittsburgh, USA


PROGRAM

Times are displayed in PDT and UTC. Look up your local times: https://time.is/.

Session 1 | 7:00 - 8:00 PDT | 14:00 - 15:00 UTC | Session Chair: Herodotos Herodotou

Opening and Introductions (10 min)
General Chairs: Panos K. Chrysanthis & Meichun Hsu

Research Talk 1 (25 min)
Performance Models of Data Parallel DAG Workflows for Large Scale Data Analytics
Juwei Shi (Microsoft)*; Jiaheng Lu (University of Helsinki)

Research Talk 2 (25 min)
Adaptive Query Compilation in Graph Databases
Alexander Baumstark (TU Ilmenau)*; Muhammad Attahir Jibril (TU Ilmenau); Kai-Uwe Sattler (TU Ilmenau)

Break | 8:00 - 8:10 PDT | 15:00 - 15:10 UTC

Session 2 | 8:10 - 9:40 PDT | 15:10 - 16:40 UTC | Session Chair: Yingjun Wu

Keynote 1 (45 min)
AI's Enormous Potential for Database Simplification
Sam Lightstone, CTO AI Strategy, IBM Data and AI

Keynote 2 (45 min)
OtterTune: An Automatic Database Configuration Tuning Service
Andy Pavlo, Associate Professor at Carnegie Mellon University, co-founder of OtterTune

Break | 9:40 - 9:50 PDT | 16:40 - 16:50 UTC

Session 3 | 9:50 - 11:20 PDT | 16:50 - 18:20 UTC | Session Chair: Stefan Manegold

Keynote 3 (45 min)
Automatic Data Management and Storage Tiering with Oracle Database In-Memory
Shasank Chavan, VP of Data and In-Memory Database Technologies at Oracle

Keynote 4 (45 min)
Architectural evolution of Amazon Redshift and its practical usage of Machine Learning
Ippokratis Pandis, Senior Principal Engineer at Amazon Web Services

Break | 11:20 - 11:30 PDT | 18:20 - 18:30 UTC

Session 4 | 11:30 - 12:30 PDT | 18:30 - 19:30 UTC | Session Chair: Constantinos Costa

Research Talk 3 (25 min)
Improving Stream Load Balance through Shedding
Nikos Katsipoulakis (Amazon Web Services)*; Alexandros Labrinidis (University of Pittsburgh); Panos Chrysanthis (University of Pittsburgh)

Research Talk 4 (25 min)
Towards a Benchmark for Learned Systems
Laurent Bindschaedler (MIT)*; Andreas Kipf (MIT); Tim Kraska (MIT); Ryan Marcus (MIT); Umar Farooq Minhas (Microsoft Research)

Closing (10 min)
General Chairs: Panos K. Chrysanthis & Meichun Hsu



JOINT KEYNOTE TALKS WITH HARDBD & ACTIVE 2021


AI's Enormous Potential for Database Simplification

Sam Lightstone, CTO AI Strategy, IBM Data and AI


ABSTRACT

Research into self-managing databases exploded in the early 2000s with sizeable corporate efforts from each of IBM, Microsoft and Oracle. In 2005 the SMDB Workgroup was founded by Sam Lightstone and Guy Lohman to bring together like-minded innovators from industry and academia. Now, as we enter the era of intelligent computing, AI offers itself as a catalyst for quantum gains in database simplification. In this session Sam Lightstone will contrast the state of SMDB technology of 2005 with today's emerging new potential for automation, semantic simplification and handling new workloads.

ABOUT THE SPEAKER

Sam Lightstone is IBM Chief Technology Officer for AI, IBM Fellow and a Master Inventor in the IBM Data and AI group. He is also chair of the Data and AI Technical Team, the working group of IBM’s technical executives in the division. He has been the founder and co-founder of several large-scale initiatives including AI databases, next generation data warehousing, data virtualization, autonomic computing for data systems, serverless cloud SQL query, and cloud native database services. He co-founded the IEEE Data Engineering Workgroup on Self-Managing Database Systems. Sam has more than 65 patents issued and pending and has authored 4 books and over 30 papers. Sam’s books have been translated into Chinese, Japanese and Spanish. In his spare time he is an avid guitar player and fencer. His Twitter handle is "samlightstone".


OtterTune: An Automatic Database Configuration Tuning Service

Andy Pavlo, Associate Professor, Computer Science Department, Carnegie Mellon University & Co-founder, OtterTune


ABSTRACT

Database management systems (DBMSs) expose dozens of configurable knobs that control their runtime behavior. Setting these knobs correctly for an application's workload can improve the performance and efficiency of the DBMS. But such tuning requires considerable effort from experienced administrators, which is not scalable for large DBMS fleets. This problem has led to research on using machine learning (ML) to devise strategies to optimize DBMS knobs for any application automatically. The OtterTune database tuning service from Carnegie Mellon uses ML to generate and install optimized DBMS configurations. OtterTune observes the DBMS's workload through its metrics and then trains recommendation models that select better knob values. It then reuses these models to tune other DBMSs more quickly. In this talk, I will present an overview of OtterTune and discuss the challenges one must overcome to deploy an ML-based service for DBMSs. I will also highlight the insights we learned from real-world installations of OtterTune.

ABOUT THE SPEAKER

Andy Pavlo is an Associate Professor of Databaseology in the Computer Science Department at Carnegie Mellon University. His research interest is in database management systems, specifically main memory systems, self-driving / autonomous architectures, transaction processing systems, and large-scale data analytics. At CMU, he is a member of the Database Group and the Parallel Data Laboratory. He is also the co-founder and CEO of OtterTune.


Automatic Data Management and Storage Tiering with Oracle Database In-Memory

Shasank Chavan, Vice President of the Data and In-Memory Technologies group, Oracle


ABSTRACT

Autonomous / Self-Driving Databases utilize machine learning techniques to eliminate the manual labor associated with database tuning, security, backups, updates, and other routine management tasks traditionally performed by DBAs. This talk will focus specifically on how we implement a self-driving database with Oracle’s Database In-Memory product to automatically tune for query optimization, memory management, storage management and data tiering. We will first present Oracle’s Database In-Memory architecture and various features built for optimizing analytics and mixed workload performance, and then describe in some detail the smarts we have to make it auto-performing in our self-driving database.

ABOUT THE SPEAKER

Shasank Chavan is the Vice President of the Data and In-Memory Technologies group at Oracle. He leads an amazing team of brilliant engineers in the Database organization who develop customer-facing, performance-critical features for an In-Memory Columnar Store which, as Larry Ellison proclaimed, “processes data at ungodly speeds”. His team implements novel SIMD kernels and hardware acceleration technology for blazing fast columnar data processing, optimized data formats and compression technology for efficient in-memory storage, algorithms and techniques for fast in-memory join and aggregation processing, and optimized in-memory data access and storage solutions in general. His team is currently hyper-focused on leveraging emerging hardware technologies to build Oracle's next-generation, highly distributed, data storage engine that powers the cloud. Shasank earned his BS/MS in Computer Science at the University of California, San Diego. He has accumulated 20+ patents over a span of 22 years working on systems software technology.

Architectural evolution of Amazon Redshift and its practical usage of Machine Learning

Ippokratis Pandis, Senior Principal Engineer at Amazon Web Services


ABSTRACT

Amazon Redshift is Amazon's Petabyte-scale managed cloud data warehouse. Every day customers use Amazon Redshift to process multiple Exabytes of data. In the first part of this talk, we are going to look a bit under the hood of Amazon Redshift and discuss how the team makes sure that Amazon Redshift maintains its price/performance leadership among Cloud Data Warehouses. Further, we will talk about its architectural evolution, discussing features such as Managed Storage, Elastic Resize, Concurrency Scaling, DataSharing, Spectrum and AQUA. In the second part, we are going to discuss how Amazon Redshift leverages Machine Learning to improve its global operation, to reduce the need for administrative operations by its customers, and to improve its performance.

ABOUT THE SPEAKER

Ippokratis Pandis is a senior principal engineer at Amazon Web Services, working in Amazon Redshift. Redshift is Amazon's fully managed, petabyte-scale data warehouse service. Among others, Ippokratis is the architect of the Spectrum, Concurrency Scaling and DataSharing features of Redshift. Previously, Ippokratis held positions as a software engineer at Cloudera, where he worked on the Impala SQL-on-Hadoop query engine, and as a member of the research staff at the IBM Almaden Research Center, where he worked on the DB2 BLU product. Ippokratis received his PhD from the Electrical and Computer Engineering department at Carnegie Mellon University. He is the recipient of Best Demonstration awards at ICDE 2006 and SIGMOD 2011, and a Test-of-Time award at EDBT 2019. He has served as PC chair of DaMoN 2014, DaMoN 2015, CloudDM 2016 and HPTS 2019.



VENUE