

During the last forty years, data management systems have grown in scale, complexity, and number of installations, while the workloads they serve have become more diverse and demanding. Current trends like cloud computing make this situation even more challenging for service providers who have to configure and manage thousands of database nodes as well as to ensure that service level agreements are met.
There has been a significant amount of research addressing these issues by providing autonomic or self-* features in database systems to support complex administrative tasks, such as physical database design, problem diagnosis, and performance tuning, as well as to optimize the operations of database components such as the query optimizer and the execution engine. However, new challenges arise from trends like cloud and cluster computing, virtualization, and Software-as-a-Service (SaaS). A major challenge is the need to scale self-management capabilities to the level of hundreds to thousands of nodes while considering economic factors.
Autonomic, or self-managing, systems are a promising approach to achieve the goal of systems that are easier to use and maintain. A system is considered autonomic if it possesses the capabilities to be self-configuring, self-optimizing, self-healing and self-protecting. The aim of the SMDB workshop is to provide a forum for researchers from both industry and academia to present and discuss ideas related to self-management and self-organization in data management systems ranging from classical databases to data stream engines to large-scale cloud environments that utilize advanced AI, machine learning, and data mining and analysis.
We plan to follow the successful format of previous editions of this workshop: approximately eight presentations of accepted papers, a keynote address by a well-known speaker and subject-matter expert in self-managing database systems, and a panel discussion involving experts from industry and academia. For the last two years, SMDB has also featured joint keynote addresses with the Joint International Workshop on Big Data Management on Emerging Hardware and Data Management on Virtualized Active Systems (HardBD&Active). Furthermore, in previous years, the best papers presented at SMDB and HardBD&Active were invited for extended submission to a Special Issue of the Distributed and Parallel Databases (DAPD) journal under the theme “Self-Managing and Hardware-Optimized Database Systems.”




Topics of interest include, but are not limited to:

* Principles and architecture of autonomic data management systems
* Retro-fitting existing systems vs. designing for self-management
* Self-* capabilities in databases and storage systems
* Data management in cloud and multi-tenant databases
* Autonomic capabilities in database-as-a-service platforms
* Automated testing of data management systems
* Automated physical database design and adaptive query tuning
* Automated provisioning and integration
* Automatic enforcement of information quality
* Robust query processing techniques
* Self-managing database components (e.g., query optimizer, execution engine)
* Self-managing data stream engines and adaptive event-based systems
* Self-managing distributed / decentralized / peer-to-peer information systems
* Self-management of internet-scale distributed systems
* Self-management for big data infrastructures
* Monitoring and diagnostics in data management systems
* Policy automation and visualization for datacenter administration
* User acceptance and trust of autonomic capabilities
* Evaluation criteria and benchmarks for self-managing systems
* Self-evaluation of data management services in the cloud
* Use cases and war stories on deploying autonomic capabilities


Authors are invited to submit original research contributions in English of up to 6 pages in the IEEE camera-ready format (templates are available on the ICDE 2022 submission guidelines page) via the submission site https://cmt3.research.microsoft.com/SMDB2022. Authors of accepted papers will be encouraged to submit an extended paper of up to 8 pages for final publication. Authors are also invited to submit short papers of up to 4 pages. The page limit includes the bibliography and any appendix. All accepted papers will appear in the formal Proceedings of the Conference Workshops published by IEEE CS Press, and will be included in the IEEE digital library.

Authors of a selection of accepted papers will be invited to submit an extended version to the Distributed and Parallel Databases (DAPD) journal.


Paper submission deadlines:

Abstract submission (optional): February 15, 2022 (Tuesday), 5pm PST (extended from January 9, 2022)

Paper submission: February 15, 2022 (Tuesday), 5pm PST (extended from January 23, 2022)

Notification of acceptance: March 6, 2022 (Sunday) (extended from March 1, 2022)

Camera-ready papers: March 25, 2022 (Friday)



Herodotos Herodotou

Assistant Professor, Dept. of Electrical Eng., Computer Eng. and Informatics
Cyprus University of Technology

Yingjun Wu

Founder and CEO,
Singularity Data Inc

Constantinos Costa

Visiting Research Assistant Professor, Computer Science Department
University of Pittsburgh

Bailu Ding

Principal Researcher, Microsoft Research
Redmond, USA

Demetris Trihinas

Lecturer, Computer Science Department
University of Nicosia



Panos K. Chrysanthis

Professor, Computer Science Department
University of Pittsburgh

Meichun Hsu

Sr. Director of R&D Database Server Technology
Oracle Corporation


  • Alkis Simitsis, Athena Research Center, Greece
  • Andreas Kipf, MIT, USA
  • Anshuman Dutt, Microsoft Research, USA
  • Bo Tang, Southern University of Science and Technology, China
  • Danica Porobic, Oracle, USA
  • Deepak Majeti, Ahana, USA
  • Eduardo Cunha de Almeida, Federal University of Paraná, Brazil
  • George Pallis, University of Cyprus, Cyprus
  • Guoliang Li, Tsinghua University, China
  • Jeff LeFevre, UCSC, USA
  • John Paparrizos, University of Chicago, USA
  • Kai-Uwe Sattler, TU Ilmenau, Germany
  • Ken Salem, University of Waterloo, Canada
  • Khuzaima Daudjee, University of Waterloo, Canada
  • Le Gruenwald, University of Oklahoma, USA
  • Lin Ma, CMU, USA
  • Matthias J. Sax, Confluent Inc., USA
  • Meike Klettke, University of Rostock, Germany
  • Mohamed A. Sharaf, United Arab Emirates University, UAE
  • Nikos Katsipoulakis, Snowflake, USA
  • Peter Triantafillou, University of Warwick, UK
  • Ryan Marcus, MIT, USA
  • Tarique Siddiqui, MSR, USA
  • Uta Störl, University of Hagen, Germany


Xiaozhong Zhang, University of Pittsburgh, USA


Brian Nixon, University of Pittsburgh, USA

Rakan Alseghayer, University of Pittsburgh, USA


Accepted Papers

  • Smarter Warehouse. Nikolay Laptev (Meta), Wenbo Tao (Meta), Caner Komurlu (Meta), Jason Xu (Meta), Deke Sun (Meta), Thomas Lux (Meta), Luo Mi (Meta)
  • Exploring System and Machine Learning Performance Interactions when Tuning Distributed Data Stream Applications. Lambros Odysseos (Cyprus University of Technology, Cyprus), Herodotos Herodotou (Cyprus University of Technology, Cyprus)
  • AlphaSQL: Open Source Software Tool for Automatic Dependency Resolution, Parallelization and Validation for SQL and Data. Masahiro Matsui (University of Tokyo & Japan Data Science Consortium Co. Ltd., Japan), Takuto Sugisaki (University of Tokyo & Japan Data Science Consortium Co. Ltd., Japan), Kensaku Okada (Japan Data Science Consortium Co. Ltd., Japan), Noboru Koshizuka (University of Tokyo, Japan)
  • Adaptive Update Handling for Graph HTAP. Muhammad Attahir Jibril (TU Ilmenau, Germany), Alexander Baumstark (TU Ilmenau, Germany), Kai-Uwe Sattler (TU Ilmenau, Germany)
  • Anatomy of Learned Database Tuning with Bayesian Optimization. George-Octavian Barbulescu (University of Warwick, UK), Peter Triantafillou (University of Warwick, UK)
  • Data placement in dynamic fog ecosystems. Theodoros Toliopoulos (Aristotle University of Thessaloniki, Greece), Anna-Valentini Michailidou (Aristotle University of Thessaloniki, Greece), Anastasios Gounaris (Aristotle University of Thessaloniki, Greece)
Times are displayed in MYT. Look up your local times: https://time.is/.
8:00-8:10 Opening and Introductions (General Chairs: Herodotos Herodotou & Yingjun Wu)

8:10-9:10 Research Talks (Session Chair: Constantinos Costa)
  • Research Talk 1: Smarter Warehouse. Nikolay Laptev (Facebook)
  • Research Talk 2: Anatomy of Learned Database Tuning with Bayesian Optimisation. George O. Barbulescu (University of Warwick), Peter Triantafillou (University of Warwick)
  • Research Talk 3: Adaptive Update Handling for Graph HTAP. Muhammad Attahir Jibril (TU Ilmenau), Alexander Baumstark (TU Ilmenau), Kai-Uwe Sattler (TU Ilmenau)

9:20-11:00 Keynotes (Session Chairs: Herodotos Herodotou & Yingjun Wu)
  • Keynote 1: Deep Data Integration. Wang-Chiew Tan, Research Scientist, Facebook AI
  • Keynote 2: Towards instance-optimized data systems. Tim Kraska, Associate Professor, MIT

11:10-12:00 Founders & Pioneers Keynote Talk (Session Chair: Panos K. Chrysanthis)
  • Modern Cloud DBMSs Vindicate Age-Old Work on Shared Disks DBMSs! C. Mohan

Long Break

14:00-15:40 Keynotes (Session Chair: Ilia Petrov)
  • Keynote 3: Accelerating Data Analytics in the Era of Ubiquitous Computing: Opportunities and Challenges. Maya Gokhale, Distinguished Member of Technical Staff, Lawrence Livermore National Laboratory, USA
  • Keynote 4: Memory-Centric Computing. Onur Mutlu, Professor of Computer Science, ETH Zurich

15:50-16:50 Research Talks (Session Chair: Bailu Ding)
  • Research Talk 4: Exploring System and Machine Learning Performance Interactions when Tuning Distributed Data Stream Applications. Lambros Odysseos (Cyprus University of Technology), Herodotos Herodotou (Cyprus University of Technology)
  • Research Talk 5: Data placement in dynamic fog ecosystems. Anna-Valentini Michailidou (Aristotle University of Thessaloniki), Theodoros Toliopoulos (Aristotle University of Thessaloniki), Anastasios Gounaris (Aristotle University of Thessaloniki)
  • Research Talk 6: AlphaSQL: Open Source Software Tool for Automatic Dependency Resolution, Parallelization and Validation for SQL and Data. Masahiro Matsui (The University of Tokyo), Takuto Sugisaki (The University of Tokyo), Kensaku Okada (Japan Data Science Consortium), Noboru Koshizuka (The University of Tokyo)

16:50-17:00 Closing (General Chairs: Herodotos Herodotou & Yingjun Wu)



Deep Data Integration

Wang-Chiew Tan, Research Scientist, Facebook AI


In recent years, we have witnessed the widespread adoption of deep learning techniques as avant-garde solutions to different computational problems. In data integration, deep learning techniques have helped establish several state-of-the-art results in long-standing problems, including information extraction, entity matching, data cleaning, and table understanding. In this talk, I will reflect on the strengths of deep learning and how it has helped move the needle forward in data integration. I will also discuss a few challenges associated with solutions based on deep learning techniques and describe some opportunities for future work.


Wang-Chiew Tan

Wang-Chiew is a research scientist manager at Meta AI. Before that, she was Head of Research at Megagon Labs, where she led research efforts on building advanced technologies to enhance search by experience, including research on data integration, information extraction, text mining, and summarization. Prior to joining Megagon Labs, she was a Professor of Computer Science at the University of California, Santa Cruz. She also spent two years at IBM Research-Almaden.

Towards instance-optimized data systems

Tim Kraska, Associate Professor, MIT


Recently, there has been a lot of excitement around ML-enhanced (or learned) algorithms and data structures. For example, there has been work on applying machine learning to improve query optimization, indexing, storage layouts, scheduling, log-structured merge trees, sorting, compression, sketches, among many other data management tasks. Arguably, the ideas behind these techniques are similar: machine learning is used to model the data and/or workload in order to derive a more efficient algorithm or data structure. Ultimately, what these techniques will allow us to build are “instance-optimized” systems; systems that self-adjust to a given workload and data distribution to provide unprecedented performance and avoid the need for tuning by an administrator. In this talk, I will first provide an overview of the opportunities and limitations of current ML-enhanced algorithms and data structures, present initial results of SageDB, a first instance-optimized system we are building as part of DSAIL@CSAIL at MIT, and finally outline remaining challenges and future directions.


Tim Kraska

Tim Kraska is an Associate Professor of Electrical Engineering and Computer Science in MIT's Computer Science and Artificial Intelligence Laboratory, co-director of the Data System and AI Lab at MIT (DSAIL@CSAIL), and co-founder of Einblick Analytics. Currently, his research focuses on building systems for machine learning, and using machine learning for systems. Before joining MIT, Tim was an Assistant Professor at Brown, spent time at Google Brain, and was a PostDoc in the AMPLab at UC Berkeley after he got his PhD from ETH Zurich. Tim is a 2017 Alfred P. Sloan Research Fellow in computer science and received several awards including the VLDB Early Career Research Contribution Award, the VMware Systems Research Award, the university-wide Early Career Research Achievement Award at Brown University, an NSF CAREER Award, as well as several best paper and demo awards at VLDB, SIGMOD, and ICDE.


Modern Cloud DBMSs Vindicate Age-Old Work on Shared Disks DBMSs!

C. Mohan, Distinguished Visiting Professor, Tsinghua University, China


Over three decades ago, when the database research community was enamored of shared-nothing database management systems (DBMSs), some of us were focused on DBMSs based on the shared disks (SD) architecture. While my own work involved IBM’s DB2 on the mainframe, earlier SD product work had been done by DEC, IBM (with IMS), Oracle, and a couple of Japanese vendors. The research community didn’t much appreciate our SD work, even though IBM and Oracle have been quite successful with their SD relational DBMS products. With the emergence of the public cloud, many classical on-premises DBMSs have been ported to the cloud arena. New DBMSs have also been developed from scratch to work in the cloud environment. One of the dominant characteristics of cloud DBMSs is that they embrace the SD architecture, because the architectural separation of compute nodes and storage nodes (also called disaggregated storage) in the cloud environment brings several advantages. I feel that these recent developments vindicate our age-old SD work! In this talk, I will first introduce traditional (non-cloud) parallel and distributed database systems. I will cover concepts like SQL and NoSQL systems, data replication, distributed and parallel query processing, and data recovery after different types of failures. Then, I will discuss how the emergence of the (public) cloud has introduced new requirements on parallel and distributed database systems, and how such requirements have necessitated fundamental changes to the architectures of such systems, including embracing at least some of the SD ideas. I will illustrate the related developments by discussing the details of several cloud DBMSs.


C. Mohan

Dr. C. Mohan is currently a Distinguished Visiting Professor at Tsinghua University in China, a Visiting Researcher at Google, a Member of the inaugural Board of Governors of Digital University Kerala, and an Advisor of the Kerala Blockchain Academy (KBA) and the Tamil Nadu e-Governance Agency (TNeGA) in India. He retired in June 2020 from being an IBM Fellow at the IBM Almaden Research Center in Silicon Valley. He was an IBM researcher for 38.5 years in the database, blockchain, AI and related areas, impacting numerous IBM and non-IBM products, the research and academic communities, and standards, especially with his invention of the well-known ARIES family of database locking and recovery algorithms, and the Presumed Abort distributed commit protocol. This IBM (1997-2020), ACM (2002-) and IEEE (2002-) Fellow has also served as the IBM India Chief Scientist (2006-2009). In addition to receiving the ACM SIGMOD Edgar F. Codd Innovations Award (1996), the VLDB 10 Year Best Paper Award (1999) and numerous IBM awards, Mohan was elected to the United States and Indian National Academies of Engineering (2009), and named an IBM Master Inventor (1997). This Distinguished Alumnus of IIT Madras (1977) received his PhD at the University of Texas at Austin (1981). He is an inventor of 50 patents. In recent years, he has focused on Blockchain, AI, Big Data and Cloud technologies (https://bit.ly/sigBcP, https://bit.ly/CMoTalks). Since 2017, he has been an evangelist of permissioned blockchains and the myth buster of permissionless blockchains. During 1H2021, Mohan was the Shaw Visiting Professor at the National University of Singapore (NUS), where he taught a seminar course on distributed data and computing. In 2019, he became an Honorary Advisor to TNeGA for its blockchain and other projects. In 2020, he joined the Advisory Board of KBA. Since 2016, Mohan has been a Distinguished Visiting Professor of China’s prestigious Tsinghua University.
In 2021, he was inducted as a member of the inaugural Board of Governors of the new Indian university Digital University Kerala (DUK). Mohan has served on the advisory board of IEEE Spectrum, and on numerous conference and journal boards. In 2022, he became a consultant at Google with the title of Visiting Researcher. He has also been a Consultant to the Microsoft Data Team. Mohan is a frequent speaker in North America, Europe and Asia. He has given talks in 43 countries. He is highly active on social media and has a huge network of followers. More information can be found in the Wikipedia page at https://bit.ly/CMwIkP and his homepage at https://bit.ly/CMoDUK

Accelerating Data Analytics in the Era of Ubiquitous Computing: Opportunities and Challenges

Maya Gokhale
Distinguished Member of Technical Staff, Lawrence Livermore National Laboratory, USA


With innovations in storage and memory capacity combined with the profusion of acceleration architectures, opportunities abound to gain insight from exponentially increasing data sources. Compute functions can be distributed among sensors, in intermediate network aggregation points, in transit through network routers and host interfaces, and in or near the data repositories. Efficiently and securely exploiting these emerging opportunities will spur new research, including re-thinking data structures and algorithms, designing domain-specific languages and compiler optimizations, OS- and process-level scheduling and resource management, and, above all, ensuring security and privacy. This talk will discuss the spectrum of opportunities, challenges, and solutions in this domain.


Maya Gokhale

Maya Gokhale is Distinguished Member of Technical Staff at the Lawrence Livermore National Laboratory, USA. Her career spans research conducted in academia, industry, and National Laboratories. Maya received a Ph.D. in Computer Science from University of Pennsylvania. Her current research interests include data intensive heterogeneous architectures and reconfigurable computing. Maya is co-recipient of an R&D 100 award for a C-to-FPGA compiler, co-recipient of four patents related to memory architectures for embedded processors, reconfigurable computing architectures, and cybersecurity, and co-author of more than one hundred forty technical publications. Maya is on the editorial board of the Proceedings of the IEEE and an associate editor of IEEE Micro. She is a co-recipient of the National Intelligence Community Award, is a member of Phi Beta Kappa, and is an IEEE Fellow.

Memory-Centric Computing

Onur Mutlu, Professor of Computer Science, ETH Zurich


Computing is bottlenecked by data. Large amounts of application data overwhelm the storage capability, communication capability, and computation capability of the modern machines we design today. As a result, many key applications' performance, efficiency and scalability are bottlenecked by data movement. In this lecture, we describe three major shortcomings of modern architectures in terms of 1) dealing with data, 2) taking advantage of the vast amounts of data, and 3) exploiting different semantic properties of application data. We argue that an intelligent architecture should be designed to handle data well. We show that handling data well requires designing architectures based on three key principles: 1) data-centric, 2) data-driven, 3) data-aware. We give several examples of how to exploit each of these principles to design a much more efficient and higher-performance computing system. We especially discuss recent research that aims to fundamentally reduce memory latency and energy, and practically enable computation close to data, with at least two promising novel directions: 1) processing using memory, which exploits analog operational properties of memory chips to perform massively-parallel operations in memory, with low-cost changes, and 2) processing near memory, which integrates sophisticated additional processing capability in memory controllers, the logic layer of 3D-stacked memory technologies, or memory chips to enable high memory bandwidth and low memory latency to near-memory logic. We show that both types of architectures can enable orders-of-magnitude improvements in the performance and energy consumption of many important workloads, such as graph analytics, database systems, machine learning, and video processing. We discuss how to enable adoption of such fundamentally more intelligent architectures, which we believe are key to efficiency, performance, and sustainability. We conclude with some guiding principles for future computing architecture and system designs.
A short accompanying paper, "Intelligent Architectures for Intelligent Computing Systems", which appeared in DATE 2021, can be found here and serves as recommended reading. A longer overview paper, "A Modern Primer on Processing in Memory" is available here.


Onur Mutlu

Onur Mutlu is a Professor of Computer Science at ETH Zurich. He is also a faculty member at Carnegie Mellon University, where he previously held the Strecker Early Career Professorship. His current broader research interests are in computer architecture, systems, hardware security, and bioinformatics. A variety of techniques he, along with his group and collaborators, has invented over the years have influenced industry and have been employed in commercial microprocessors and memory/storage systems. He obtained his PhD and MS in ECE from the University of Texas at Austin and BS degrees in Computer Engineering and Psychology from the University of Michigan, Ann Arbor. He started the Computer Architecture Group at Microsoft Research (2006-2009), and held various product and research positions at Intel Corporation, Advanced Micro Devices, VMware, and Google. He received the IEEE High Performance Computer Architecture Test of Time Award, the IEEE Computer Society Edward J. McCluskey Technical Achievement Award, the ACM SIGARCH Maurice Wilkes Award, the inaugural IEEE Computer Society Young Computer Architect Award, the inaugural Intel Early Career Faculty Award, the US National Science Foundation CAREER Award, the Carnegie Mellon University Ladd Research Award, faculty partnership awards from various companies, and a healthy number of best paper or "Top Pick" paper recognitions at various computer systems, architecture, and security venues. He is an ACM Fellow "for contributions to computer architecture research, especially in memory systems", an IEEE Fellow "for contributions to computer architecture research and practice", and an elected member of the Academy of Europe (Academia Europaea).
His computer architecture and digital logic design course lectures and materials are freely available on Youtube (https://www.youtube.com/OnurMutluLectures), and his research group makes a wide variety of software and hardware artifacts freely available online (https://safari.ethz.ch/). For more information, please see his webpage at https://people.inf.ethz.ch/omutlu/.