Checking Crash Consistency on Storage Systems

Posted by: Office of Science and Technology    Date posted: 2018-06-08

Speaker: Feng Qin
Time and place: June 12 (Tuesday, 9:30), Conference Room 503, Information Building

Abstract:
Crash consistency is an important property of modern storage systems. Unfortunately, it is difficult to preserve across the complex storage stack. At the top of the stack, applications enable users to perform various functions and accomplish different tasks. In the middle, layers such as databases, file systems, and block layers introduce all kinds of optimizations, including caching and journaling. At the bottom, newer components such as solid-state drives (SSDs) are often ignored or under-studied under adverse conditions. This complexity makes it unprecedentedly challenging to preserve the data consistency of storage systems after crashes.

In this talk, I will mainly present our recent work on checking the crash consistency of databases and applications. In particular, our framework for torturing databases includes carefully crafted workloads to exercise the ACID guarantees, a record/replay subsystem to allow the controlled injection of simulated power faults, a ranking algorithm to prioritize where to inject faults based on our experience, and a multi-layer tracer to diagnose root causes. Unlike databases, applications provide various functions to users, so specifying checking scripts and workloads for them requires nontrivial manual effort. To address this key challenge, our proposed approach, C3, automatically generates the testing oracle and checking scripts, making the entire validation process as easy as pressing a single button.
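
As a rough illustration of the record/replay idea only (not the speaker's framework), the toy Python sketch below records the ordered block writes of a workload, replays every prefix of that trace to simulate a power fault at each point, runs recovery, and checks an invariant. The dictionary "disk", the journal layout, and all function names are assumptions made up for this example.

```python
# Toy model, purely hypothetical: the "disk" is a dict of block name -> value.
# None of these names come from the framework described in the talk.

INITIAL_DISK = {"A": 100, "B": 0, "journal": None, "commit": False}


def record_transfer(amount):
    """Return the ordered block writes of a journaled transfer from A to B."""
    return [
        ("journal", {"A": 100 - amount, "B": amount}),  # intended new values
        ("commit", True),      # commit record: journal entry is now valid
        ("A", 100 - amount),   # in-place update of A
        ("B", amount),         # in-place update of B
        ("commit", False),     # checkpoint finished, journal entry retired
    ]


def replay_with_fault(trace, k):
    """Replay only the first k writes, as if power failed right after them."""
    disk = dict(INITIAL_DISK)
    for block, value in trace[:k]:
        disk[block] = value
    return disk


def recover(disk):
    """Redo a committed journal entry; ignore an uncommitted one."""
    if disk["commit"]:
        disk.update(disk["journal"])
        disk["commit"] = False
    return disk


def is_consistent(disk):
    """Invariant for this workload: money is neither created nor lost."""
    return disk["A"] + disk["B"] == 100


def find_violations(trace):
    """Return every fault point k whose recovered state breaks the invariant."""
    return [
        k
        for k in range(len(trace) + 1)
        if not is_consistent(recover(replay_with_fault(trace, k)))
    ]
```

In this sketch, a trace that updates A and B in place without journaling is flagged at the fault point between the two writes, while the journaled trace passes at every fault point. A real checker would additionally need to model write reordering, partial writes, and device caches, which this example deliberately ignores.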


Feng Qin received his Ph.D. degree from the University of Illinois at Urbana-Champaign. He joined the Department of Computer Science and Engineering at Ohio State as an Assistant Professor in 2006 and was promoted to Associate Professor with tenure in 2013. His research interests include Software Reliability, Operating Systems, High Performance Computing, and Security. He is particularly interested in developing system mechanisms to improve software availability and reliability at different stages of software development. He has published papers in top systems conferences over the past decade. One of his papers received a best paper award at SOSP'05. Two of his papers won IEEE Micro Top Picks in 2004 and 2007, respectively. Three of his papers were nominated for best paper at HPCA'05, SC'07, and SC'10, respectively. He received the NSF CAREER Award in 2010, the OSU Lumley Research Award in 2015, and the CSE Teaching Award in 2018.