Cloud databases face a fundamental challenge: how can they remain available and durable when nodes fail? Modern cloud databases address this by separating two concerns that used to be tightly coupled: compute and storage. The database engine becomes stateless, while the write-ahead log is replicated across multiple nodes to guarantee durability. If a database server dies, another one can pick up exactly where it left off by reading from the replicated log.
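To make the failover idea concrete, here is a minimal in-memory sketch in Python. It is a toy under stated assumptions, not how any particular cloud database implements its log: the names `LogReplica`, `ReplicatedLog`, and `DatabaseEngine` are hypothetical, replicas never rejoin after failing, and a write is acknowledged only when a majority of replicas is alive.

```python
from dataclasses import dataclass, field


@dataclass
class LogReplica:
    """One storage node holding a copy of the write-ahead log (hypothetical)."""
    entries: list = field(default_factory=list)
    alive: bool = True


class ReplicatedLog:
    """Toy log that copies every record to all live replicas."""

    def __init__(self, replica_count=3):
        self.replicas = [LogReplica() for _ in range(replica_count)]

    def append(self, record):
        live = [r for r in self.replicas if r.alive]
        # Assumption: a write counts as durable only with a live majority.
        if len(live) <= len(self.replicas) // 2:
            raise RuntimeError("not enough live replicas for a durable write")
        for replica in live:
            replica.entries.append(record)

    def read_all(self):
        live = [r.entries for r in self.replicas if r.alive]
        if not live:
            raise RuntimeError("no live replica to read from")
        # Replicas never rejoin in this toy, so every live replica is complete;
        # taking the longest copy is just a defensive choice.
        return list(max(live, key=len))


class DatabaseEngine:
    """Stateless engine: all durable state is rebuilt from the log."""

    def __init__(self, log):
        self.log = log
        self.state = {}
        # Recovery: replay every logged write in order.
        for key, value in log.read_all():
            self.state[key] = value

    def put(self, key, value):
        self.log.append((key, value))  # log first, then apply in memory
        self.state[key] = value


log = ReplicatedLog()
primary = DatabaseEngine(log)
primary.put("x", 1)
primary.put("y", 2)
log.replicas[0].alive = False  # one storage node fails
primary.put("x", 3)            # still durable: 2 of 3 replicas are alive
del primary                    # the database server dies

standby = DatabaseEngine(log)  # a new server replays the replicated log
print(standby.state)           # {'x': 3, 'y': 2}
```

The engine keeps nothing that cannot be rebuilt, so replacing a failed server amounts to constructing a new `DatabaseEngine` over the same log. A real log service also has to handle replicas that rejoin with stale data and writers that race each other, which this sketch deliberately leaves out.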
Distributed log services are thus at the heart of cloud databases. In this blog post, we will explain some drawbacks of the predominant design for distributed logs to motivate a new elegant design. We will also explain why it is necessary to verify this design with formal methods.