Hypothesis, Antithesis, synthesis
antithesis.com/blog/2026/he...
Hegel is an attempt to bring the quality of property-based testing found in Hypothesis to every language.
Posts by Distributed Systems
Testing for DR Failover Testing
www.usenix.org/conference/s...
A simplified scenario is recovery from a full data centre failure. The Zendesk Chat backend infrastructure operates in a single data centre. The way to be sure that DR works is to perform a real failover.
Top Five Scalability Patterns
www.f5.com/company/blog...
Building A Billion User Load Balancer
www.usenix.org/conference/s...
What Is Coordination, Really?
jhellerstein.github.io/blog/cost-of...
A Pretty-printer for TLA+
blog.fponzi.me/2026-03-30-a...
Distributed rate limiting of delivery attempts
blog.allegro.tech/2017/04/herm...
Smarter Auto-Scaling for ClickHouse: The Two-Window Approach
clickhouse.com/blog/smarter...
The absolute beginners guide to databasemaxxing
pthorpe92.dev/databasemaxx...
RocksDB development finds a CPU bug
rocksdb.org/blog/2026/02...
Three Lenses on Coordination
jhellerstein.github.io/blog/three-s...
Consensus Board Game
matklad.github.io/2026/03/19/c...
the mathematics of compression in database systems
www.bitsxpages.com/p/the-mathem...
I started thinking about compression when implementing prefix compression for SlateDB. When I ran benchmarks, I noticed that performance seemed "worse" despite improved compression ratios.
Formal Methods Beyond Correctness: Isolation & Permissiveness of Distributed Transactions in MongoDB
www.mongodb.com/company/blog...
One-off Verified Transpilation with Claude
will62794.github.io/verification...
We can automatically check correctness properties of a TLA+ specification using TLC, a model checker that will exhaustively explore a spec’s reachable states and check that some specified property holds.
Scaling PostgreSQL to power 800 million ChatGPT users
openai.com/index/scalin...
SpiceDB is an open-source, Google Zanzibar-inspired database system for real-time, security-critical application permissions.
authzed.com/docs/spicedb...
On Idempotency Keys
www.morling.dev/blog/on-idem...
In distributed systems, there’s a common understanding that it is not possible to guarantee exactly-once delivery of messages. What is possible though is exactly-once processing.
What Does Write Skew Look Like?
justinjaffray.com/what-does-wr...
This post is about gaining intuition for Write Skew, and, by extension, Snapshot Isolation. Snapshot Isolation is billed as a transaction isolation level that offers a good mix between performance and correctness.
How to do distributed locking
martin.kleppmann.com/2016/02/08/h...
Redis has been making inroads into areas of data management where there are stronger consistency and durability expectations. Distributed locking is one of those areas. Let’s examine it in some more detail.
Reproducing the AWS Outage Race Condition with a Model Checker
wyounas.github.io/aws/concurre...
We’ll use a model checker to see how such a race could happen. Formal verification can’t prevent every failure, but it helps us think more clearly about correctness and reason about subtle bugs.
TLA+ Modeling of AWS outage DNS race condition
muratbuffalo.blogspot.com/2025/11/tla-...
AWS’s N. Virginia region suffered a DynamoDB outage triggered by a DNS automation defect. This post focuses narrowly on the race condition at the core of the bug, which is best understood through TLA+ modeling.
TernFS — an exabyte scale, multi-region distributed filesystem
www.xtxmarkets.com/tech/2025-te...
This post motivates TernFS, explains its high-level architecture, and then explores some key implementation details.
Just make it scale: An Aurora DSQL story
www.allthingsdistributed.com/2025/05/just...
Each component follows the Unix mantra—do one thing, and do it well—but working together they are able to offer all the features users expect from a database.
Aurora DSQL: How authentication and authorization works
marc-bowes.com/dsql-auth.html
How connections to Aurora DSQL are authenticated and authorized. This information is meant to be supplemental to what is found in the official Amazon Aurora DSQL documentation.
Dynamo, DynamoDB, and Aurora DSQL
brooker.co.za/blog/2025/08...
People often ask me about the architectural relationship between Amazon Dynamo, Amazon DynamoDB and Aurora DSQL. I’ll start off on comparing how the systems achieve a few key properties.
Linearizability testing S2 with deterministic simulation
s2.dev/blog/lineari...
We can gain confidence that S2 is linearizable by taking an empirical validation approach, using a model checker like Knossos or Porcupine.
How I solved a distributed queue problem after 15 years
dbos.dev/blog/durable...
What we really needed to make distributed task queueing robust are durable queues that checkpoint the status of our queued tasks to a durable store like Postgres.
Understanding Paxos the intuitive way
relentless-leader.com/dive-deep-in...
Murat Demirbas and Aleksey Charapko read and discuss the HotOS paper "Real Life Is Uncertain. Consensus Should Be Too!"