Posts by Distributed Systems

Hypothesis, Antithesis, synthesis
antithesis.com/blog/2026/he...
Hegel is an attempt to bring the quality of property-based testing found in Hypothesis to every language.

1 day ago

Testing for DR Failover Testing
www.usenix.org/conference/s...
A simplified scenario is recovery from a full data centre failure. The Zendesk Chat backend infrastructure operates in a single data centre. The way to be sure that DR works is to perform a real failover.

3 days ago

Top Five Scalability Patterns
www.f5.com/company/blog...

1 week ago

Building A Billion User Load Balancer
www.usenix.org/conference/s...

2 weeks ago

What Is Coordination, Really?
jhellerstein.github.io/blog/cost-of...

2 weeks ago

A Pretty-printer for TLA+
blog.fponzi.me/2026-03-30-a...

3 weeks ago

Distributed rate limiting of delivery attempts
blog.allegro.tech/2017/04/herm...

3 weeks ago

Smarter Auto-Scaling for ClickHouse: The Two-Window Approach
clickhouse.com/blog/smarter...

3 weeks ago

The absolute beginners guide to databasemaxxing
pthorpe92.dev/databasemaxx...

4 weeks ago

RocksDB development finds a CPU bug
rocksdb.org/blog/2026/02...

4 weeks ago

Three Lenses on Coordination
jhellerstein.github.io/blog/three-s...

1 month ago

Consensus Board Game
matklad.github.io/2026/03/19/c...

1 month ago

the mathematics of compression in database systems
www.bitsxpages.com/p/the-mathem...
I started thinking about compression when implementing prefix compression for SlateDB. When I ran benchmarks, I noticed that performance seemed "worse" despite improved compression ratios.

1 month ago

Formal Methods Beyond Correctness: Isolation & Permissiveness of Distributed Transactions in MongoDB
www.mongodb.com/company/blog...

2 months ago

One-off Verified Transpilation with Claude
will62794.github.io/verification...
We can automatically check correctness properties of a TLA+ specification using TLC, a model checker that will exhaustively explore a spec’s reachable states and check that some specified property holds.

2 months ago

Scaling PostgreSQL to power 800 million ChatGPT users
openai.com/index/scalin...

2 months ago

SpiceDB is an open-source, Google Zanzibar-inspired database system for real-time, security-critical application permissions.
authzed.com/docs/spicedb...

3 months ago

On Idempotency Keys
www.morling.dev/blog/on-idem...
In distributed systems, there’s a common understanding that it is not possible to guarantee exactly-once delivery of messages. What is possible though is exactly-once processing.

3 months ago

What Does Write Skew Look Like?
justinjaffray.com/what-does-wr...
This post is about gaining intuition for Write Skew, and, by extension, Snapshot Isolation. Snapshot Isolation is billed as a transaction isolation level that offers a good mix between performance and correctness.

4 months ago

How to do distributed locking
martin.kleppmann.com/2016/02/08/h...
Redis has been making inroads into areas of data management where there are stronger consistency and durability expectations. Distributed locking is one of those areas. Let’s examine it in some more detail.

4 months ago

Reproducing the AWS Outage Race Condition with a Model Checker
wyounas.github.io/aws/concurre...
We’ll use a model checker to see how such a race could happen. Formal verification can’t prevent every failure, but it helps us think more clearly about correctness and reason about subtle bugs.

5 months ago

TLA+ Modeling of AWS outage DNS race condition
muratbuffalo.blogspot.com/2025/11/tla-...
On Oct 19–20, 2025, AWS’s N. Virginia region suffered a major DynamoDB outage triggered by a DNS automation defect. This post focuses narrowly on the race condition at the core of the bug, which is best understood through TLA+ modeling.

5 months ago

TernFS — an exabyte scale, multi-region distributed filesystem
www.xtxmarkets.com/tech/2025-te...
This post motivates TernFS, explains its high-level architecture, and then explores some key implementation details.

5 months ago

Just make it scale: An Aurora DSQL story
www.allthingsdistributed.com/2025/05/just...
AWS Senior Principal Engineers, Niko Matsakis and Marc Bowes, take us inside Aurora DSQL's development: scaling write operations without two-phase commit, overcoming garbage collection hurdles, and…
Each component follows the Unix mantra—do one thing, and do it well—but working together they are able to offer all the features users expect from a database.

5 months ago

Aurora DSQL: How authentication and authorization works
marc-bowes.com/dsql-auth.html
How connections to Aurora DSQL are authenticated and authorized. This information is meant to be supplemental to what is found in the official Amazon Aurora DSQL documentation.

6 months ago

Dynamo, DynamoDB, and Aurora DSQL
brooker.co.za/blog/2025/08...
People often ask me about the architectural relationship between Amazon Dynamo, Amazon DynamoDB and Aurora DSQL. I’ll start off by comparing how the systems achieve a few key properties.

6 months ago

Linearizability testing S2 with deterministic simulation
s2.dev/blog/lineari...
We can gain confidence that S2 is linearizable by taking an empirical validation approach, using a model checker like Knossos or Porcupine.

6 months ago

How I solved a distributed queue problem after 15 years
dbos.dev/blog/durable...
What we really needed to make distributed task queueing robust are durable queues that checkpoint the status of our queued tasks to a durable store like Postgres.

7 months ago

Understanding Paxos the intuitive way
relentless-leader.com/dive-deep-in...

8 months ago

Murat Demirbas and Aleksey Charapko read and discuss the HotOS paper "Real Life Is Uncertain. Consensus Should Be Too!"

8 months ago