
Posts by Abhay Bothra

King to c7?

1 year ago 2 0 0 0

Caveat: Some of these could be unique to Fennel’s architecture because of our reliance on Kafka for exactly-once semantics and recovery.

1 year ago 0 0 0 0

Why use large batches at all? To amortize the cost of Kafka transactions, which we rely on for exactly-once semantics.
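A back-of-the-envelope way to see the amortization, as a minimal sketch: if each transaction carries a fixed overhead, the average cost per record falls as the batch grows. The function name and the millisecond figures below are hypothetical, chosen only for illustration, and are not measurements from Fennel or Kafka.

```python
def per_record_cost_ms(batch_size: int, txn_overhead_ms: float, record_cost_ms: float) -> float:
    """Average cost per record when one transaction commits a whole batch.

    txn_overhead_ms and record_cost_ms are assumed, illustrative inputs.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # The fixed transaction overhead is shared by every record in the batch.
    return txn_overhead_ms / batch_size + record_cost_ms


# Hypothetical numbers: 10 ms per transaction, 0.5 ms of work per record.
small_batch = per_record_cost_ms(1, 10.0, 0.5)
large_batch = per_record_cost_ms(1000, 10.0, 0.5)
```

With these assumed inputs the fixed overhead dominates at batch size 1 and becomes a small fraction of the per-record cost at batch size 1000.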

1 year ago 0 0 1 0

The latter also keeps memory utilization proportional to mini-batch size.

1 year ago 0 0 1 0

We got around that by internally sharding each batch of records and processing sub-shards in parallel.
We also break down our batches into mini-batches so that the output of the chain can be streamed to Kafka without waiting for the full batch to finish executing.
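A rough sketch of the two ideas together, not Fennel's actual implementation (which is in Rust): split a batch into mini-batches, shard each mini-batch, run the operator chain on the sub-shards in parallel, and yield output as soon as each mini-batch finishes. All names here (`mini_batches`, `shard`, `process_batch`, `double_evens`) are hypothetical, round-robin sharding is an assumption, and Python threads only stand in for real parallel workers.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence

Records = List[int]


def mini_batches(batch: Sequence[int], size: int) -> Iterator[Records]:
    """Cut one large batch into consecutive mini-batches of at most `size`."""
    for start in range(0, len(batch), size):
        yield list(batch[start:start + size])


def shard(records: Sequence[int], num_shards: int) -> List[Records]:
    """Round-robin split (an assumption; a real system might shard by key)."""
    return [list(records[i::num_shards]) for i in range(num_shards)]


def process_batch(
    batch: Sequence[int],
    operator_chain: Callable[[Records], Records],
    mini_batch_size: int,
    num_shards: int,
    pool: ThreadPoolExecutor,
) -> Iterator[Records]:
    """Yield operator-chain output one sub-shard at a time.

    Only one mini-batch is in flight at once, so memory stays proportional
    to the mini-batch size, and the caller can forward each yielded chunk
    downstream before the full batch has been processed.
    """
    for mb in mini_batches(batch, mini_batch_size):
        # Sub-shards of the same mini-batch run through the chain in parallel.
        for out in pool.map(operator_chain, shard(mb, num_shards)):
            yield out


def double_evens(records: Records) -> Records:
    """Stand-in operator chain: filter even values, then double them."""
    return [r * 2 for r in records if r % 2 == 0]


with ThreadPoolExecutor(max_workers=4) as pool:
    outputs = [
        value
        for chunk in process_batch(list(range(10)), double_evens, 4, 2, pool)
        for value in chunk
    ]
```

In this sketch the caller would write each yielded chunk to the output topic inside the batch's transaction, so the transaction cost is still paid once per large batch while output streams per mini-batch.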

1 year ago 0 0 1 0

Cons: This architecture prevents concurrent, fully async operation of all operators, since each batch now has to be processed in full by the operator chain before the next batch can start. That in turn prevented us from running at full throttle even when CPU capacity was available.

1 year ago 0 0 1 0

Great thread from @micahw.com. Adding some of our own learnings from building this in Fennel.

An additional advantage for us was that it allowed us to keep data in columnar format for longer instead of converting back-and-forth between operators for serialization.

1 year ago 3 0 1 0

In hindsight, what would the right API for this look like?

1 year ago 1 0 1 0

Yes, I think they do this so that the ‘a’ region doesn’t become a hotspot. Was definitely surprising when I found out, but ultimately made sense.

1 year ago 3 0 0 0
Control Planes and the Death of the Cluster
The recent (few years) rediscovery of control plane architecture, largely due to the success of Kubernetes, is changing the way people think about distributed systems. Not many years ago, everything "...

Clusters are getting squeezed from above by smarter control planes, and from below by cheap and consistent object storage.

www.linkedin.com/pulse/contro...

1 year ago 21 5 2 0

It occupies a very interesting point in the design space of caches, but the fact that you can’t immediately read your writes is a problem you still need to design for. I wonder if that is its undoing.
@jonhoo.eu might have more thoughts on this.

1 year ago 1 0 1 0

That was their implementation of Noria?

1 year ago 1 0 1 0

We’ve built an IVM engine at Fennel that allows Python UDFs by leveraging a fleet of Python workers for execution while keeping the other operators in Rust. Hope to write a lot more about the technical details soon. One problem that we’ve had to solve is providing IVM with time travel.

1 year ago 3 0 0 0

TIL AWS un-launched S3 Select[1] as of July 25, 2024, presumably in favor of S3 Object Lambda[2]. RIP PushdownDB (arxiv.org/abs/2002.0...).

[1]: aws.amazon.com/blogs...
[2]: aws.amazon.com/s3/fe...

1 year ago 10 2 3 0