ClickHouse + S3 is just BigTable + Colossus?
(I've been writing my thesis over the last week and revisiting earlier work is producing random metadata thoughts in my head)
Posts by Ankush Jain
Actually no: LSM SSTables encode (key, offset) to allow value retrieval via one random read. Parquet never clusters its columns. But you could have hybrid layouts (some columns are clustered, others are not) if you really wanted to.
This isn't a particularly timely insight, but there's no difference between an LSM tree level and Parquet!
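A minimal sketch of the (key, offset) idea from the thread above. All names here (`SsBlock`, `from_pairs`) are hypothetical, and an in-memory `Vec<u8>` stands in for an on-disk SSTable block; the point is just that a sparse key-to-offset index lets you fetch one value with one read:

```rust
use std::collections::BTreeMap;

// Hypothetical SSTable-style block: values stored contiguously,
// plus a (key -> offset, len) index for single-read point lookups.
struct SsBlock {
    data: Vec<u8>,                            // stand-in for the on-disk block
    index: BTreeMap<String, (usize, usize)>,  // key -> (offset, length)
}

impl SsBlock {
    fn from_pairs(pairs: &[(&str, &[u8])]) -> Self {
        let mut data = Vec::new();
        let mut index = BTreeMap::new();
        for &(k, v) in pairs {
            index.insert(k.to_string(), (data.len(), v.len()));
            data.extend_from_slice(v);
        }
        SsBlock { data, index }
    }

    // One "random read": consult the index, then slice the data once.
    fn get(&self, key: &str) -> Option<&[u8]> {
        let &(off, len) = self.index.get(key)?;
        Some(&self.data[off..off + len])
    }
}

fn main() {
    let block = SsBlock::from_pairs(&[("a", &b"alpha"[..]), ("b", &b"beta"[..])]);
    assert_eq!(block.get("a"), Some(&b"alpha"[..]));
    assert_eq!(block.get("missing"), None);
}
```

A pure-columnar layout would drop the per-key index and store each column's values together; the "hybrid" in the post would keep an index like this for some columns only.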
Wrote a hot-takes not-particularly-coherent post collating some RDMA ideas that had been accumulating in my head.
ankushja.in/blog/2025/rd...
Will let you know what I find when I get there
I'm coming around to the opinion that the harms from algorithmic feeds go beyond any single phenomenon, regardless of scale/impact. Recommendation algorithms should be regulated as a public health policy, not unlike polio or pandemic prevention.
Random update in the "something working somewhere" department.
I feel that "all models are wrong, some are useful" is a not-great take that has gained traction. We model for predictive value. We tolerate errors for compression and simplicity. The best models sit at some Pareto frontier of complexity where they're not wrong, just approximate.
Yes, that's the second-order thing: Copy types apparently cannot be moved. Should just document these intuitions as a blog post.
Sometimes I think about how a certain way of explaining things would click for me a lot better than the canonical explanation. As a C++ programmer trying to pick up Rust, one of them is: "Rust implements move semantics by default; to copy, you clone manually and the clone gets moved."
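The intuition from the post above, as a tiny sketch (variable names are mine):

```rust
fn main() {
    let s = String::from("hello");
    let t = s; // move by default: `s` is no longer usable after this line
    // println!("{}", s); // would be a compile error: value used after move

    // C++-style copying is explicit: clone first, then the clone gets moved.
    let u = t.clone();
    let v = u;          // moves the clone; `t` is untouched
    assert_eq!(t, v);

    // Copy types (e.g. i32) are the exception: assignment copies implicitly.
    let a = 42;
    let b = a;
    assert_eq!(a, b); // `a` is still valid
}
```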
Surfacing briefly to say that I hate it when segfaults become the mechanism for discovering API assumptions. Rust cannot happen fast enough.
The conditions that have led to what’s happening in the US today exist in democracies around the world.
They are an inevitable outcome of our collective failure to adapt to fundamental changes in the information ecosystem on which our democracies were originally built.
Spent two hours trying to look up one of the three references from a conference review that said "this is well-known and not novel". Citation formatting was pristine but reviewer names or conference names would not line up.
Then I looked up the second ref -- same. Third -- same. Then it struck me.
“unrelated supply chains”
as this from @weisenthal.bsky.social shows, I suspect a lot of supply chains are more related than they seem
One of the many good simulations of the weeks ahead...
After spending a couple of days configuring an "industry-standard tool" that uses JSON as a specification interface, I have come to appreciate tools whose extensibility hooks are standard programming languages. Makes sense why editors with Vimscript/Lua/Lisp stuck around for decades.
Prometheus + Grafana, self-hosted, would be the most standard solution. Grafana has plugins for a bunch of different data sources, including an ODBC plugin that should work with most SQL DBs.
OTel AFAIK is more cumbersome to deal with than Prometheus.
Somewhat removed from reality but increasingly getting annoyed by gmail's line wrapping behavior. Ideally I think it should be around 60 chars for text and 80 for code blocks. OTOH I am making the most of my grad school days by refusing to send or read more than one email/week.
The massive context window is also amazing. I wrote a quick bash utility I call "llmcat" and dumping arbitrary subsets of code into a model is as simple as calling:
`fd -t f cc | llmcat | it2copy`
It copies all files in a "<filename>...</><filecontents>..</>" template... works so well!
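The real llmcat is a bash utility; here is a rough, hypothetical equivalent of the idea in Rust (filter filenames from stdin into a filename/contents template; the exact tag names are an assumption, since the post elides them):

```rust
use std::fs;
use std::io::{self, BufRead};

// Hypothetical template wrapper: tag names are assumed, not the author's exact ones.
fn wrap(name: &str, contents: &str) -> String {
    format!(
        "<filename>{}</filename>\n<filecontents>\n{}\n</filecontents>",
        name, contents
    )
}

// Read one file path per stdin line, print each file wrapped in the template,
// so something like `fd -t f cc | llmcat` dumps a code subset for pasting into a model.
fn main() -> io::Result<()> {
    for line in io::stdin().lock().lines() {
        let path = line?;
        let contents = fs::read_to_string(&path)?;
        println!("{}", wrap(&path, &contents));
    }
    Ok(())
}
```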
So impressed by Gemini Pro... fixed a pointer lifecycle bug by inspecting memory pointers in lldb under the model's supervision. It flagged buffer reuse by spotting a pattern between some random input and buffer contents, and we worked backwards from there.
Apparently screen time in kids is highly correlated with other socioeconomic markers, and has gotten significantly worse for lower-income groups over time.
Ok my armchair optimism here is that napkins eventually make it to the Moleskine to keep journaling alive using some convoluted scheme, rather than doing away with it entirely.
I will get back to you once I work out what this means.
As an aside, wonder if we could co-design parallel filesystems, tiered caches, and MVCC.
Modifications to claimed objects could either be through the namespace, or you could obtain a lock over them, do whatever, and let the namespace know when you release the lock.
Wonder if you could do a parallel filesystem this way -- the application creates objects in the object store, and the list of created objects is bulk-appended to a namespace asynchronously. Unclaimed objects get garbage-collected after, say, 48 hours of disuse.
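A toy sketch of the claim-then-GC idea from the thread above. Everything here (`Namespace`, `ObjectStore`, the claim map) is hypothetical structure I'm inventing to illustrate the protocol; the 48-hour grace period from the post is simulated by advancing a fake "now":

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Hypothetical namespace: records which object ids have been claimed
// (i.e. bulk-appended from the object store), and when.
struct Namespace {
    claimed: HashMap<String, Instant>,
}

// Hypothetical object store: ids with creation times.
struct ObjectStore {
    objects: HashMap<String, Instant>,
}

impl ObjectStore {
    // Drop objects that are unclaimed and older than the grace period.
    fn gc_unclaimed(&mut self, ns: &Namespace, grace: Duration, now: Instant) {
        self.objects.retain(|id, created| {
            ns.claimed.contains_key(id) || now.duration_since(*created) < grace
        });
    }
}

fn main() {
    let t0 = Instant::now();
    let mut store = ObjectStore { objects: HashMap::new() };
    store.objects.insert("claimed-obj".into(), t0);
    store.objects.insert("orphan-obj".into(), t0);

    let mut ns = Namespace { claimed: HashMap::new() };
    ns.claimed.insert("claimed-obj".into(), t0);

    // Simulate the grace period (48h in the post) having elapsed.
    let grace = Duration::from_secs(48 * 3600);
    store.gc_unclaimed(&ns, grace, t0 + grace * 2);

    assert!(store.objects.contains_key("claimed-obj"));
    assert!(!store.objects.contains_key("orphan-obj"));
}
```

Locked modification (the lock-then-notify path in the post) would just be another entry type in the namespace alongside the claim map.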
Morning update: made sure to pack a drill and two chucks to fix the lab espresso machine but did not pack my laptop
Looks like Deepseek had a paper at SC with a bunch of details: dl.acm.org/doi/pdf/10.1...
They use (used to use?) their own collective acceleration instead of NCCL, have a section on whether one should pay for NVLink, and describe some incast mitigations in 3FS, + a bunch of other things.
Hmm looks like some mix of "things are going to be rough what'd you expect" and "some frontloaded something is messing with the forecast"
This can't be real?? I mean it probably is but it can't be?