The meta-lesson is older than embeddings.
Whenever you see a dial with multiple settings, the interesting question is never "which setting is best."
It's "where does the marginal return collapse?"
That's where the right answer lives. Almost always.
Draw the curve.
Posts by Ran Aroussi
One more gift of Matryoshka: the decision is reversible downward, never upward.
768 → 512 → 384 is free at any time.
384 → 768 requires re-embedding everything.
So: start at the generous end. Truncate later when scale demands it.
And it's not just disk.
HNSW index RAM scales with dimensionality. 25% smaller vectors = 25% less RAM -- often the difference between fitting on one box and needing the next tier up.
Query latency drops proportionally. Smaller dot products, faster SIMD.
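The RAM claim is easy to back-of-envelope. A hedged sketch, assuming an hnswlib-style layout where each element stores its float32 vector plus roughly 2*M four-byte neighbor ids at layer 0 (higher layers ignored); the 10M-vector corpus and M=16 are made-up numbers, not from the original thread:

```python
def hnsw_ram_gb(n_vectors: int, dim: int, m: int = 16) -> float:
    """Rough HNSW memory estimate (assumption-laden, hnswlib-style layout)."""
    vector_bytes = dim * 4      # float32 components
    link_bytes = 2 * m * 4      # layer-0 neighbor list, 4-byte ids
    return n_vectors * (vector_bytes + link_bytes) / 1e9

for dim in (768, 512):
    print(f"{dim} dims: {hnsw_ram_gb(10_000_000, dim):.1f} GB")
```

With these assumptions, 10M vectors need ~32 GB at 768 dims and ~22 GB at 512, which is exactly the "one box vs. next tier up" gap.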
This is the lesson.
Looking at absolute numbers ("how small can I go?") makes 384 look smart.
Looking at *marginal* numbers ("what's each step costing me?") makes 768 or 512 look smart (768 effectively being "free").
The first framing is lazy. The second is engineering.
Plot it and the curve tells the whole story.
Convex. Falls steeply at first — you get lots of storage for almost no quality cost — then flattens, where each additional byte costs more quality.
The knee of the curve is right around 512.
But quality drops non-linearly (retrieval nDCG, Jina v3 on MTEB).
The first truncation buys 8x more storage per quality point than the last.
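The marginal framing is one loop to compute. A sketch with ILLUSTRATIVE nDCG numbers, not the real Jina v3 MTEB figures; the real curve has the same convex shape:

```python
# Marginal cost of each Matryoshka truncation step.
# nDCG values are ILLUSTRATIVE placeholders, not real benchmark numbers.
dims = [1024, 768, 512, 384, 256]
ndcg = [0.660, 0.655, 0.647, 0.635, 0.615]  # assumed convex drop-off

for (d1, q1), (d2, q2) in zip(zip(dims, ndcg), zip(dims[1:], ndcg[1:])):
    bytes_saved = (d1 - d2) * 4   # float32 = 4 bytes per dimension
    quality_cost = q1 - q2
    print(f"{d1}->{d2}: saves {bytes_saved} B/vec, "
          f"costs {quality_cost:.3f} nDCG, "
          f"ratio {bytes_saved / quality_cost:,.0f} B per nDCG point")
```

With these placeholder shapes, the first step saves about 8x more bytes per point of quality than the last, which is the whole argument for stopping on the flat part of the curve.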
Start with storage. Vectors are float32 = 4 bytes per dimension.
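The storage math itself fits in a loop. A sketch assuming a 10-million-vector corpus (the corpus size is made up for illustration):

```python
# Raw storage for N float32 vectors at each Matryoshka dimension.
N = 10_000_000      # assumed corpus size
BYTES_PER_DIM = 4   # float32

for dim in (1024, 768, 512, 384, 256):
    gb = N * dim * BYTES_PER_DIM / 1e9
    print(f"{dim:>4} dims: {gb:5.1f} GB")
```

At 10M vectors that runs from ~41 GB at 1024 dims down to ~10 GB at 256.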
Tempting to just grab 256 and move on, right?
So the natural instinct is: find the sweet spot.
Then I did the math. The answer surprised me.
Modern models (Jina v3, OpenAI v3, Nomic) use something called Matryoshka Representation Learning.
Like the Russian dolls — the model is trained so you can chop off the tail of the vector and it still works.
1024 → 768 → 512 → 384 → 256. No re-embedding. Just slice. Feels like a free lunch.
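The slice really is just a slice. A minimal sketch, assuming a Matryoshka-trained model and re-normalizing after truncation so cosine similarity still behaves; the helper name is mine, not from any SDK:

```python
import numpy as np

def truncate_embedding(vec, dim: int) -> np.ndarray:
    """Matryoshka truncation: keep the first `dim` values, re-normalize.
    (Hypothetical helper, not part of any embedding SDK.)"""
    head = np.asarray(vec, dtype=np.float32)[:dim]
    norm = np.linalg.norm(head)
    return head / norm if norm > 0 else head

full = np.random.default_rng(0).standard_normal(1024).astype(np.float32)
small = truncate_embedding(full, 256)
print(small.shape)  # (256,)
```

No re-embedding call, no model in the loop: the stored 1024-dim vectors can be served at 256 dims whenever scale demands it.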
The list has a length. Usually 256, 384, 512, 768, 1024, ...
Longer list = more nuance captured = better retrieval.
But also: more disk, more RAM, slower queries.
So there's a dial. And the question is where to set it.
First, what are embeddings?
An embedding is a list of numbers that represents the *meaning* of something: a sentence, an image, a product.
"Dog" and "puppy" get similar lists. "Dog" and "skyscraper" don't.
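"Similar lists" means high cosine similarity. A toy sketch with made-up 4-dim vectors (real models emit hundreds of dimensions):

```python
import numpy as np

def cosine(a, b) -> float:
    """Cosine similarity between two embedding vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

dog        = [0.9, 0.1, 0.3, 0.0]
puppy      = [0.8, 0.2, 0.4, 0.1]
skyscraper = [0.0, 0.9, 0.0, 0.8]

print(cosine(dog, puppy))       # high: similar meaning
print(cosine(dog, skyscraper))  # low: unrelated meaning
```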
Every RAG system, semantic search, and agent memory runs on these.
Been upgrading an embeddings pipeline this week, and I thought I'd share some insights that might help people get a better feel for embeddings.
Specifically: a lesson about optimization that surprised me.
Let's get into it. 👇
(warning: technical stuff ahead)
x.com/aroussi/sta...
Part 3 of the Architect series is getting good feedback.
Part 1: AI writes the code.
Part 2: AI fixes the bugs.
Part 3: Who watches the server?
Nobody's talking about this.
It's the one that still wakes you up at 3am.
x.com/aroussi/sta...
I once spent an hour debugging a payment flow in production. Tracing logs, checking deploys, reading diffs.
The payment processor was returning malformed JSON. Their status page said everything was fine.
Here's a possible solution.
x.com/aroussi/sta...
We use AI to write the code. AI to review the code. AI to fix the code. Then we deploy it onto infrastructure like absolute cavemen – SSH, hope, and prayer. It's 2026. This is embarrassing.
x.com/aroussi/sta...
– Monitoring tool: "Service is down."
– Me: "Cool. Why?"
– Monitoring tool: 🤷‍♂️
There's a better model. Wrote about it.
A week or so ago, I wrote about the law firm model for dev agencies — Architects running pods with revenue share.
The most common question: "How does one Architect deliver like a team of 5?"
This is the answer.
x.com/aroussi/sta...
The hard problem in AI coding isn't code generation. Claude Code already solved that. It's decomposition. Planning. Verification. Simplification. PR management. Institutional memory.
That's engineering management, not a smarter agent.
x.com/aroussi/sta...
Three human checkpoints. Everything else autonomous.
- Approve the plan
- Review the PR
- Ship the release
That's how a good CTO works with a good team. It should be how you work with AI.
Coding agent: executes a task when you point it
Engineering daemon: runs your delivery pipeline while you lead
The gap between those two things is the gap between a freelancer and a team.
x.com/aroussi/sta...
You can be 10x more productive with AI and still starve if nobody knows you exist.
The solo dev trap is real.
The answer isn't "get better at marketing." It's structural.
x.com/aroussi/sta...
Every pod owner in this model gets what no traditional agency offers:
→ Revenue share on their projects
→ Shared profit pool across all pods
→ Firm handles sales, brand, infra, design, marketing
→ They just build
Old model. New tech. Big implications.
x.com/aroussi/sta...
Junior developers in 2026 shouldn't be hired for their coding ability.
They should be hired as the pipeline to becoming Architects — the hybrid of senior dev + product owner + team lead that runs an AI-augmented pod.
18-24 months from junior to running their own pod.
x.com/aroussi/sta...
"Go solo and earn a little more per hour but work twice as hard"
vs
"Stay at an agency and earn less while someone else captures your leverage"
There's a third option. Law firms figured it out 200 years ago.
x.com/aroussi/sta...