Hrishi (@olickel.com) Bsky

These machines of loving grace are nicer to us than we deserve some days

1 week ago 0 0 0 0

Aaron is right (or I hope he's right), but until we solve this problem, agent-first things on the public web will always have to contend with the hostility of being seen as bots as soon as they reach scale.

1 week ago 0 0 0 0

Unfortunately, agents sit in this weird middle - they don't fully represent human attention or buying power (they represent increasing amounts, but nothing compared to an actual human), and they're quickly becoming as cheap as bots.

1 week ago 0 0 1 0

Agent-first systems (for agent emails, interfaces, etc) still have a big problem: What is the difference between an agent and a bot? How can this be made explicit and provably easy?

Most of the internet is designed to monetize human attention while fighting a never-ending war against bots.

1 week ago 0 0 1 0

* Models repeating themselves and going into death loops
* Failed edits and toolcalls
* Broken extensions

and so on.

YMMV, but this has been the minimum viable definition I've used for a while. Hope that helps!
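A rough sketch of how a harness might catch the first two failure modes in this thread (death loops and failed toolcalls). Everything here is hypothetical and for illustration only: `LoopGuard`, `call_signature`, and the thresholds are assumptions, not something any particular harness ships.

```python
import json
from dataclasses import dataclass
from typing import Optional


def call_signature(tool_name: str, arguments: dict) -> str:
    """Stable string form of a tool call, so identical calls compare equal."""
    return json.dumps({"tool": tool_name, "arguments": arguments}, sort_keys=True)


@dataclass
class LoopGuard:
    """Hypothetical guard that watches the agentic loop for two kinds of breakage."""

    max_repeats: int = 3   # identical calls in a row before flagging a death loop
    max_failures: int = 3  # failed calls in a row before flagging the loop as broken
    _last_signature: Optional[str] = None
    _repeat_count: int = 0
    _failure_streak: int = 0

    def record(self, tool_name: str, arguments: dict, succeeded: bool) -> Optional[str]:
        """Record one tool call; return a reason string if the loop looks broken."""
        signature = call_signature(tool_name, arguments)
        if signature == self._last_signature:
            self._repeat_count += 1
        else:
            self._last_signature = signature
            self._repeat_count = 1

        # Any success resets the failure streak.
        self._failure_streak = 0 if succeeded else self._failure_streak + 1

        if self._repeat_count >= self.max_repeats:
            return "death_loop"
        if self._failure_streak >= self.max_failures:
            return "repeated_failures"
        return None
```

The harness would call `record` after every tool call and stop (or nudge the model) when it returns something other than `None`. The thresholds are arbitrary and would need tuning per model.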

2 weeks ago 0 0 0 0

So on and so on. But the base function - failing which a harness is a failed harness and succeeding at which makes a harness a good one - is this:

A stable agentic loop. Can it support the agentic loop with installed tools without breaking? Breaking can look like:

2 weeks ago 0 0 1 0

* Sandboxing
* Extension systems to add more tools (pi extensions or agent skills)
* Subagents
* Background and interactive execution
* Context compression
* Memory (maybe)
* Interleaved toolcalls
* Guardrails

2 weeks ago 0 0 1 0

What is a harness?

I've been asked far too many times this week alone, so here's my simple working definition:

A harness is a system prompt with basic tool definitions for read, write, exec and external calls.

That's it.
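To make that concrete, here is a minimal sketch of the definition as a data structure: one system prompt plus four tools. All names (`Tool`, `Harness`, `make_minimal_harness`, the tool names, and the prompt text) are made up for illustration and do not come from any real harness.

```python
import subprocess
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict
from urllib.request import urlopen


@dataclass
class Tool:
    name: str
    description: str
    run: Callable[..., str]


def read_file(path: str) -> str:
    return Path(path).read_text()


def write_file(path: str, content: str) -> str:
    Path(path).write_text(content)
    return f"wrote {len(content)} characters to {path}"


def exec_command(command: str) -> str:
    # shell=True keeps the sketch short; a real harness would sandbox this.
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=30
    )
    return result.stdout + result.stderr


def fetch_url(url: str) -> str:
    # The "external call" tool: a plain HTTP GET using only the standard library.
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


@dataclass
class Harness:
    system_prompt: str
    tools: Dict[str, Tool]

    def call(self, tool_name: str, **kwargs) -> str:
        """Dispatch a tool call requested by the model."""
        return self.tools[tool_name].run(**kwargs)


def make_minimal_harness() -> Harness:
    tools = [
        Tool("read", "Read a file from disk", read_file),
        Tool("write", "Write a file to disk", write_file),
        Tool("exec", "Run a shell command", exec_command),
        Tool("fetch", "Fetch a URL", fetch_url),
    ]
    return Harness(
        system_prompt="You are a coding agent. Use the tools to finish the task.",
        tools={tool.name: tool for tool in tools},
    )
```

The model call and the loop around it are left out on purpose. This only covers the prompt-plus-tools part of the definition.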

Optionally, a harness may or may not concern itself with:

2 weeks ago 0 0 1 0

Context graphs feel like mindmaps for agents. Similar kind of fun-to-look-at, wish-I-had-one thing that is nearly impossible to build well, or efficiently.

I say this as someone who wasted A LOT of time on mindmaps.

2 weeks ago 0 0 0 0

What exactly is task horizon? What does long-horizon mean?

From www.southbridge.ai/blog/antibr...

2 weeks ago 0 0 0 0

Also it's amazing how easy these are to make with nano, cut up and turn into animations with opus

Just look at that little belly

2 weeks ago 0 0 0 0

I'll just leave this here for future hrishis and friends

Complete non sequitur: sprites.dev is awesome - they feel like the first prototype of that elegant weapon from a more civilized age, if it had been weathering in a pyramid for 50 years. Crazy rough around the edges, but feels like the future.

2 weeks ago 0 0 1 0

This was a pretty simple package to make. Turns out hankweave logs are easily ported over to other formats :)

Soon we might integrate it into hankweave

www.npmjs.com/package/han...

2 weeks ago 0 0 0 0

Discover that all your team did yesterday was to make comics.

2 weeks ago 0 0 1 0

Get all the analytics you need to. All the fun graphs.

2 weeks ago 0 0 1 0

bunx hankweave-trace can now directly upload (real-time or after the run) hankweave traces to braintrust or langfuse!

2 weeks ago 1 0 2 0

This means that we are now quickly - once again - at a point where you're more likely to hit a bad skill than a useful one.

Unless we fix this, skills will come around to 'just write custom instructions and scripts'.

3 weeks ago 0 0 0 0

Skills have the same problem - the surface is too big. This makes them easy to vibe (which often means slop) and extremely easy to make malicious.

3 weeks ago 0 0 1 0

Because there was no way for an MCP client/searcher/connoisseur to say 'this is who I am, this is what I want', an MCP creator has to service everybody for everything.

3 weeks ago 0 0 1 0

The well-managed MCPs - as it turns out - belong to companies that have well-managed API surfaces anyway, replete with nice llms.txts.

MCPs eventually came around to 'just write custom scripts'.

3 weeks ago 0 0 1 0

What this meant is that we quickly got to a state where you're more likely to hit a bloated (possibly dangerous) MCP that hasn't had commits in months than a well-managed one.

3 weeks ago 0 0 1 0

Skills will likely fail the same way that MCPs did.

Why did MCPs fail? They were a wonderful idea, but the protocol was too open. Too many ways to do things means no one's in charge. Who's responsible for an MCP? Is it the service? The author? You? Who's running the MCP?

3 weeks ago 1 0 2 0

Here's the whole run (limited time hosted on Braintrust) for the terminally curious: www.braintrust.dev/app/sb/p/hw...

3 weeks ago 0 0 0 0

Fun little case in point about real-time connectors: @Calclavia tells me about Cursor Agent over lunch, I go home, run Clausetta and point it at Cursor Agent, and now we can use it in hankweave!

3 weeks ago 0 0 1 0

Try running it yourself at github.com/SouthBridge... and it'll make you this

Kept it simple (and a little generic) as a demo but the possibilities are endless now

3 weeks ago 0 0 0 0

Validation will tell you exactly what's running where.

@mitchellh thank you for the silent update notifs haha they're super helpful

3 weeks ago 0 0 1 0

Trace viewer - sb - Braintrust sb

www.braintrust.dev/app/sb/p/hw...

3 weeks ago 0 0 1 0

For fun, here's a braintrust run (limited time, before it gets deleted) of a hank that uses all four harnesses - even loops them for fun, and adds budgets:

3 weeks ago 0 0 1 0

♟️Harnesses have no unified input/output. ACP - while an awesome protocol - is rarely fully supported. ↠ You need a translation layer.

Declarative inputs (hanks) -> Harnesses -> NDJSON log-based output.
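A toy version of what such a translation layer could look like: harness-specific events get normalized into one shape and written out as NDJSON (one JSON object per line). The harness names, event fields, and record shape below are all invented for illustration. This is not hankweave's actual log schema.

```python
import json
from typing import Any, Dict, Iterable, List


def normalize_event(harness: str, raw: Dict[str, Any]) -> Dict[str, Any]:
    """Map a harness-specific event onto one shared (hypothetical) shape."""
    if harness == "harness_a":
        # Assumed shape: {"kind": ..., "content": ...}
        return {"harness": harness, "type": raw["kind"], "text": raw.get("content", "")}
    if harness == "harness_b":
        # Assumed shape: {"event": ..., "message": ...}
        return {"harness": harness, "type": raw["event"], "text": raw.get("message", "")}
    raise ValueError(f"no translator registered for {harness!r}")


def to_ndjson(events: Iterable[Dict[str, Any]]) -> str:
    """Serialize events as NDJSON: one compact JSON object per line."""
    return "".join(json.dumps(event, sort_keys=True) + "\n" for event in events)


def from_ndjson(text: str) -> List[Dict[str, Any]]:
    """Parse NDJSON back into a list of events, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

Because every line is a complete JSON object, a log like this can be tailed while a run is still going, and a consumer only needs one parser regardless of which harness produced the events.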

3 weeks ago 0 0 1 0

♟️Models function best in different harnesses. Claude is best in the Agents SDK. Codex is best in Codex. Gemini is best in @opencode. @badlogicgames' pi is the best lightweight embeddable harness for cloud work ↠ You need to support more than one.

3 weeks ago 0 0 1 0

Posts by Hrishi