These machines of loving grace are nicer to us than we deserve some days
Posts by Hrishi
Aaron is right (or I hope he's right), but until we solve this problem, agent-first things in the public web will always have to contend with the hostility of being seen as bots as soon as they reach scale.
Unfortunately, agents sit in this weird middle - they don't fully represent human attention or buying power (they represent increasing amounts, but nothing compared to an actual human), and they're quickly becoming as cheap as bots.
Agent-first systems (for agent emails, interfaces, etc) still have a big problem: What is the difference between an agent and a bot? How can this be made explicit and provably easy?
Most of the internet is designed to monetize human attention while fighting a never-ending war against bots.
* Models repeating themselves and going into death loops
* Failed edits and toolcalls
* Broken extensions
and so on.
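A minimal sketch of what guarding the loop against two of these can look like: repeated identical actions (a death loop) and failed toolcalls. Everything here is an assumption for illustration and not any real harness's API: the `run_loop` name, the model as a plain callable that returns `(tool_name, argument)` or `None` when done, and the string transcript.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical shapes: a model step returns (tool_name, argument), or None when done.
Action = Optional[Tuple[str, str]]
Model = Callable[[List[str]], Action]
Tools = Dict[str, Callable[[str], str]]


def run_loop(model: Model, tools: Tools, task: str,
             max_steps: int = 20, max_repeats: int = 3) -> Tuple[str, List[str]]:
    """Run the loop until the model stops, repeats itself, or runs out of steps."""
    transcript = [f"task: {task}"]
    last_action: Action = None
    repeats = 0
    for _ in range(max_steps):
        action = model(transcript)
        if action is None:
            return "done", transcript
        # Count consecutive identical actions to catch a death loop early.
        repeats = repeats + 1 if action == last_action else 1
        last_action = action
        if repeats >= max_repeats:
            return "death_loop", transcript
        name, argument = action
        try:
            result = tools[name](argument)
        except Exception as exc:
            # A failed or unknown toolcall is reported back to the model
            # instead of crashing the loop.
            result = f"tool error: {type(exc).__name__}: {exc}"
        transcript.append(f"{name}({argument!r}) -> {result}")
    return "step_budget_exhausted", transcript
```

The design choice is that a bad toolcall becomes text the model can react to, while repetition and step budgets are enforced outside the model.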
YMMV, but this has been the minimum viable definition I've used for a while. Hope that helps!
So on and so on. But the base function - failing which a harness is a failed harness and succeeding at which makes a harness a good one - is this:
A stable agentic loop. Can it support the agentic loop with installed tools without breaking? Breaking can look like:
* Sandboxing
* Extension systems to add more tools (pi extensions or agent skills)
* Subagents
* Background and interactive execution
* Context compression
* Memory (maybe)
* Interleaved toolcalls
* Guardrails
What is a harness?
I've been asked far too many times this week alone, so here's my simple working definition:
A harness is a system prompt with basic tool definitions for read, write, exec and external calls.
That's it.
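To make the definition concrete, here's a rough sketch of that shape in Python: one system prompt plus four tools. The `Harness` class, the tool names, and the prompt text are all made up for illustration; this isn't how any particular harness is implemented.

```python
import subprocess
import urllib.request
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Dict


def read_file(path: str) -> str:
    return Path(path).read_text()


def write_file(path: str, content: str) -> str:
    Path(path).write_text(content)
    return f"wrote {len(content)} characters to {path}"


def exec_command(command: str) -> str:
    # Runs through the shell with no sandboxing; a real harness would likely restrict this.
    result = subprocess.run(command, shell=True, capture_output=True, text=True, timeout=30)
    return result.stdout + result.stderr


def fetch_url(url: str) -> str:
    # Stand-in for "external calls": a plain HTTP GET.
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


@dataclass
class Harness:
    system_prompt: str
    tools: Dict[str, Callable[..., str]] = field(default_factory=dict)

    def call(self, tool_name: str, **arguments: str) -> str:
        """Dispatch one toolcall, returning errors as text instead of raising."""
        if tool_name not in self.tools:
            return f"error: unknown tool {tool_name!r}"
        try:
            return self.tools[tool_name](**arguments)
        except Exception as exc:
            return f"error: {type(exc).__name__}: {exc}"


def make_harness() -> Harness:
    # Hypothetical prompt and tool names, chosen to mirror read, write, exec and external calls.
    return Harness(
        system_prompt="You are a coding agent. Use the tools to read, write, run and fetch.",
        tools={"read": read_file, "write": write_file, "exec": exec_command, "fetch": fetch_url},
    )
```

A model would be handed `system_prompt` and the tool names, and its toolcalls would be routed through `call`.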
Optionally, a harness may concern itself with:
Context graphs feel like mindmaps for agents. Similar kind of fun-to-look-at, wish-I-had-one thing that is nearly impossible to build well, or efficiently.
I say this as someone who wasted A LOT of time on mindmaps.
What exactly is task horizon? What does long-horizon mean?
From www.southbridge.ai/blog/antibr...
Also it's amazing how easy these are to make with nano, cut up and turn into animations with opus
Just look at that little belly
I'll just leave this here for future hrishis and friends
Complete non sequitur: sprites.dev is awesome - they feel like the first prototype of that elegant weapon from a more civilized age, if it had been weathering in a pyramid for 50 years. Crazy rough around the edges, but feels like the future.
This was a pretty simple package to make. Turns out hankweave logs are easily ported over to other formats :)
Soon we might integrate it into hankweave
www.npmjs.com/package/han...
Discover that all your team did yesterday was to make comics.
Get all the analytics you need. All the fun graphs.
bunx hankweave-trace can now directly upload (real-time or after the run) hankweave traces to braintrust or langfuse!
This means that we are now quickly - once again - at a point where you're more likely to hit a bad skill than a useful one.
Unless we fix this, skills will come around to 'just write custom instructions and scripts'.
Skills have the same problem - the surface is too big. This makes them easy to vibe (which often means slop) and extremely easy to make malicious.
Because there was no way for an MCP client/searcher/connoisseur to say 'this is who I am, this is what I want', an MCP creator had to service everybody for everything.
The well-managed MCPs - as it turns out - belong to companies that have well-managed API surfaces anyway, replete with nice llms.txts.
MCPs eventually came around to 'just write custom scripts'.
What this meant is that we quickly got to a state where you're more likely to hit a bloated (possibly dangerous) MCP that hasn't had commits in months than a well-managed one.
Skills will likely fail the same way that MCPs did.
Why did MCPs fail? They were a wonderful idea, but the protocol was too open. Too many ways to do things means no one's in charge. Who's responsible for an MCP? Is it the service? The author? You? Who's running the MCP?
Here's the whole run (limited time hosted on Braintrust) for the terminally curious: www.braintrust.dev/app/sb/p/hw...
Fun little case in point about real-time connectors: @Calclavia tells me about Cursor Agent over lunch, I go home, run Clausetta and point it at Cursor Agent, and now we can use it in hankweave!
Try running it yourself at github.com/SouthBridge... and it'll make you this
Kept it simple (and a little generic) as a demo but the possibilities are endless now
Validation will tell you exactly what's running where.
@mitchellh thank you for the silent update notifs haha they're super helpful
For fun, here's a braintrust run (limited time, before it gets deleted) of a hank that uses all four harnesses - even loops them for fun, and adds budgets:
♟️Harnesses have no unified input/output. ACP - while an awesome protocol - is rarely fully supported. ↠ You need a translation layer.
Declarative inputs (hanks) -> Harnesses -> NDJSON log-based output.
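A toy sketch of what a translation layer in this shape can look like: per-harness adapters that normalize raw events into one record format, written out as NDJSON (one JSON object per line). The two harness event shapes, the adapter names, and the record fields are invented for illustration; this is not hankweave's actual log format.

```python
import json
from typing import Any, Callable, Dict, Iterable

Event = Dict[str, Any]


def from_harness_a(raw: Event) -> Event:
    # Hypothetical harness A emits {"type": ..., "content": ...}.
    return {"kind": raw["type"], "text": raw["content"]}


def from_harness_b(raw: Event) -> Event:
    # Hypothetical harness B nests its text under a payload.
    return {"kind": raw["event"], "text": raw["payload"]["message"]}


ADAPTERS: Dict[str, Callable[[Event], Event]] = {
    "harness_a": from_harness_a,
    "harness_b": from_harness_b,
}


def to_ndjson(step_id: str, harness: str, raw_events: Iterable[Event]) -> str:
    """Normalize one step's raw events and serialize them as NDJSON."""
    adapter = ADAPTERS[harness]
    lines = []
    for raw in raw_events:
        record = {"step": step_id, "harness": harness, **adapter(raw)}
        lines.append(json.dumps(record, sort_keys=True))
    # One JSON object per line, each line newline-terminated.
    return "".join(line + "\n" for line in lines)
```

Whatever the harness, downstream consumers only ever see the one record shape, which is the point of having a translation layer at all.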
♟️Models function best in different harnesses. Claude is best in the Agents SDK. Codex is best in Codex. Gemini is best in @opencode. @badlogicgames' pi is the best lightweight embeddable harness for cloud work ↠ You need to support more than one.