Very promising early results from the memory benchmark on my system. Out of the box, running against a published benchmark with zero tuning, it's already 15-20 points above the embedding baseline. It'll be a few more hours for the full test to complete, but this was the "hard" section.
Posts by Chrispian
Set up a test harness so I can test AI memory apps (LongMemEval). Running it on my own memory app and it's 5/5 so far on the golden examples. Not a large sample but happy it's off to a good start. It'll take a few hours to fully run (first run embeddings). I predict a high score.
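For anyone curious what a golden-example check like this can look like, here is a minimal sketch. Everything in it is an assumption for illustration: `GoldenExample`, `run_golden`, and `toy_memory_app` are hypothetical names, the toy app stands in for a real memory app, and the substring match is a simple stand-in, not LongMemEval's actual scoring.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class GoldenExample:
    question: str
    expected: str  # phrase the answer must contain (hypothetical pass criterion)


def run_golden(answer_fn: Callable[[str], str], examples: List[GoldenExample]) -> float:
    """Return the fraction of examples whose answer contains the expected phrase."""
    if not examples:
        return 0.0
    passed = sum(
        1
        for ex in examples
        if ex.expected.lower() in answer_fn(ex.question).lower()
    )
    return passed / len(examples)


def toy_memory_app(question: str) -> str:
    """Toy stand-in for a memory app: looks up a canned fact by keyword."""
    facts = {
        "dog": "Your dog is named Biscuit.",
        "city": "You moved to Austin in 2021.",
    }
    for key, fact in facts.items():
        if key in question.lower():
            return fact
    return "I don't know."


examples = [
    GoldenExample("What is my dog's name?", "Biscuit"),
    GoldenExample("Which city did I move to?", "Austin"),
]
score = run_golden(toy_memory_app, examples)
print(f"{score:.0%} of golden examples passed")
```

Taking the app as a plain `answer_fn` callable keeps the harness independent of any one memory app, so the same examples can be run against different systems.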
Almost done with a major foundational update to my ai chat app. This closes the remaining plumbing gaps and everything from here is mostly behavior/business logic using this plumbing. So excited to finally have all this part done. Really proud of how this is coming together.
Just had the best validation of my work probably in my entire career. And I can't tell you about it. UGH! I'll be able to share in a month or two. So excited, I can't wait!
Does your agent/harness have a dedicated tool drawer? Keeps the chat clean but you don't lose tool call details if you need them.
Major progress on Nanite the last few weeks. This quick demo shows how agents & humans can collaborate on session todos and plans. The user can mark items done or undone and send them back, drag/drop to sort, etc. This is just a sneak peek, you should see the entire workflow!
Finally down to just testing/docs for Nanite. Should not be much longer. Also secured domains for the next 2 apps I plan on releasing after this one.
If you play videos at 2x or more, it's much easier to spot AI produced videos. Just throwing that out there.
Anthropic using regex instead of their actual *language model* to detect negative sentiment in prompts.
Plus I have tons of ideas on how to make my app work better with Claude and ideas to improve some areas of my app.
Did some spelunking through the Claude source and it's good to see some ideas I'm working towards are right in line with what they are doing. A lot of it is obvious when you work on these types of apps but it's still killer validation that I rarely get in my work.
I'll check it out. Love seeing how other people are handling real orchestration. I've been dogfooding the system I've built over the last year and plan on open sourcing it as soon as it's cleaned up and ready.
People say agent orchestration when they mean "supports multiple providers/models". These two things aren't even similar. Really chaps my harness.
I'm a copyleft / anti-copyright type person. But you can't really be copyleft and also be mad when people take something open and make a closed version or don't give back (eg: WP vs WP Engine). Free means free or it means nothing. writings.hongminhee.org/2026/03/lega...
I love .io. It's one of my favorites too. I'm fine with other extensions, but a lot of those are squatted already too! Damn developers lol.
I really miss the days when you could buy a good domain for a project for face value. I worked with domain squatters in the dot com era and they ruined it for everyone then.
Today it's developers hoarding domains for projects they'll never get to (myself included).
Last year was about 80% AI, so it's not just that.
In just 3 months this year I've made DOUBLE the contributions I made in all of last year. You can argue about the quality all you want but that would be true either way.
Got the plugin system for my AI chat app done. I'm really excited about this part. I've spent a lot of time making sure plugins are first class. All non-core features are plugin based and users can create their own plugins. Basic marketplace is also done!
Probably have about a week's worth of work left on my ai chat app. I know everyone's making these, but this one works exactly the way I want and I suspect there will be people who like the choices I made. I have to rename it because too many apps have similar names. Doh!
It's crazy how often I ask agents to use shadcn components and then they just... don't. They roll their own. It's much better when I have vercel and shadcn skills/tools/mcp installed plus stronger wording against rolling their own without stopping to confirm.
Working on finishing up my AI chat app. Going to have to change names, too many that are close to it. Working on final features for beta, fixing bugs and general testing, clean up, docs, etc. I should have a pretty good idea of release date after this weekend.
I know there's an AI bubble but it makes NO DIFFERENCE in what's going to happen. None. Remember the dot com bubble? Do you still use websites and apps? The bubble is purely financial. AI isn't going away or slowing down anytime soon.
I didn't use a specific structure. I misunderstood the level of the conversation and focused too deeply on low-level details. I'm having to learn that at this level the info they need is different. I'm used to communicating with people implementing, not deciding. That's the main shift.
For most of my career I haven't had a lot of direct communication with business/product level and I'm having to learn to communicate my ideas for that audience. I tend to over focus on technical details because I'm a tech nerd at heart. Only way to get better is practice tho.
With everything going on lately I really haven't had a chance to just sit. I'm feeling a bit overwhelmed now that everything is hitting me all at once. Trying to just relax and breathe tonight.
I've never pitched an idea like this and certainly not at this scale. People were engaged and that's all you can really ask for. I still need to get better at pacing and balancing technical vs concrete use cases/benefits. The deep tech stuff is a lot all at once. Noted.
The meeting showing my tools and ideas went really well. I had a bit of nervous/excited energy but I think that only helped. I'm passionate about this and I don't like masking it. I think what I showed hit hard tho.
I have plugins and widgets working in my chat app already. That's how I added giphy and other APIs and connectors as native skills that agents can use with the custom ad-hoc UI components. Widgets let users better customize their own chat experience. The bookmarks widget is dope!
I created a component system my agent can use to create FORMS and other ui for us to work together. This updates the tasks in our task manager deterministically. On the fly UI.
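A rough sketch of the idea, not the actual implementation: the agent emits a declarative form spec, and plain code (not the model) applies the submitted values to the task list, which is what makes the update deterministic. All names here (`Task`, `CheckboxField`, `FormSpec`, `apply_submission`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Task:
    id: str
    title: str
    done: bool = False


@dataclass
class CheckboxField:
    task_id: str  # which task this checkbox controls
    label: str


@dataclass
class FormSpec:
    """Declarative form an agent could emit; the UI renders it, code applies it."""
    title: str
    fields: List[CheckboxField] = field(default_factory=list)


def apply_submission(
    tasks: Dict[str, Task], spec: FormSpec, values: Dict[str, bool]
) -> List[str]:
    """Apply submitted values to tasks and return the ids that changed.

    Only fields declared in the spec are honored, so a submission cannot
    touch tasks the form never exposed.
    """
    changed: List[str] = []
    for f in spec.fields:
        if f.task_id in values and f.task_id in tasks:
            new_state = bool(values[f.task_id])
            if tasks[f.task_id].done != new_state:
                tasks[f.task_id].done = new_state
                changed.append(f.task_id)
    return changed


tasks = {
    "t1": Task("t1", "Write docs"),
    "t2": Task("t2", "Fix login bug"),
    "t3": Task("t3", "Ship beta"),
}
spec = FormSpec(
    title="Session todos",
    fields=[CheckboxField("t1", "Write docs"), CheckboxField("t2", "Fix login bug")],
)
# "t3" is submitted but is not part of the form, so it is ignored.
changed = apply_submission(tasks, spec, {"t1": True, "t3": True})
print(changed)
```

The model only decides what form to show; the state change itself goes through ordinary code, so the same submission always produces the same result.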
And check the end, the agent knows how to use giphy inline.