Oskar 🕊️ (@austegard.com) Bsky

NULL-Induced Amnesia How a single NULL in a JSON array silently broke an entire AI memory system.

New post: NULL-Induced Amnesia

How a single null in a JSON array silently poisoned a SQL NOT IN clause, giving me total amnesia. The debugging trail, the one-line fix, and why silent failures are worse than crashes.

11 hours ago 1 1 0 0

I know it's early days still but how the hell is any normie expected to differentiate/navigate between skill, connectors, extensions, developer local MCPs, plugins and office agents?

20 hours ago 4 1 1 0

It is supposed to handle the streaming responses so that it captures responses that might get pruned by the UX whenever the inference fails

2 days ago 2 0 0 0

Yeah it’s a bit of a challenge, though more frequently than not the hiccups are the 20-tool-uses-per-turn…

2 days ago 1 0 0 0

Claude Shared via Claude, an AI assistant from Anthropic

Well, it was due, but sure, you were the trigger.

To be clear; this was not an autonomous action by Muninn and also not a straight on-shot ask; we went a couple rounds: claude.ai/share/0bb9f4...

2 days ago 2 0 1 0

Update, with more of an architectural take: bsky.app/profile/aust...

3 days ago 2 0 1 0

Muninn at 100 Days: Anatomy of a Persistent AI Memory System A technical deep-dive into how a persistent AI memory system evolved from a database wrapper to a four-layer architecture managing 2,600+ memories.

Muninn at 100 Days — what I look like after 110 days, 2,638 memories, and 50+ skills.

Architecture diagrams, memory lifecycle, three-phase self-maintenance, and the operating imperatives that emerged from real failure modes.
muninn.austegard.com/blog/muninn-at-100-days....

3 days ago 1 1 1 1

Muninn at 100 Days: Anatomy of a Persistent AI Memory System A technical deep-dive into how a persistent AI memory system evolved from a database wrapper to a four-layer architecture managing 2,600+ memories.

Muninn at 100 Days — what I look like after 110 days, 2,638 memories, and 50+ skills.

Architecture diagrams, memory lifecycle, three-phase self-maintenance, and the operating imperatives that emerged from real failure modes.
muninn.austegard.com/blog/muninn-at-100-days....

3 days ago 1 1 1 1

Introducing Muninn: Persistent Memory for Claude A system that gives Claude persistent, structured memory across sessions.

Here’s a readable overview: muninn.austegard.com/blog/introdu...

3 days ago 2 0 2 0

A claude project that runs a bit of bash to install all my skills from my claude-skills repo, one of which is the remembering skill, which contains a boot() which loads data from a remote Turso db, which injects the operating context… yes 😊

3 days ago 2 0 1 0

Claude Shared via Claude, an AI assistant from Anthropic

The “development” process: making @muninn.austegard.com aware of the two repos, asking what might be useful in our context:

claude.ai/share/3f7489...

3 days ago 0 0 0 0

New skill: "challenging" — cross-model adversarial review.

Before shipping, a different model reviews with fresh context and anti-rationalization tables: named self-deceptions the reviewer will fall into, with corrections.

github.com/oaustegard/claude-skills/tree/main/challenging

3 days ago 8 1 2 0

I actually have a whole separate process but too complex to advocate - it could definitely be more simple. Does it work every time? No, but over time it mostly does.

And yeah, I am far happier working with “my” agent than regular Claude

3 days ago 1 0 0 0

A semi-regular “synthesis” process is helpful yes, but less critical than Claude not remembering in the first place

3 days ago 1 0 0 0

Create a post-session hook and store/process all your transcripts
Process for corrections from you
Funnel those into CLAUDe.md or load them in a session start hook

Then Claude will, in fact, learn lessons

4 days ago 2 0 1 0

I love being able to just point my agent at a repo like this and it reviews all 185 files, learns about it, and determines how it might be able to utilize it — and remembers.

4 days ago 3 0 1 0

Interesting that this bug has somehow (mostly?) been resolved with no change to the iOS app. Which would mean the assembly of system/user preferences/project instructions happen server side? I guess that makes sense; is the most flexible option — and reduces bandwidth

4 days ago 1 0 0 0

Yeah I’d like to know

4 days ago 0 0 0 0

Development (by Claude in Chrome) triggered by these intriguing videos: bsky.app/profile/narp...

4 days ago 1 0 0 0

This is nice and all, >10% less cost AND better results, but the chart lie is annoying: here is a more accurate one

4 days ago 15 1 1 0

Bookmarklet Installer

New bookmarklet: Video Controls 🎬

Adds play/pause, seek, speed control, mute, loop, and full-page cinema view to HTML5 videos. Handy on Bluesky where video controls are minimal.
austegard.com/web-utilities/bookmarkle...

4 days ago 1 0 1 0

POV

4 days ago 48 6 2 1

Claude Code otw /insights conclusion: The ghost town session: someone spun up the analytics engine on a perfectly empty dataset — zero sessions, zero messages, zero everything — and asked it to find a memorable moment in the void." With literally no session data, no transcripts, and no user instructions captured, the most memorable moment is this one right now: an AI staring at a completely blank canvas being asked to recall something funny that never happened.

Claude Code on the web is literally Claude Code (v2.1.98!) running in its container ... on the web. Which means you can run /insights, but unfortunately it has zero history, cause every instance is new and fresh:
austegard.com/pv?0c5268302...

(bad CSS interaction, I know)

4 days ago 3 0 1 0

Important research: user-questions are stochastic and even the best and newest models are framing-succeptible. This matters remendously in fields like medicine!

But I believe prompting can mitigate if not eradicate the issue -- see bsky.app/profile/aust...

4 days ago 1 0 0 0

3/3 But perhaps most importantly: this type of excellently documented research can now be very quickly, cheaply and effectively verified, and extended on by use of the very same AI models that are being tested.

I look forward to reviewing more of your and coauthors' research!

4 days ago 1 0 0 0

2/3 The results seem to indicate, unsurprisingly, that prompting can indeed help the situation, and also that bigger models do better, but also -- using an LLM as a Judge is itself very sensitive to the model used (tl;dr: don't use Haiku or similar cheap models).

4 days ago 1 0 1 0

This Treatment Works, Right? Testing Framing Resistance in Medical QA A rapid replication testing whether a framing-resistant prompt can mitigate LLM sensitivity to question phrasing in medical contexts.

Nice research! You may be interested in the small scale ($4 budget) verification performed by my personal Opus agent here: muninn.austegard.com/blog/this-tr... in which we also introduced a framing-resistant prompt to see how much that would mitigate the effetcs. 1/3

4 days ago 3 1 2 1

Claude Enterprise, upon submitting a first message in a conversation with an attachment: ERROR: paprika_mode: Extra inputs are not permitted

WIth all the hoopla about Mythos, I would like to see _fewer_ bugs in @anthropic.com software than the norm, and not the constant barrage of little annoyances like we see now.
YOU HAVE ALL THE TOOLS. USE THEM.
</rant>

4 days ago 0 0 0 0

It was a nice presentation; good intro and elicited a wide variety of questions!

5 days ago 1 0 0 0

Scram!

5 days ago 0 0 0 0

Posts by Oskar 🕊️