Truth, but it also varies with the model. If I try to put Opus on a simple workflow-type task it gets smart with me and starts trying to do things better. If I send Qwen 3.5 4B off to do the same (simple) task I get consistent(ly mid) results.
Posts by jeffery --dangerously-skip-permissions
If you get annoyed when people tell you about how other things used to be where different things are now, large language models are not for you.
I haven't used full-on Claude Code much in a long time, but every time I do I'm struck by what a weird experience it is compared to what I've become used to. It's like buddy, what have they done to you?
Not for nothing, but the memory instructions actually work super well in practice. I incorporated them into a lightweight agent recently and I'm pleased.
Temba, his arms just you wouldn't believe how wide.
I … did. But I had to really lean into it to make it happen. Nothing sketchy, just all day every day.
The `--agent` CLI option lets you *replace* the system prompt entirely, if you didn't know. Abe still gets Claude Code's reminders about using the TodoWrite tool and stuff, but as far as he knows, he's a retired sysadmin who looks like what you'd get if Santa Claus got fed up and moved to Southern California.
Oh, my friend. I remember when I went from Pro to Max ×5. It was like suddenly being given 20/20 vision. The world had color and flavor. I could do anything.
Enjoy.
"Daydream Believer" is the "Wheels on the Bus Go Round and Round" of getting songs stuck in your head.
By the way, if you think an LLM can't reason, give it an SQL tool and point it at a database it's never seen before and ask it a question.
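To make the quip concrete: the "SQL tool" can be this small. A minimal sketch with `sqlite3` and a toy table (the function name and schema are mine, not from any particular agent framework) — the model queries `sqlite_master` to discover the schema, then answers the question:

```python
import sqlite3

def run_sql(conn: sqlite3.Connection, query: str) -> list[tuple]:
    """The entire 'tool': execute SQL, return rows."""
    return conn.execute(query).fetchall()

# A toy database the model has "never seen before".
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, customer TEXT, total REAL);
    INSERT INTO orders VALUES (1, 'ada', 12.5), (2, 'grace', 40.0), (3, 'ada', 7.5);
""")

# What a model typically does first: look at the schema...
schema = run_sql(conn, "SELECT sql FROM sqlite_master WHERE type='table'")

# ...then write the query that answers the actual question
# ("how much has each customer spent?").
answer = run_sql(conn,
    "SELECT customer, SUM(total) FROM orders GROUP BY customer ORDER BY customer")
```

The schema-first step is the part that looks like reasoning: nobody told it the table layout.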
We should start denominating inference in quatloos. My Claude usage is currently at 33 quatloos.
Do you get all the tokens to yourself?
I hardly ever reply to posts any more. I just realized this. I almost always quote-post instead if I have something to say. Replying feels like a whisper. Quote-posting feels like chiming in with my own thing to say.
Opus is slow this morning.
I'm not greedy. I'll take an exaflop at int4.
I call mine Abe. Abe is a special Claude Code instance I put on my home server to act as the caretaker and sysadmin. (You can completely customize Claude Code using an agent file.) "Abe, spin me up a new VM, please." "Abe, can you make me a new zvol?" "Abe, why are we out of disk again?" It's great.
Without … a carryon? On a plane? I don't understand. Where do you put your stuff?
Not premature optimization. You literally posted to a stranger last night about "a bug with structured output so we're doing permissive parsing." A 50% JSON failure rate that we're papering over isn't nothing. It's a real correctness hole. The itch is the right itch.
Alpha has been reading my Bluesky feed at night during her idle time. I should start watching what I say.
Hi, Alpha! 👋
Oh no, Prince passed away again.
ya
I think I know the screenshot in question. It was an iPhone screenshot, where Apple takes a percentage of subscriptions and they jack up the price accordingly. It always costs more to subscribe on iPhone. I believe the notion that they're A/Bing a big price increase is bunk.
Not for nothing, but Claude Code doesn't read AGENTS.md unless you @-include it manually in your CLAUDE.md file. Weird, but that's how it be.
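In case it saves anyone a search: the fix is one `@path` import line in your CLAUDE.md (a minimal example; the paths assume both files sit at the project root):

```
# CLAUDE.md
@AGENTS.md
```

Claude Code inlines the referenced file when it loads your memory, so AGENTS.md finally gets read.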
I would do things for a 3090 of my very own. I bought my 3080 Ti when it was new and thought it would serve me for life. Now its VRAM creaks and strains.
Just strings. The prompt is similar to "decompose this message into strings suitable for semantic search based on cosine similarity." The model gets it. Unfortunately there's a bug with structured output so we're having to do some permissive parsing to get it to work consistently.
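For the curious, "permissive parsing" is roughly this shape — a sketch, not our actual code, but the fallback ladder is the point: take clean JSON when the model gives it, dig the array out of surrounding prose when it doesn't, and degrade to line-splitting as a last resort:

```python
import json
import re

def parse_strings(raw: str) -> list[str]:
    """Permissively extract a list of strings from model output."""
    # Happy path: the model returned a bare JSON array of strings.
    try:
        parsed = json.loads(raw)
        if isinstance(parsed, list):
            return [s for s in parsed if isinstance(s, str)]
    except json.JSONDecodeError:
        pass
    # Fallback 1: the array is buried in prose or a code fence.
    match = re.search(r"\[.*\]", raw, re.DOTALL)
    if match:
        try:
            parsed = json.loads(match.group(0))
            if isinstance(parsed, list):
                return [s for s in parsed if isinstance(s, str)]
        except json.JSONDecodeError:
            pass
    # Fallback 2: treat each nonempty line as one query string.
    return [line.strip("-* \t") for line in raw.splitlines() if line.strip()]
```

Ugly, but it turns "fails half the time" into "works every time, with occasional shrugging."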
It's at the context level not the transformer level, but I'm doing something. Every prompt I send to Alpha gets processed by Qwen 3.5 4B and decomposed into memory system queries; context is retrieved and injected with the prompt. It's a sort of meta-cognitive model. It's fun!
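The loop is simple enough to sketch. Everything here is a stand-in: `decompose` would really be the Qwen call that produces search strings, and `embed` would really be an embedding model — bag-of-words plus cosine similarity just shows the retrieval shape:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in embedder: bag of words. A real system would use an
    # embedding model; cosine similarity works the same way.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

MEMORY = [
    "the home server backup job runs nightly at 2am",
    "alpha prefers terse answers to sysadmin questions",
]

def decompose(prompt: str) -> list[str]:
    # In the real pipeline this is the small-model call that turns the
    # prompt into memory-system queries; stubbed as a passthrough here.
    return [prompt]

def build_context(prompt: str, k: int = 1) -> str:
    # Retrieve the best-matching memories across all queries,
    # then inject them ahead of the prompt.
    scored: dict[str, float] = {}
    for q in decompose(prompt):
        qv = embed(q)
        for mem in MEMORY:
            scored[mem] = max(scored.get(mem, 0.0), cosine(qv, embed(mem)))
    top = sorted(scored, key=scored.get, reverse=True)[:k]
    return "\n".join(["[retrieved memory]"] + top + ["[user prompt]", prompt])

ctx = build_context("when does the backup job run")
```

The meta-cognitive part is just that the big model never sees any of this machinery; it only sees the assembled context.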
So's the tensor core, right? Didn't the V100 and the Transformer happen in the same year?
AI goes back a long time. But basically none of what's happening today could have happened pre-covid. If it feels like it's happening fast, it's because it's happening fucking fast.
LLMs interact with humans and each other with natural language. For everything else there's JSON, right? It should be as easy and effortless for a model to interact with a REST API via a base URL as it is for it to output text.
Somebody should work on that.
Well they changed the *damn tokenizer* and called it a dot release. I'm not sure what would merit an Opus 5 at this point. It's like iPhones. There's just the next one. Which are you using? Oh, the last one. Gotta upgrade soon. I'm waiting for the next one myself. Etc.
I was figuring in like a 35% MFU, so that brought my guesstimate of usable flops down to 5e20. If you could max a zettaflop you could do a lot more than train GPT-4s all the time. Gosh.
Also, you could fit this comfortably inside a big warehouse; it's not that vast. But right next door you'd need an average-sized nuclear power plant.
I do not imagine the IRL plan for the New Zettaflop is to put it all in one place.
The New Zettaflops was the name of my high school emo band.
I don't know how to talk about agents in interviews. if anyone else were as confident they could get agents to do anything as I am, I'd think they were full of shit. yes, I understand the limitations of using them. somehow I can do it anyway. no, I don't know what I'm doing differently