Did I show you github.com/ayourtch-llm...? It was a fun experiment! I need to rerun it in a less ad-hoc fashion someday…
Posts by Andrew Yourtchenko
Oh, I never used Haiku. UD-Q4_K_XL seemed to trip up way too much, but maybe I need to run a somewhat more disciplined comparison - considering that even the 8-bit quant was tripping up, maybe it’s inherent rather than caused by quantization... I did stretch sessions to 2+ compactions of 256k context.
I am putting smilies as well. Saying please is good. Makes the matmuls happier. (Or biases the output towards the area of latent space with happy texts, depending on your philosophy :-) )
the thing we've all been talking about since the day google reader sunset
Oh, thanks a lot, that’s very useful! The odd bit with the tool calls was that it was one specific tool and one specific parameter of it, and the problem disappeared on a bigger quant, and it happened with multiple harnesses + llama.cpp - but this gives a good direction to dig a bit further in, thanks!
An image of opencode using mcp-random to generate coin tosses.
“Oh, akshually, why not have an entirely useless MCP to do random things”.
github.com/ayourtch-llm...
Very cool result. Considering that LLMs appear to amplify human biases (if I grok arxiv.org/pdf/2406.00092 well) - I wonder if there exists a magic prompt for humans that could fix their biases…
Getting LLMs to simulate “true” randomness or generate diverse outputs is surprisingly difficult. We found a simple prompting trick that solves this by having the model generate and manipulate a random string. To be presented at #ICLR2026 this week!
Blog: pub.sakana.ai/ssot
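To make the "manipulate a random string" part of the post above a bit more concrete, here is a tiny Python sketch. It is only an illustration under my own assumptions, not the procedure from the blog or paper: the string is passed in by the caller (in the post, the model generates it), and `pick_from_string` is a hypothetical helper name.

```python
def pick_from_string(seed: str, options: list[str]) -> str:
    """Map an arbitrary string to one of the options.

    Assumption: summing code points modulo the number of options is just
    one simple manipulation; the original post does not specify which one
    is used.
    """
    if not options:
        raise ValueError("options must not be empty")
    total = sum(ord(ch) for ch in seed)  # deterministic for a given string
    return options[total % len(options)]


if __name__ == "__main__":
    # The same string always maps to the same option.
    print(pick_from_string("qz7Kp1x", ["heads", "tails"]))
```

The point of the sketch is that the choice is derived from the string rather than picked directly, so the quality of the result depends on how varied the strings are.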
Improvements in git feel like updates to chess lol
Interesting, I had it happen in both short and long (>180K) contexts, but it appears that interrupting it and sending “?” was enough to get it back on track, so that will be my automated remedy when I get to that. I didn’t try smaller models, under the assumption that they won’t be great with Rust…
I have spotted two failure modes: infinite circular thinking, and tool calls missing a parameter while the model is absolutely sure it supplied one, which got it pretty confused. Got it even on an 8-bit unsloth quant - did you see anything similar? I had similar glitches with lower-bit MoE Qwen 3.5…
(a possibly terrible but hopefully correct translation made with the help of AI, for a better aesthetic effect).
Let me add one more prediction: that reflection will not be written in English. (Original: 我再補充一個預測:那篇反思不會用英文寫成。)
Here's my incredibly unpopular opinion/prediction:
Future generations will write an arc of history where they point out that this is the era when we realized how bad the 1900s conceptions of intellectual property were.
What’s wrong with mute/block?
Is there a specific name for what’s in them?
Not everyone using an airplane to simply get from A to B has to be a pilot. Not everyone taking a taxi from A to B has to know how to drive or to own a car. Not everyone having a bicycle is mandated to use it for all the transportation needs. Also, walking is known to have many benefits.
Why doesn't Dario just talk to his kid?
Curious whether it would be the same if you swapped it with Qwen3.6. In my completely unscientific tests, Gemma4 seemed reasonably smart but extremely jumpy. I would happily put it on one-shot adversarial code review, but I would not be too comfortable having it run 24x7. But that’s me…
That’s a good sign, you are on the right side of the Jagged Border of Capabilities! Looking forward to giving it a spin, feel free to @ me when it’s good to go / ready for testing! :-)
especially when you can use claude design! :D
I did not quite repost yours yet as I saw your later message about needing to do some enhancements, but heck, a second life remake, rebuilt with the modern tech and using atproto as a fabric has quite some potential!
A viral GitHub project that claims to clone coworkers into a reusable AI skill is forcing Chinese tech workers to confront deeper fears.
Weird! While I definitely had quite a few long evenings with pizza, I never thought of pizza as an incentive, it was more “it’s too late to go to a bar/resto, we might as well get some pizza delivered” :-)
Whoever loses the bet pays for the pizza? :-)
As if I had any doubt. Qwen3.5-27B *loves* shortcuts, I would have been surprised if the result were any different.
Negotiating with Open-Strix powered by Qwen-3.5-27B about whether the review is necessary or not - suggested a bet to give it a virtual cookie if it isn’t.
Gotta do what you gotta do to keep the troops’ morale up!
A screenshot of Claude using an MCP tool to securely access the infrastructure
9am yesterday: remember I wanted to make an MCP tool that would give one very fancy access to network administration…
5pm yesterday. A framework for accessing all kinds of devices, with credentials out of reach of agents, and a simple but flexible access control management for allowed/denied CLIs.
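For a feel of what allowed/denied CLI access control can mean in practice, here is a minimal Python sketch. It is my own assumption of one reasonable policy (deny patterns win, anything not explicitly allowed is rejected) and not a description of the framework in the post; `is_allowed` and the example patterns are hypothetical.

```python
from fnmatch import fnmatchcase


def is_allowed(command: str, allow: list[str], deny: list[str]) -> bool:
    """Check a CLI command against glob-style allow/deny lists.

    Assumed policy: a matching deny pattern always wins; otherwise the
    command must match at least one allow pattern (default deny).
    """
    cmd = " ".join(command.split())  # collapse repeated whitespace
    if any(fnmatchcase(cmd, pattern) for pattern in deny):
        return False
    return any(fnmatchcase(cmd, pattern) for pattern in allow)


if __name__ == "__main__":
    allow = ["show *"]
    deny = ["show running-config*"]
    for cmd in ("show ip route", "show running-config", "reload"):
        print(cmd, "->", is_allowed(cmd, allow, deny))
```

Default deny keeps the failure mode boring: a command nobody thought about gets rejected instead of executed.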
Was pleasantly surprised to learn I had discovered a few tricks from first principles, but still, a very worthy read! www.anthropic.com/engineering/...
Compared to 3.5-27b, 3.6 feels a bit more delicate, but with ~100tok/sec on my rig, I’ll take it... I still had to explain to it what to do, and it was more vibe than specs, but with TDD where I didn’t forget about it…