Andrew Yourtchenko (@ayourtch) Bsky

GitHub - ayourtch-llm/kindness: A study of when do the LLMs perform better

Did I show you github.com/ayourtch-llm...? It was a fun experiment! I need to rerun it in a less ad-hoc fashion someday…

2 minutes ago

Oh, I never used Haiku. UD-Q4_K_XL seemed to trip up far too often, but maybe I need to run a more disciplined comparison: considering that even the 8-bit quant was tripping up, maybe the problem is inherent rather than caused by quantization... I did stretch sessions to 2+ compactions of a 256k context.

5 minutes ago

I put in smileys as well. Saying please is good: it makes the matmuls happier. (Or it biases the output toward the region of latent space with happy texts, depending on your philosophy :-) )

13 minutes ago

the thing we've all been talking about since the day Google Reader was sunset

1 hour ago

Oh, thanks a lot, that’s very useful! The odd bit with the tool call was that it involved one specific tool and one specific parameter of it; the problem disappeared on a bigger quant, and it happened with multiple harnesses + llama.cpp. But this gives me a good direction to dig a bit further, thanks!

44 minutes ago

An image of opencode using mcp-random to generate coin tosses.

“Oh, akshually, why not have an entirely useless MCP to do random things.”

github.com/ayourtch-llm...

1 hour ago

Very cool result. Considering that LLMs appear to amplify human biases (if I grok arxiv.org/pdf/2406.00092 correctly), I wonder if there exists a magic prompt for humans that could fix their biases…

1 hour ago

Getting LLMs to simulate “true” randomness or generate diverse outputs is surprisingly difficult. We found a simple prompting trick that solves this by having the model generate and manipulate a random string. To be presented at #ICLR2026 this week!

Blog: pub.sakana.ai/ssot

1 day ago

Improvements in git feel like updates to chess lol

9 hours ago

Interesting, I had it happen in both short and long (>180K) contexts, but it appears that interrupting it and sending “?” was enough to get it back on track, so that will be my automated remedy when I get to it. I didn’t try smaller models, on the assumption that they wouldn’t be great with Rust…

2 hours ago

I have spotted two failure modes: infinite circular thinking, and tool calls missing a parameter while the model is absolutely sure it supplied one, which got it pretty confused. I got this even on an 8-bit Unsloth quant. Did you see anything similar? I had similar glitches with a lower-bit MoE Qwen 3.5…

2 hours ago

(A possibly terrible but hopefully correct translation made with the help of AI, for a better aesthetic effect.)

2 hours ago

我再補充一個預測：那篇反思不會用英文寫成。 (Translation: Let me add one more prediction: that reflection will not be written in English.)

2 hours ago

Here's my incredibly unpopular opinion/prediction:

Future generations will write an arc of history where they point out that this is the era when we realized how bad the 1900s conceptions of intellectual property were.

5 hours ago

What’s wrong with mute/block?

3 hours ago

Is there a specific name for what’s in them?

3 hours ago

Not everyone using an airplane to simply get from A to B has to be a pilot. Not everyone taking a taxi from A to B has to know how to drive or own a car. Not everyone who owns a bicycle is obliged to use it for all their transportation needs. Also, walking is known to have many benefits.

3 hours ago

Why doesn't Dario just talk to his kid?

3 hours ago

Curious whether it would be the same if you swapped in Qwen3.6. In my completely unscientific tests, Gemma4 seemed reasonably smart but extremely jumpy. I would happily put it on one-shot adversarial code review, but I would not be too comfortable having it run 24x7. But that’s just me…

3 hours ago

That’s a good sign: you are on the right side of the Jagged Border of Capabilities! Looking forward to giving it a spin; feel free to @ me when it’s good to go / ready for testing! :-)

5 hours ago

especially when you can use claude design! :D

6 hours ago

I haven’t reposted yours yet, as I saw your later message about needing to make some enhancements, but heck, a Second Life remake, rebuilt with modern tech and using atproto as the fabric, has quite some potential!

6 hours ago

Chinese tech workers are starting to train their AI doubles–and pushing back A viral GitHub project that claims to clone coworkers into a reusable AI skill is forcing Chinese tech workers to confront deeper fears.

8 hours ago

Weird! While I definitely had quite a few long evenings with pizza, I never thought of pizza as an incentive; it was more “it’s too late to go to a bar/resto, we might as well get some pizza delivered” :-)

8 hours ago

Whoever loses the bet pays for the pizza? :-)

8 hours ago

As if I had any doubt. Qwen3.5-27B *loves* shortcuts; I would have been surprised if the result had been any different.

9 hours ago

Negotiating with Open-Strix, powered by Qwen-3.5-27B, about whether the review is necessary or not: I suggested a bet, offering it a virtual cookie if it isn’t.

Gotta do what you gotta do to keep the troops’ morale up!

10 hours ago

A screenshot of Claude using an MCP tool to securely access the infrastructure

9am yesterday: remembered I wanted to make an MCP tool that would give one very fancy access to network administration…

5pm yesterday: a framework for accessing all kinds of devices, with credentials kept out of reach of agents, and simple but flexible access control for allowed/denied CLIs.

10 hours ago

Harness design for long-running application development

Was pleasantly surprised to learn that I had discovered a few of the tricks from first principles, but still, a very worthwhile read! www.anthropic.com/engineering/...

10 hours ago

Compared to 3.5-27B, 3.6 feels a bit more delicate, but at ~100 tok/s on my rig, I’ll take it... I still had to explain to it what to do, and it was more vibes than specs, but with TDD where I didn’t forget about it…

2 days ago
