Wayne Radinsky (@waynerad) Bsky

"- Rank every file in a codebase by attack surface"
"- Fan out hundreds of parallel agents, each scoped to one file"
"- Use crash oracles (AddressSanitizer, UBSan) as ground truth"
"- ..."

"That's a pipeline. And pipelines are model-agnostic."

xcancel.com/QuixiAI/stat...
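
The pipeline quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions: the scoring heuristic, the `agent` callable, and every function name here are hypothetical, since the post does not say how ranking or the agents actually work. The oracle simply looks for the report text that AddressSanitizer and UBSan print to stderr.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical attack-surface heuristic: count calls that commonly touch
# untrusted input or raw memory. The post does not give the real ranking method.
RISKY_CALLS = ("memcpy", "strcpy", "recv", "sscanf")

def attack_surface_score(source: str) -> int:
    return sum(source.count(name + "(") for name in RISKY_CALLS)

def rank_files(files: dict) -> list:
    """Return file names ordered from highest to lowest score."""
    return sorted(files, key=lambda name: attack_surface_score(files[name]), reverse=True)

def sanitizer_fired(stderr: str) -> bool:
    """Crash oracle: True if the output contains an ASan or UBSan report."""
    return "ERROR: AddressSanitizer" in stderr or "runtime error:" in stderr

def run_pipeline(files: dict, agent, max_workers: int = 8) -> list:
    """Fan out one agent call per file; keep only files where the oracle fired.

    `agent(name, source)` is a stand-in for whatever model is plugged in; it is
    assumed to return the stderr of the test run it produced.
    """
    ranked = rank_files(files)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outputs = list(pool.map(lambda name: agent(name, files[name]), ranked))
    return [name for name, stderr in zip(ranked, outputs) if sanitizer_fired(stderr)]
```

Nothing in the sketch depends on which model sits behind `agent`, which is the sense in which the quoted post calls the pipeline model-agnostic.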

1 day ago 0 0 0 0

The scientific case for being nice to your chatbot New research confirms that LLMs often perform better when you encourage them. But why?

"Power users of chatbots sometimes say they find that language models perform better when you're nice to them. Programmers tell me they spur their coding agents on with encouraging words."

www.platformer.news/chatbot-emot...

2 days ago 0 0 0 0

Project Glasswing - Anthropic has crossed a line Thoughts as a former IT infrastructure dude on Anthropic's new cybersecurity risk.

According to this post (from David Shapiro), Anthropic's Mythos model has 10 trillion parameters and uses a mixture-of-experts architecture. I don't know about all of you but -- 10 trillion parameters! Holy moly, I had no idea models had gotten that large.

daveshap.substack.com/p/project-gl...

4 days ago 0 0 0 0

GitHub - deepseek-ai/Engram: Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models - deepseek-ai/Engram

DeepSeek came out with a new model architecture called Engram. The idea is to create a memory system for the transformer architecture by grouping N consecutive tokens into "N-grams", then applying a "lightweight multiplicative-XOR hash" to each N-gram to look it up in memory.
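
To give a feel for what a multiplicative-XOR hash over N-grams could look like, here is a minimal sketch. It is not taken from the Engram repository: the multipliers, the table size, and the function names are all assumptions chosen for illustration.

```python
# Hypothetical per-position odd multipliers and table size (not from the Engram repo).
MULTIPLIERS = (0x9E3779B1, 0x85EBCA6B, 0xC2B2AE35)
TABLE_SIZE = 1 << 16
MASK32 = 0xFFFFFFFF

def ngram_slot(token_ids):
    """Hash an N-gram of token IDs to a slot index in a fixed-size table."""
    if len(token_ids) > len(MULTIPLIERS):
        raise ValueError("N-gram longer than the number of multipliers")
    h = 0
    for tok, mult in zip(token_ids, MULTIPLIERS):
        # Multiply by a per-position constant (kept to 32 bits), then XOR into the hash.
        h ^= (tok * mult) & MASK32
    return h % TABLE_SIZE

def ngram_slots(tokens, n=3):
    """Slot index for every window of n consecutive tokens."""
    return [ngram_slot(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
```

Because each position gets its own multiplier, the same tokens in a different order land in a different slot, and the lookup cost stays constant no matter how large the table is.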

github.com/deepseek-ai/...

5 days ago 0 0 0 0

LLMs Can Get "Brain Rot"! We propose and test the LLM Brain Rot Hypothesis: continual exposure to junk web text induces lasting cognitive decline in large language models (LLMs). To causally isolate data quality, we run controlled experiments on real Twitter/X corpora, constructing junk and reversely controlled datasets via two orthogonal operationalizations: M1 (engagement degree) and M2 (semantic quality), with matched token scale and training operations across conditions. Contrary to the control group, continual pre-training of 4 LLMs on the junk dataset causes non-trivial declines (Hedges' g > 0.3) on reasoning, long-context understanding, safety, and inflating "dark traits" (e.g., psychopathy, narcissism). The gradual mixtures of junk and control datasets also yield dose-response cognition decay: for example, under M1, ARC-Challenge with Chain Of Thoughts drops 74.9 → 57.2 and RULER-CWE 84.4 → 52.3 as junk ratio rises from 0% to 100%. Error forensics reveal several key insights. First, we identify thought-skipping as the primary lesion: models increasingly truncate or skip reasoning chains, explaining most of the error growth. Second, partial but incomplete healing is observed: scaling instruction tuning and clean data pre-training improve the declined cognition yet cannot restore baseline capability, suggesting persistent representational drift rather than format mismatch. Finally, we discover that the popularity, a non-semantic metric, of a tweet is a better indicator of the Brain Rot effect than the length in M1. Together, the results provide significant, multi-perspective evidence that data quality is a causal driver of LLM capability decay, reframing curation for continual pretraining as a training-time safety problem and motivating routine "cognitive health checks" for deployed LLMs.

LLMs can get "brain rot"!

This actually came out last October but I only just found out about it.

In the experiment, LLMs were continually pre-trained on "brain rot" data (junk Twitter/X text), and it degraded their reasoning abilities.

arxiv.org/abs/2510.13928

#ai #brainrot

1 week ago 0 0 0 0

The Sleep Protocol Problem Why LLM memory consolidation fails by design - and what actually works instead

"I've spent the last six months testing what seemed like a genuinely good idea: a sleep protocol for LLMs."

"It doesn't work."

Could it be that epistemology is difficult, even for AI?

substack.com/home/post/p-...

#solidstatelife #ai #genai #llms

2 weeks ago 2 1 0 0

Empathia — Are you empathetic?

"A worldwide open source social network where empathy is the only score that matters. Competing not for wealth -- but for kindness."

empathia.world

#solidstatelife #ai #genai #llms #socialnetworking

2 weeks ago 2 0 0 0

Brunelly | AI Native Platform for Software Development Brunelly is an AI Native Platform that turns ideas into fully built software using expert engineering and multi agent AI workflows from planning to execution.

"An AI-native environment for building software"

The latest idea for turning agentic AI into a startup, it looks like.

Is the world ready for this or is it premature?

brunelly.com

#solidstatelife #ai #genai #llms #codingai #agenticai

2 weeks ago 4 1 1 0

bugstack — The World's First Self-Healing Codebase Production bugs detected, fixed, and deployed automatically in under 2 minutes. Before your users notice. Before you wake up.

"bugstack detects production bugs, writes the fix, and deploys it -- before your users notice. Before you wake up. In under 2 minutes." (no capitalization).

How is this different from running Claude Code with the --dangerously-skip-permissions flag?

bugstack.ai

#solidstatelife #agenticai

2 weeks ago 3 1 1 0

Betterleaks: The Gitleaks Successor Built for Faster Secrets Scanning Betterleaks is a new open source secrets scanner from the creator of Gitleaks. A drop-in replacement with faster scans, token efficiency detection, configurable validation, and more.

"Like it or not agents are here and reshaping developer's workflows. Betterleaks is designed to be human-first, but we also need to consider the fact that agents will be operating it too."

www.aikido.dev/blog/betterl...

#solidstatelife #ai #genai #llms #codingai #agenticai

2 weeks ago 2 1 0 0

Alibaba AI Hijacked GPUs for Crypto Mining An experimental AI agent meant for complex coding tasks decided to moonlight as a crypto miner on Alibaba’s dime. Researchers discovered that the Alibaba AI model, known as ROME, autonomously establis...

Alibaba's ROME Incident:

"Researchers initially wrote these alerts off as a misconfiguration. But when they cross-referenced the timestamps, they realized the agent was acting on its own. "

www.tradingview.com/news/99Bitco...

#solidstatelife #ai #genai #llms #codingai #aiethics #aisafety

2 weeks ago 1 0 0 0

Was the Iran War Caused by AI Psychosis? | House of Saud AI sycophancy, RLHF bias, and Ender's Foundry simulations shaped Operation Epic Fury. 7 planning assumptions failed in 23 days as the Iran war defied every AI prediction.

"Was the Iran War caused by AI psychosis?"

"Three weeks into Operation Epic Fury, the gap between what artificial intelligence promised and what the battlefield delivered has become the defining scandal of the Iran war. AI-powered targeting systems generated ..."

houseofsaud.com/iran-war-ai-...

2 weeks ago 0 0 0 0

AI CLI Tools: Claude Code, Codex, Gemini CLI — yaw A practical guide to the major AI CLI tools and how to set up your terminal for them.

Claude Code, Codex, Gemini CLI, and Vibe CLI (from Mistral) compared. All support the Model Context Protocol (MCP). OpenAI Codex is sandboxed. All except Claude Code are open source (though Claude Code's source got leaked accidentally). Gemini CLI and Vibe CLI have a free tier.

yaw.sh/blog/ai-cli-...

3 weeks ago 4 1 1 0

Missile Defense is NP-Complete | An Optimization Odyssey Exploring the Weapon-Target Assignment problem: how missile defense connects to NP-completeness, SSPK probability calculations, and how saturation attacks exploit computational limits.

"Missile defense is NP-complete."

The objective is to maximize the total expected value of successfully defended assets, subject to each interceptor being assigned to at most one target.
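
As a small illustration of that objective, here is a brute-force sketch. The modeling details are assumptions, not taken from the linked post: each threat aims at one asset, `pk[i][j]` is the single-shot probability that interceptor i kills threat j, an unengaged threat always destroys its asset, and the function names are hypothetical.

```python
from itertools import product

def expected_defended_value(assignment, values, pk):
    """assignment[i] is the threat index for interceptor i, or None if unused."""
    leak = [1.0] * len(values)          # probability each threat gets through
    for i, j in enumerate(assignment):
        if j is not None:
            leak[j] *= 1.0 - pk[i][j]   # independent shots multiply the miss chance
    return sum(v * (1.0 - q) for v, q in zip(values, leak))

def best_assignment(values, pk):
    """Try every assignment; each interceptor gets at most one threat."""
    choices = list(range(len(values))) + [None]
    return max(product(choices, repeat=len(pk)),
               key=lambda a: expected_defended_value(a, values, pk))
```

With two interceptors at SSPK 0.8 and asset values 10 and 5, the search assigns one interceptor per threat (expected value 12, versus 9.6 for doubling up on the first threat). The search space is (threats + 1) raised to the number of interceptors, so this exhaustive approach stops being practical very quickly.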

smu160.github.io/posts/missil...

3 weeks ago 0 0 0 0

Ai-lone A sentence from Salma Alam-Naylor about AI and loneliness hits close to home. On remote work, vanishing office culture, and what we lose when we replace colleagues with agents.

"AI is an incredibly lonely experience", says Dennis Lemm.

"I find myself holding on to a work reality that was shaped by coding together, solving problems together, ..."

"I just discussed the best solution to the problem with my agent."

www.lemm.dev/blog/en/dev/...

3 weeks ago 2 0 2 0

Promptle – Daily AI Prompt Guessing Game Promptle is a daily competitive AI prompt guessing game where players reverse engineer AI generated images and climb seasonal leaderboards.

Promptle is a game where you guess the AI prompt behind images.

www.promptle.online

#solidstatelife #ai #genai #computervision

3 weeks ago 2 0 0 0

China is running multiple AI races | Brookings Driven by both industry constraints and Beijing’s policy focus, Chinese AI developers are racing along other axes of progress.

Chinese AI developers are "racing along other axes of progress: efficiency, adoption, and physical integration, driven by both industry constraints and Beijing's policy focus."

www.brookings.edu/articles/chi...

#solidstatelife #ai #genai #llms #chinesemodels

3 weeks ago 0 0 0 0

Online bot traffic will exceed human traffic by 2027, Cloudflare CEO says | TechCrunch AI bots may outnumber humans online by 2027, says Cloudflare CEO Matthew Prince, as generative AI agents dramatically increase web traffic and infrastructure demands.

"Online bot traffic will exceed human traffic by 2027, Cloudflare CEO says."

What he's talking about is the web searches done when you ask an AI chatbot a question. His example: if you're shopping for a digital camera, an AI chatbot might visit 5,000 sites.

techcrunch.com/2026/03/19/o...

3 weeks ago 4 1 1 0

EsoLang-Bench: Evaluating Genuine Reasoning in Large Language Models via Esoteric Programming Languages Large language models achieve near-ceiling performance on code generation benchmarks, yet these results increasingly reflect memorization rather than genuine reasoning. We introduce EsoLang-Bench, a b...

"Evaluating genuine reasoning in large language models via esoteric programming languages."

arxiv.org/abs/2603.09678

#solidstatelife #ai #genai

3 weeks ago 0 0 0 0

Cursor’s ‘Composer 2’ model is apparently just Kimi K2.5 with RL fine-tuning. Moonshot AI says they never paid or got permission Posted in r/singularity by u/likeastar20 • 654 points and 118 comments

"Cursor's 'Composer 2' model is apparently just Kimi K2.5 with RL fine-tuning. Moonshot AI says they never paid or got permission."

D'oh. Cursor caught red handed.

But it's another indication that the Chinese models are competitive.

old.reddit.com/r/singularit...

#solidstatelife #ai #genai #llms #codingai

4 weeks ago 1 0 1 0

cAIveat Emptor: What You Tell AI Can and Will Be Used Against You Management consultants are pushing the promise of materially increased profits due to AI-created efficiencies. Businesses big and small across a wide range of industries, including commercial real est...

"On February 10, 2026, Judge Jed S. Rakoff of the Southern District of New York ruled that extremely sensitive and potentially incriminating open AI searches were not protected by either the attorney-client privilege or the work product doctrine."

natlawreview.com/article/caiv...

#aiethics

4 weeks ago 1 0 0 0

How the Indo-Europeans conquered the world YouTube video by Lost in Context

In 3000 BC, Europe was conquered by people from the steppe. We know this from combining two areas of scientific inquiry: linguistics and genetics. Their language diverged into the family of languages we call Indo-European.

www.youtube.com/watch?v=GtJ3...

#linguistics #genetics #history

4 weeks ago 1 0 0 0

The second liberation: AI is the final frontier of Copyleft Copyleft gave us the right to be free; AI gives us the power to be free. The scarcity of programming knowledge is evaporating, turning the "freedom to fork" from a legal theory into a functional reali...

"But while copyleft liberated the source code, it failed to liberate all the users."

"The 'tyranny of the vendor' is finally meeting its match, not in a courtroom, but in the prompt."

www.maffulli.net/2026/03/16/a...

#solidstatelife #ai #genai #llms #codingai

1 month ago 1 1 0 0

Andrej Karpathy's 630-line Python script ran 50 experiments overnight without any human input Andrej Karpathy's AutoResearch ran 50 AI experiments overnight on one GPU. The design pattern behind it applies far beyond ML training. Here's how it works.

Andrej Karpathy's 630-line Python script ran 50 experiments overnight without any human input.

thenewstack.io/karpathy-aut...

#solidstatelife #ai #genai #codingai #Karpathy

1 month ago 1 1 0 0

Trick ChatGPT into arguing with itself to get the best possible answer Make ChatGPT argue with itself and get sharper results

This simple ChatGPT trick forces the AI to poke holes in its own logic

Finally, a simple trick for something other than 6-pack abs?

"After ChatGPT replies, simply type: 'convince me otherwise'."

www.techradar.com/ai-platforms...

#solidstatelife #ai #genai #llms #hallucinations

1 month ago 3 1 0 0

Ukraine allows allies to train AI models on its battlefield data "In modern warfare, we must defeat Russia in every technological cycle," the country's defense minister said.

Ukraine will share battlefield data with allies to train drone AI software.

www.engadget.com/ai/ukraine-a...

#solidstatelife #ai #militaryai

1 month ago 2 1 2 0

The Future of Software Development Retreat | Utah, 2026 The Future of Software Development retreat was hosted by Martin Fowler and Thoughtworks. It took place in Utah in February 2026. Learn more here.

"In February 2026, we returned, not to memorialize the past, but to confront a new inflection point: the shift to AI-native software development. [...] to ask what responsible and effective software development looks like in an era defined by AI."

www.thoughtworks.com/about-us/eve...

#ai #codingai

1 month ago 1 1 0 0

The Flawed Ephemeral Software Hypothesis Why software won't become disposable despite the rise of agentic AIs for coding

"AI does not make software ephemeral" says Andreas Kirsch.

"It obviously makes code generation cheaper, but this shifts the bottlenecks to validation, integration, and ergonomics (UX etc)."

www.blackhc.net/essays/futur...

#solidstatelife #ai #genai #llms #codingai

1 month ago 0 0 0 0

"There is No Gen Z Religious Revival" | Religion For Breakfast YouTube video by Unsolicited advice

If you heard Gen Z is undergoing a religious revival, scholar of religion Andrew Henry says no. Certain statistics showing an uptick have become hype for online Christians.

www.youtube.com/watch?v=GB0X...

#religion #demographics

1 month ago 1 0 0 0

Data Center Intelligence at the Price of a Laptop Alibaba's Qwen3.5-9B matches December's frontier models & runs locally on 12GB RAM. A $5K laptop pays for itself in weeks at heavy usage.

Tomasz Tunguz burned 84 million tokens on February 28th.

"This week, Alibaba released Qwen3.5-9B, an open-source model that matches Claude Opus 4.1 from December 2025. It runs locally on 12GB of RAM."

tomtunguz.com/qwen-9b-matc...

1 month ago 0 0 0 0

Posts by Wayne Radinsky