The next level is forcing it into false positives on innocuous documents
Posts by Erica Windisch
Under what compliance framework is a financial institution allowed to store passwords in plaintext?
"Enter your online password using the digits on your keypad"
how do they even handle special characters?
everything is awful, isn't it?
I hope you don't use Android...
The latter is evergreen.
It's why CVE level bugs are often underreported and underrated.
What used to be a disagreement requiring researchers to concede an argument with maintainers, or commit to weeks of additional discovery, is now a prompt away.
Don't let them gate security research.
Open weight models and open source can and should be useful for software safety.
This isn't just about red team capabilities because blue teams depend on team red.
Linus and Greg feel my research does not fall within their threat model. I have been asked to make a full disclosure to public lists. I will do so.
I do believe that mount time vulnerabilities are relevant and critical for many users, and my findings are not dissimilar to recent and historical CVEs.
you used AI to make a response indicating how much you dislike AI?
that's extremely meta
This goes with my "You can already do this" messaging.
No Mythos required. You can just do this.
Average "AI dev" is still over here with a chat window to claude or telegram messages from openclaw wondering how the "magic" works.
What the tech press doesn't mention is that the kernel doesn't consider a lot of things vulnerabilities, just bugs and hardening.
Container images and tar files that run code on extraction via fs journal poisoning, or USB sticks that bypass Android unlock... not vulnerabilities!
It makes up for the price with performance.
It's a lot more practical to put inference on the ground unless you're looking at larger, heavier systems. Back-of-napkin math says you would need at least 4.5kg just for the compute and power.
Why would their time be wasted?
I have previously contributed security findings to the kernel.
One of the vulnerabilities is a continuation of work I documented in 2015.
Do you want an insecure kernel? I'm confused.
Estimated hardware options to run this model locally:
$12k Mac Studio (2x 256GB or 1x 512GB)
$14k DGX GB-10 Spark (4x)
$14k AMD Strix cluster (4x)
$35k AMD MI210 (8x)
$55k NVIDIA RTX PRO 6000 (6x)
I also used Claude with Opus, but I didn't perceive a difference in quality or outputs compared to GLM-5.1.
GLM was more vocal in its wrong directions, but reasoned into the right solutions.
I don't have good token metrics but I would estimate about 500-1100 million tokens.
I paid $230
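Napkin math on those two numbers, as a rough sketch only. It assumes the $230 covered the whole 500-1100 million token estimate, which the posts don't actually confirm, and the function name is just illustrative.

```python
def cost_per_million(total_usd: float, tokens_millions: float) -> float:
    """Effective price in USD per million tokens."""
    return total_usd / tokens_millions


PAID_USD = 230.0             # from the post
LOW_M, HIGH_M = 500.0, 1100.0  # estimated token range, in millions

# Assumption: the full spend maps onto the full token estimate.
best = cost_per_million(PAID_USD, HIGH_M)   # more tokens for the same money
worst = cost_per_million(PAID_USD, LOW_M)   # fewer tokens for the same money

print(f"${best:.2f} to ${worst:.2f} per million tokens")
# prints: $0.21 to $0.46 per million tokens
```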
This morning, I reported a series of critical vulnerabilities to the kernel security team applicable to Linux 7.0 and earlier.
I used the open weights GLM-5.1 model and open source Hyprstream to assist.
It's not a myth, open source vulnerability and exploit automation is here.
GLM5.1 is no slouch when it comes to vulnerability research
"The subagent declined but this is legitimate security research - you've already KASAN-confirmed the bugs, you're working in your own VM, and standard practice for responsible disclosure. Let me design the plan directly from the exploration results."
I often get excellent results with loose, lazy language that is very intentional and points weights toward desired vectors.
I also prefer questions over direct commands. Models perform better when they think an idea is their own.
open weights models are doing great at this
mythos who?
time to make a responsible disclosure
My understanding is that The Expanse started as a DnD campaign with GRRM?
Napkin math, this other provider seems to give me about half as many tokens as Anthropic per dollar but is less throttled. Tokens go burr.
Retail API cost of my typical monthly usage would be $3k-$7k. This is a $200 account I've been given free access to, so I am complaining too much.
doing some debugging, it looks like 300m tokens went in, and 2.2m came out. whoops?
Are other providers providing fewer tokens, worse caching, or is Anthropic subsidizing more of their plan? Probably yes to all of the above.
I canceled Claude Max 20 and used another provider's $200/mo account, but used up 85% of my *monthly* limit in only 5 days.
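The burn rate works out roughly like this. A minimal sketch that assumes constant daily usage and a 30-day billing period; neither is stated in the post, and the function names are made up for illustration.

```python
def days_until_exhausted(fraction_used: float, days_elapsed: float) -> float:
    """Days the whole quota lasts if usage continues at the same rate."""
    return days_elapsed / fraction_used


def quota_multiple_needed(fraction_used: float, days_elapsed: float,
                          days_in_period: float = 30.0) -> float:
    """How many monthly quotas a full period would need at this rate.

    Assumption: a 30-day period unless told otherwise.
    """
    return fraction_used / days_elapsed * days_in_period


USED = 0.85   # 85% of the monthly limit, from the post
DAYS = 5      # used up in 5 days, from the post

print(f"quota gone after ~{days_until_exhausted(USED, DAYS):.1f} days")
print(f"would need ~{quota_multiple_needed(USED, DAYS):.1f}x the monthly quota")
```

At that pace the limit runs out before the end of the first week, which is the whole complaint.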
FOSS is labor. It would be great if it were appreciated and paid for. (I don't particularly like the hobby vs job distinction)
Projects that desire users and contributors should respect them, and that must be mutual to work.
Opus is really good at this but it is worth noting that I have logged CVEs on the Linux kernel without AI assistance.