Yes my prediction is entirely testable and I would be very happy to be wrong. My bias is the obvious one: I want to use the best model.
Posts by David Crawshaw
Very true. A better way I could have made my point would be: the right time to panic about LLM security was a year ago.
On distillation: I am not aware of a way to stop it other than playing whack-a-mole with accounts. Though if there is a good defense against it I should not be informed.
But given that fact it is not obvious to me that restricting the tools is best, other than as a start of a restricted-release business model. Well-funded attackers have Opus giving them the bugs to write exploits. Who gets more value today out of Mythos, attack or defense? I would claim defense.
I did read the report. What I took away from it is that Mythos is very good at generating exploits for bugs that mostly Opus could find. (If you think I am misreading it I would like to hear; I often find the numbers in these styles of reports hard to interpret.)
The idea that big tech companies will inherently have better security tools than me because of enterprise access deals makes me sad.
1-2 in code I didn’t write depending how you count, but in private code bases. The real shock was a week ago when I tried to make a change to how exe.dev keys work. I asked Opus for a security review and it found something I hadn’t thought of which led to a chain reaction of 5 more horrible bugs.
In code as I write it? Perhaps a dozen.
By the time you and I can use Mythos, there will be a new top-end rev that is enterprise only. That treadmill helps keep the enterprise dollars flowing (which is most of the dollars) by relegating distillation companies to second rank.
Prediction: Glasswing is mostly misdirection. I am sure it is better than Opus and thus finds more security bugs. But Opus was the change in kind.
This is marketing cover for the fact that top-end models are now gated by enterprise agreements and no longer available to small labs to distill.
exe.dev region selector
new exe.dev feature
I'm not particularly happy about being wrong about this. But there is also only so much effort I can put into each of these little messages, so I am going to be wrong on occasion. I'll do my best to minimize it.
Turns out I was wrong on the history here and both people involved in markdown had a CS background. I suppose this is evidence that even those versed in expertise can put it aside when reality comes knocking.
The beauty of markdown is that it is useful. But it hides a secondary lesson: experts are not always the best people to solve problems. Anyone with a strong CS background would never have designed markdown, because it cannot be parsed cleanly as a context-free grammar.
This approach to UI design sounds like a nice theory that would not work in practice. Turns out it does work, and once you commit, it has a clarifying effect across your entire API. blog.exe.dev/a-simple-ui-...
I’m generally curious about this, mostly from the domain auth side because I’d like people to be able to bring their own domain so their VMs can be vm7.you.dev, etc. What part of atproto would you like to see?
OH: localstorage is this generation’s longjmp
I want to hear about all of your personal software factories. What are you doing that's working? We went around the table at @exe.dev on Tuesday, and all of our workflows are so different. blog.exe.dev/bones-of-the...
“You can install a GitHub App without authorizing the app. Similarly, you can authorize the app without installing the app.” In completely unrelated news, you can now connect GitHub repos granularly to your @exe.dev VMs, without generating (and exposing to LLMs) personal access tokens.
To: me
Subject: something or other
Body: No action required. Blah blah blah.
This didn't even need to be an email.
New feature: the pi coding agent is included by default. It is hooked up to the LLM gateway included in your subscription.
can I fix it? no. an upstream network is to blame and I do not have a way to route around them. so I turn one of the stacks off, because first your network has to work.
hours of testing: yes that site has poor behavior on one of its network stacks. not packet loss, but high variance in packet rtt, making interactive sessions painful.
"networking is too easy, let's run two network stacks everywhere, with different performance characteristics"
- dual stack enthusiasts
I know you don't want to self-host, but I can think of somewhere this would be fun to try and run. 🤔
Just moved all exe.dev VMs behind a new global load balancer and anycast network. All our qualification testing shows much better routing locality. If you have a VM, please give it a go and let me know how it goes!
This is not progress.
Whoever decided to not include ifconfig and netstat in the default Ubuntu server images: why?
we can produce SOC2 type 1 for business customers.
we have implemented Google OIDC and Okta support in the teams plan (actually Google OIDC works for anyone logging in with a gmail address) and teams has an admin/user role distinction. more features planned!
You mean you want exe.dev launched in a region on a cloud provider? We are investigating this for business customers. If you are thinking of this in a business context, I would love to chat!