as grim as it is I am kind of hoping for like. a full blown emotional breakdown in public
Posts by aria
i guess air travel is the thing that *would* make Newspaper People concerned but it honestly seems kind of like small potatoes given that we are in a state where the only question seems to be "is this going to be worse than COVID economically or slightly less bad"
a graph from Weights&Biases showing two noisy curves on the same graph. the two lines are around the same but begin diverging near the end of the graph
I think I might be a bad scientist
I am in the nasty habit of running my experiment before my baseline, so whenever I start the baseline run I spend the whole day rooting for the gap between the grey line and the brown one to get wider
throughout The Sopranos I found it pretty notable that all of these guys are constantly getting shot at and going to jail for a pretty mediocre income
everyone below Tony seems to live a basically unremarkable middle class lifestyle and Tony himself manages only "dentist rich"
i mean there's a fundamental tradeoff. compute is not unlimited and, worse, you can only speed up training so much by adding more. mythos is a bet specifically that relatively modest RL on a much more capable base is worth it
ah, sorry, was looking at it backwards. point stands though
this entire thing, besides the weird nothing interviews, is also basically a retread of a series of Barely Sociable videos from 2020 which present all the same evidence
not that it really matters but honestly I'd bet that they never mined any part of the strait. much easier to just say you did
after all, however confident I am, I would not bet my life on it
source of this could be queuing or it could just be an abnormally high batch size for whatever preview they ran on to limit the size of the pre-release deployment
to be clear:
1. this is a "grapevine" kind of rumor so I don't have specifics
2. the way I have seen it phrased implies that they are talking about full response time
yeah, this is true, although it seems by comparing Ant's internal benchmark set to the actual ECI scores on previous models that their internal one is somewhat easier, so you can probably treat that value as an upper bound to its real ECI score
based on anthropic's general research direction/philosophy I think it's just a relatively-dense 10T-20T trained similarly to Opus 4.6
according to the system card it actually uses fewer reasoning tokens on average than Opus 4.6
I don't think it's like that, I think it's just a very slow model with CoT reasoning
if the rumor is correct it appears to be functionally equivalent to the GPT-5.x Pro series, although scaled on the axis of model size rather than reasoning depth
outside of SWE it ranks similarly to GPT-5.4 Pro on general benchmark sets
you'd notice because it would take 15-25 minutes to respond, reportedly
fwiw, besides the obvious SWE-maxxing RL, it seems the actual baseline intelligence capacity of Claude Mythos is not that much higher than GPT-5.4 Pro (probably <3T), it's just that they don't really evaluate the Pro models on agentic stuff because that's unhinged
my understanding is that if there were an exact copy of the solar system 10ly away we would have to get pretty lucky with angles to even notice Earth was there
our technology makes our sampling of exoplanets pretty biased against planets like Earth around sun-like stars, inconveniently
worth noting that a lot of thinking on the Fermi paradox assumed the intensity of our signal cast into space would increase over time
it has decreased iirc. even peak Earth radio noise would not be detectable to someone with 2026 tech from 10ly - dubious if Earth itself would even be noticeable
prior to the upcoming energy crisis the biggest recent rise in cost of living for urban America can be attributed to a decades-long ideologically-driven freeze on housing construction
Oops, All Ideology!
hard to buy into the profit maximizer narrative when the last few years have been characterized by a fit of ideological madness among the US elite running completely counter to their financial interests
many events competing for the headline today but we all know the real groundbreaking news is Minecraft switching to Vulkan
not excited for the "urban americans aren't real americans" consensus to fuck me over by focusing on individual commutes instead of logistics, which is by far my main exposure to this besides non-fuel petrochem
they will not be releasing mythos-preview publicly
see I only know of like two people meeting this description. is it more common than that
competition math**
I'll be the first to admit: not uniformly, yet. comparable/better for a lot of general agentic tasks and competition but polishing our SWE post-training stack is one of our big focuses for the next model we release
base model is 100% capable of it imo, it's just a matter of data and environments
I do like running my own AI at home, but it's worth acknowledging that my box pulls >100W idle, fights my air conditioner, and has a dismal power efficiency (1/PUE) of probably <~30%, compared to >90% for a datacenter. you should use datacenters more and self-hosting less if you care about energy usage.
i have worked here for 9 months and don't actually know if this is true
in my unbiased opinion arcee is downright great
everyone quoting this with approval is either named something like "fizzy the tiefling prince acab blm COMMS OPEN" or "anticiv anarcho-revanchist 🇻🇪" and they're all the most unbearable kind of person you will ever meet