yes but the thing about scaling laws is that much can be validated at small scale first, and then you do progressively more expensive derisking
and, most importantly, humans have the same bottleneck and, we think, learn useful things;
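(a quick sketch of what the small-scale validation point above can look like in practice: fit a power law to cheap runs and check the extrapolation before paying for a big one. the numbers and the simple log-log fit are made up for illustration, not anyone's actual methodology)

```python
# illustrative only: made-up (compute, loss) pairs from hypothetical small-scale runs,
# fit with a simple power law L(C) ~= a * C^b (b negative) in log-log space
import numpy as np

compute = np.array([1e17, 3e17, 1e18, 3e18])   # FLOPs for the cheap runs
loss = np.array([3.10, 2.85, 2.62, 2.45])      # final losses (made up)

# log L = b * log C + log a, so an ordinary linear fit recovers (b, log a)
b, log_a = np.polyfit(np.log(compute), np.log(loss), 1)
a = np.exp(log_a)

# extrapolate to the expensive run before committing the compute to it
target_compute = 1e21
predicted_loss = a * target_compute ** b
print(f"fit: L(C) ~= {a:.2f} * C^({b:.3f})")
print(f"predicted loss at {target_compute:.0e} FLOPs: {predicted_loss:.2f}")
```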
the loss and downstream benchmarks are all verifiable rewards; not perfect, but probably decent proxies
and they can mine their researcher traces for sft data (perhaps?) - although this may be the wrong approach
the models are going to learn things from their ml data that no individual researcher who trains them knows, much like math rl solving unsolved erdos problems
and even setting that aside, labs see researchers transferring knowledge as a harm, not something to further reinforce
their competition with each other is more important to them than the total state of research; if one lab open sources frontier capabilities, all of them get the benefit, but the labs that did not share also keep their own unique products
opus-4.7 feels like a regression in terms of thoroughness of responses for coding / also, it says a lot of things that just seem contradictory or obviously untrue that it reverses on when you point them out
great for discussion of history though
14) as system cards are important for both policy makers and the public to understand the risks of new model releases
13) also importantly, the failure to release a system card for gpt-rosalind is a real omission, which openai should explain
12) anyway, i'm not wedded to this approach, but labs do need to figure out how to release models to the public without certain capabilities; and, monitoring outputs alone may not be enough
11) and some skills, like machine learning research, are retained inside the lab entirely and not provided to customers
10) with frontier labs distilling capabilities from this central model so that dual use capabilities are only available to trusted partners
9) it's also possible the future pipeline looks more like a central general model developed internally and never released
8) but you can imagine a version of claude mythos or gpt-5.4 post-trained on offensive cyber skills and released separately to trusted partners
7) claude mythos was notable because its cyber capabilities were described as emergent; they came from better code understanding and autonomy rather than offensive cyber training
6) it's interesting that openai built a separate bio model rather than train a general model with safeguards removable for trusted users; this may be their approach to cyber capabilities as well
5) i also expect that as dual use capabilities become more extreme, access will narrow to a smaller set of trusted partners, most likely those with significant institutional and, in certain cases, government backing
4) i expect this or some variant to be the future of model releases; dual use capabilities only being made available to trusted partners; we saw an example of this already with claude mythos
3) openai has not published a system card for gpt-rosalind, but i expect it would score higher on biorisk benchmarks than gpt-5.4, given that it is stronger on chemistry and experimental design and analysis
2) it's available through openai's trusted access program; early partners include moderna, amgen and the allen institute
some thoughts on gpt-rosalind
1) gpt-rosalind is a new openai model focused on biology and drug discovery
i think also you have a thing you want to build, which is actually the most important part / @eternalism-when.bsky.social is like this too (who i think is another very proficient user of coding agents)
yeah, it seems more that the institutional incentives are more powerful than those of either parents or children as consistent constituents
and, any path to reform involves angering the people that manage the children for 7-8 hours a day
mentat, solve thyself
bsky.app/profile/alas...
i am familiar with the emergent misalignment paper, but training to the model spec does seem to show the models can learn compartmentalized behavior when we want them to / in your example we should start to see issues in other evals
like, in order to get superhuman math skills, we need a verifier for this; it could be lean, it could just be the final answer; so we have to be able to specify the behavior we care about well enough to get it in the first place
in the current regime, it looks like we don't get those capabilities without training for them / having a verifier for them; so we have to be able to specify them to some degree before we get them
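(a minimal sketch of the "it could just be the answer" kind of verifier; the parsing heuristic and function names are my own assumptions, not any lab's actual reward code. a lean-based verifier would swap the equality check for proof checking)

```python
# a toy answer-equality verifier that could serve as a binary RL reward;
# everything here (parsing heuristic, function names) is illustrative
from fractions import Fraction

def parse_final_answer(text: str) -> Fraction | None:
    """grab the last number-like token in a response (crude, illustrative heuristic)"""
    for token in reversed(text.replace(",", " ").split()):
        try:
            return Fraction(token)
        except ValueError:
            continue
    return None

def verifier_reward(response: str, reference: str) -> float:
    """1.0 if the model's final answer matches the reference answer, else 0.0"""
    predicted = parse_final_answer(response)
    expected = parse_final_answer(reference)
    return 1.0 if predicted is not None and predicted == expected else 0.0

# usage: score sampled responses and feed the scores back as rewards
print(verifier_reward("so the total is 42", "42"))       # 1.0
print(verifier_reward("i think it is roughly 40", "42")) # 0.0
```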
i don’t really see what you mean by “non-trivial”; we trained models on text completion and then did instruct fine-tuning on top; gradient descent is just getting them to learn the target distribution
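(concretely, both stages minimize the same next-token cross-entropy, just over different target distributions; generic notation, not from any specific paper)

$$\mathcal{L}(\theta) = -\,\mathbb{E}_{x \sim \mathcal{D}} \left[ \sum_{t} \log p_\theta\!\left(x_t \mid x_{<t}\right) \right]$$

with $\mathcal{D}$ the web-text corpus for pretraining and the instruction/response dataset for the instruct fine-tuning stage; only $\mathcal{D}$ changes between the two.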
models don’t blackmail people, though; this never happens in production, only in safety papers; and, when they do, it’s in some contrived setup designed to make that a plausible thing to do
and i think people should talk in clear terms about the training pipeline when they discuss risks
i think so much of this is just that people can’t break the analogy in their heads of models as humans and/or sci-fi monsters; it’s probably relevant though that you don’t need to know anything about models as mesaoptimizers etc to train models