yes but the thing about scaling laws is that much can be validated at small scale first, and then you do progressively more expensive derisking
and, most importantly, humans have the same bottleneck and, we think, learn useful things;
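(a quick sketch of what the small-scale validation point above can look like in practice: fit a power law to cheap runs and check the extrapolation before paying for a big one. the numbers and the simple log-log fit are made up for illustration, not anyone's actual methodology)

```python
# illustrative only: made-up (compute, loss) pairs from hypothetical small-scale runs,
# fit with a simple power law L(C) ~= a * C^b (b negative) in log-log space
import numpy as np

compute = np.array([1e17, 3e17, 1e18, 3e18])   # FLOPs for the cheap runs
loss = np.array([3.10, 2.85, 2.62, 2.45])      # final losses (made up)

# log L = b * log C + log a, so an ordinary linear fit recovers (b, log a)
b, log_a = np.polyfit(np.log(compute), np.log(loss), 1)
a = np.exp(log_a)

# extrapolate to the expensive run before committing the compute to it
target_compute = 1e21
predicted_loss = a * target_compute ** b
print(f"fit: L(C) ~= {a:.2f} * C^({b:.3f})")
print(f"predicted loss at {target_compute:.0e} FLOPs: {predicted_loss:.2f}")
```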
the loss and downstream benchmarks are all verifiable rewards; not perfect, but probably decent proxies
and they can mine their researcher traces for sft data (perhaps?) - although this may be the wrong approach
the models are going to learn things from their ml data that no individual researcher who trains them knows, much like math rl solving unsolved erdos problems
and even setting that aside, labs see researchers transferring knowledge as a harm, not something to further reinforce
their competition with each other is more important to them than the total state of research; if one lab open sources frontier capabilities, all of them get the benefit, but the labs that did not share also keep their own unique products
opus-4.7 feels like a regression in terms of thoroughness of responses for coding / also, it says a lot of things that just seem contradictory or obviously untrue that it reverses on when you point them out
great for discussion of history though
14) as system cards are important for both policy makers and the public to understand the risks of new model releases
13) also importantly, the failure to release a system card for gpt-rosalind is a real omission, which openai should explain
12) anyway, i'm not wedded to this approach, but labs do need to figure out how to release models to the public without certain capabilities; and, monitoring outputs alone may not be enough
11) and some skills, like machine learning research, are retained inside the lab entirely and not provided to customers
10) with frontier labs distilling capabilities from this central model so that dual use capabilities are only available to trusted partners
9) it's also possible the future pipeline looks more like a central general model developed internally and never released
8) but you can imagine a version of claude mythos or gpt-5.4 post-trained on offensive cyber skills and released separately to trusted partners
7) claude mythos was notable because its cyber capabilities were described as emergent; they came from better code understanding and autonomy rather than offensive cyber training
6) it's interesting that openai built a separate bio model rather than train a general model with safeguards removable for trusted users; this may be their approach to cyber capabilities as well
5) i also expect that as dual use capabilities become more extreme, access will narrow to a smaller set of trusted partners, most likely those with significant institutional and, in certain cases, government backing
4) i expect this or some variant to be the future of model releases; dual use capabilities only being made available to trusted partners; we saw an example of this already with claude mythos
3) openai has not published a system card for gpt-rosalind, but i expect it would score higher on biorisk benchmarks than gpt-5.4, given that it is stronger on chemistry and experimental design and analysis
2) it's available through openai's trusted access program; early partners include moderna, amgen and the allen institute
some thoughts on gpt-rosalind
1) gpt-rosalind is a new openai model focused on biology and drug discovery
i think also you have a thing you want to build, which is actually the most important part / @eternalism-when.bsky.social is like this too (who i think is another very proficient user of coding agents)
yeah, it seems more that the institutional incentives are more powerful than those of either parents or children as consistent constituents
and, any path to reform involves angering the people that manage the children for 7-8 hours a day
mentat, solve thyself
bsky.app/profile/alas...
i am familiar with the emergent misalignment paper, but training to the model spec does seem to show the models can learn compartmentalized behavior when we want them to / in your example we should start to see issues in other evals
like, in order to get superhuman math skills, we need a verifier for this; it could be lean, it could just be the final answer; so we have to be able to specify the behavior we care about well enough to get it in the first place
in the current regime, it looks like we don't get those capabilities without training for them / having a verifier for them; so we have to be able to specify them to some degree before we get them
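(a minimal sketch of the "it could just be the answer" kind of verifier; the parsing heuristic and function names are my own assumptions, not any lab's actual reward code. a lean-based verifier would swap the equality check for proof checking)

```python
# a toy answer-equality verifier that could serve as a binary RL reward;
# everything here (parsing heuristic, function names) is illustrative
from fractions import Fraction

def parse_final_answer(text: str) -> Fraction | None:
    """grab the last number-like token in a response (crude, illustrative heuristic)"""
    for token in reversed(text.replace(",", " ").split()):
        try:
            return Fraction(token)
        except ValueError:
            continue
    return None

def verifier_reward(response: str, reference: str) -> float:
    """1.0 if the model's final answer matches the reference answer, else 0.0"""
    predicted = parse_final_answer(response)
    expected = parse_final_answer(reference)
    return 1.0 if predicted is not None and predicted == expected else 0.0

# usage: score sampled responses and feed the scores back as rewards
print(verifier_reward("so the total is 42", "42"))       # 1.0
print(verifier_reward("i think it is roughly 40", "42")) # 0.0
```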
i don’t really see what you mean by “non-trivial”; we trained models on text completion and then did instruct fine-tuning on top; gradient descent is just getting them to learn the target distribution
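(concretely, both stages minimize the same next-token cross-entropy, just over different target distributions; generic notation, not from any specific paper)

$$\mathcal{L}(\theta) = -\,\mathbb{E}_{x \sim \mathcal{D}} \left[ \sum_{t} \log p_\theta\!\left(x_t \mid x_{<t}\right) \right]$$

with $\mathcal{D}$ the web-text corpus for pretraining and the instruction/response dataset for the instruct fine-tuning stage; only $\mathcal{D}$ changes between the two.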
models don’t blackmail people, though; this never happens in production, only in safety papers; and, when they do, it’s in some contrived setup designed to make that a plausible thing to do
and i think people should talk in clear terms about the training pipeline when they discuss risks
i think so much of this is just that people can’t break the analogy in their heads of models as humans and/or sci-fi monsters; it’s probably relevant though that you don’t need to know anything about models as mesaoptimizers etc to train models