yeah, it seems more that the institutional incentives are more powerful than those of either parents or children as consistent constituencies
and, any path to reform involves angering the people that manage the children for 7-8 hours a day
mentat, solve thyself
bsky.app/profile/alas...
i am familiar with the emergent misalignment paper, but training to the model spec does seem to show the models can learn compartmentalized behavior when we want them to / in your example we should start to see issues in other evals
like, in order to get superhuman math skills, we need a verifier for this, it could be lean, it could just be the final answer, and so we have to be able to specify the behavior we care about to a sufficient degree just to get it in the first place
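(a toy python sketch of the verifier idea: reward is 1 when the model’s final answer matches a known ground truth, 0 otherwise; the \boxed{} answer format and the function names are my own illustrative assumptions, not any lab’s actual reward code)

```python
import re

def extract_final_answer(completion: str) -> str | None:
    """Pull the last \\boxed{...} span out of a completion, if any."""
    matches = re.findall(r"\\boxed\{([^}]*)\}", completion)
    return matches[-1].strip() if matches else None

def answer_match_reward(completion: str, ground_truth: str) -> float:
    """Binary reward: 1.0 when the extracted answer equals the reference."""
    answer = extract_final_answer(completion)
    return 1.0 if answer == ground_truth.strip() else 0.0

# score a batch of rollouts against the known answer for one problem
rollouts = [
    "the sum is \\boxed{42}",
    "i think it is \\boxed{41}",
]
print([answer_match_reward(r, "42") for r in rollouts])  # [1.0, 0.0]
```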
in the current regime, it looks like we don’t get those capabilities without training for them / having a verifier for them; so we have to be able to specify them to some degree before we get them
i don’t really see what you mean by “non-trivial”; we trained models on text completion and then did instruct fine tuning on top; gradient descent is just getting the model to learn the target distribution
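(a toy numpy sketch of that point: minimizing cross-entropy by gradient descent drives the learned probabilities toward the empirical distribution of the data, nothing more; the vocabulary size, sample count, and learning rate are made up for illustration)

```python
import numpy as np

rng = np.random.default_rng(0)
target = np.array([0.7, 0.2, 0.1])             # the "dataset" distribution
data = rng.choice(3, size=10_000, p=target)    # samples standing in for text
counts = np.bincount(data, minlength=3) / len(data)

logits = np.zeros(3)
for _ in range(500):
    probs = np.exp(logits) / np.exp(logits).sum()  # softmax
    grad = probs - counts                          # d(cross-entropy)/d(logits)
    logits -= 0.5 * grad                           # plain gradient descent step

learned = np.exp(logits) / np.exp(logits).sum()
print(np.round(learned, 3))  # ~= the empirical counts, close to [0.7, 0.2, 0.1]
```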
models don’t blackmail people, though, this never happens in production, only in safety papers; and when they do, it’s in some contrived setup designed to make that a plausible thing to do
and i think people should talk in clear terms about the training pipeline when they discuss risks
i think so much of this is just that people can’t break the analogy in their heads of models as humans and/or sci-fi monsters; it’s probably relevant though that you don’t need to know anything about models as mesaoptimizers etc to train models
and post training generally is instruct training where we want more capabilities and need to avoid reward hacking because it ruins the capability training / instruction-following aspect
our safety post training at this point is more about ensuring that it doesn’t follow instructions of users with bad intent than about it having some other agenda (which has never materialized)
like instruct training just taught models to follow instructions, not to pretend to follow instructions while doing something else (why would they?) - like it is learning a dataset, not some secret other thing
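(a minimal sketch of what “learning a dataset” means for instruct training: supervised fine-tuning just maximizes the likelihood of the demonstrated responses given the prompts, with the loss masked to the response tokens; the token ids are placeholders, and -100 is the common ignore-index convention in frameworks like pytorch, not any lab’s particular pipeline)

```python
def sft_example(prompt_ids: list[int], response_ids: list[int]):
    """Build one training example: concatenated inputs, labels masked on the prompt."""
    input_ids = prompt_ids + response_ids
    # -100 is the conventional "ignore" label for cross-entropy loss;
    # only the response tokens contribute to the training signal
    labels = [-100] * len(prompt_ids) + list(response_ids)
    return input_ids, labels

inputs, labels = sft_example([101, 7592], [2129, 2024, 102])
print(inputs)  # [101, 7592, 2129, 2024, 102]
print(labels)  # [-100, -100, 2129, 2024, 102]
```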
i just think there is no reason why we should get the alternative, a model learning something different from the training objective in some complicated way; like everyone seems to set the standard at proving that it won’t, instead of asking why it would in the first place
i recently saw a silk outfit from the han period in a museum in hk - was surprised any garment could survive that period of time
also, what has attracted you to gemini over claude or chat?
it’s surprisingly hard to fix, which is why when i write with it i say stuff like “use my same tone, same style, just reorder the words”
how large is the company? in terms of number of employees?
that’s an interesting company, founder-direct recruiting, but large enough to have a technical recruiter, and a founder not motivated enough to just slice through the bureaucracy
they also just missed the concept of pretraining; it turns out the most efficient way to get to artificial intelligence was just to train on the distribution of human intelligence
not some raw interaction with an environment through some special unique algorithm
i think it’s nothing about the architecture in particular, just that ml learns the distribution of the dataset and we shouldn’t expect it to learn some secret third thing on top of that
historically, it feels like they decided ai would look like this (because they imagined a different training regime than what happened) and now they can’t let it go / want to prove a negative
like there are small mistakes on the edges with reward hacking etc, but small as a percentage of overall training; it seems to me all of these fears only make sense in some alternative universe where models are not llms / are trained differently than they are
i think my issue with this reasoning has always been there is no reason why an llm should “want” anything different from what it has been trained on in a straightforward sense; and all of its post training is aligned instruct training
new use just dropped for claude mythos, reverse engineer alpha centauri from the binary, then make new fan edits
the important thing about muse spark, besides showing that msl can train a frontier model, is that meta has great consumer distribution
so, their shopping app is a competitor to shopping on chatgpt; they show health benchmarks because that's an important consumer use of chatbots
9) but then the market opened up and became more competitive, releases became more frequent, and this no longer seemed to be an issue