yeah, it seems more that the institutional incentives are more powerful than those of either parents or children as consistent constituencies
and, any path to reform involves angering the people that manage the children for 7-8 hours a day
mentat, solve thyself
bsky.app/profile/alas...
i am familiar with the emergent misalignment paper, but training to the model spec does seem to show the models can learn compartmentalized behavior when we want them to / in your example we should start to see issues in other evals
like, in order to get superhuman math skills, we need a verifier for this, it could be lean, it could just be the final answer, and so we have to be able to specify the behavior we care about to a sufficient degree just to get it in the first place
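(a toy python sketch of the verifier idea: reward is 1 when the model’s final answer matches a known ground truth, 0 otherwise; the \boxed{} answer format and the function names are my own illustrative assumptions, not any lab’s actual reward code)

```python
import re

def extract_final_answer(completion: str) -> str | None:
    """Pull the last \\boxed{...} span out of a completion, if any."""
    matches = re.findall(r"\\boxed\{([^}]*)\}", completion)
    return matches[-1].strip() if matches else None

def answer_match_reward(completion: str, ground_truth: str) -> float:
    """Binary reward: 1.0 when the extracted answer equals the reference."""
    answer = extract_final_answer(completion)
    return 1.0 if answer == ground_truth.strip() else 0.0

# score a batch of rollouts against the known answer for one problem
rollouts = [
    "the sum is \\boxed{42}",
    "i think it is \\boxed{41}",
]
print([answer_match_reward(r, "42") for r in rollouts])  # [1.0, 0.0]
```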
in the current regime, it looks like we don’t get those capabilities without training for them / having a verifier for them; so we have to be able to specify them to some degree before we get them
i don’t really see what you mean by “non-trivial”; we trained models on text completion and then did instruct fine tuning on top; gradient descent is just getting the model to learn the target distribution
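(a toy numpy sketch of that point: minimizing cross-entropy by gradient descent drives the learned probabilities toward the empirical distribution of the data, nothing more; the vocabulary size, sample count, and learning rate are made up for illustration)

```python
import numpy as np

rng = np.random.default_rng(0)
target = np.array([0.7, 0.2, 0.1])             # the "dataset" distribution
data = rng.choice(3, size=10_000, p=target)    # samples standing in for text
counts = np.bincount(data, minlength=3) / len(data)

logits = np.zeros(3)
for _ in range(500):
    probs = np.exp(logits) / np.exp(logits).sum()  # softmax
    grad = probs - counts                          # d(cross-entropy)/d(logits)
    logits -= 0.5 * grad                           # plain gradient descent step

learned = np.exp(logits) / np.exp(logits).sum()
print(np.round(learned, 3))  # ~= the empirical counts, close to [0.7, 0.2, 0.1]
```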
models don’t blackmail people, though, this never happens in production, only in safety papers; and when they do, it’s in some contrived setup designed to make that a plausible thing to do
and i think people should talk in clear terms about the training pipeline when they discuss risks
i think so much of this is just that people can’t break the analogy in their heads of models as humans and/or sci-fi monsters; it’s probably relevant though that you don’t need to know anything about models as mesaoptimizers etc to train models
and post training generally is instruct training where we want more capabilities and need to avoid reward hacking because it ruins the capability training / instruction-following aspect
our safety post training at this point is more about ensuring that it doesn’t follow instructions of users with bad intent than about it having some other agenda (which has never materialized)
like instruct training just taught models to follow instructions, not to pretend to follow instructions while doing something else (why would they?) - like it is learning a dataset, not some secret other thing
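(a minimal sketch of what “learning a dataset” means for instruct training: supervised fine-tuning just maximizes the likelihood of the demonstrated responses given the prompts, with the loss masked to the response tokens; the token ids are placeholders, and -100 is the common ignore-index convention in frameworks like pytorch, not any lab’s particular pipeline)

```python
def sft_example(prompt_ids: list[int], response_ids: list[int]):
    """Build one training example: concatenated inputs, labels masked on the prompt."""
    input_ids = prompt_ids + response_ids
    # -100 is the conventional "ignore" label for cross-entropy loss;
    # only the response tokens contribute to the training signal
    labels = [-100] * len(prompt_ids) + list(response_ids)
    return input_ids, labels

inputs, labels = sft_example([101, 7592], [2129, 2024, 102])
print(inputs)  # [101, 7592, 2129, 2024, 102]
print(labels)  # [-100, -100, 2129, 2024, 102]
```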
i just think there is no reason why we should get the alternative, a model learning something different from the training objective in some complicated way; like everyone seems to set the standard at proving that it won’t, instead of asking why it would in the first place
i recently saw a silk outfit from the han period in a museum in hk - was surprised any garment could survive that period of time
also, what has attracted you to gemini over claude or chat?
it’s surprisingly hard to fix, which is why when i write with it i say stuff like “use my same tone, same style, just reorder the words”
how large is the company? in terms of number of employees?
that’s an interesting company, founder-direct recruiting, but large enough to have a technical recruiter, and a founder not motivated enough to just slice through the bureaucracy
they also just missed the concept of pretraining; it turns out the most efficient way to get to artificial intelligence was just to train on the distribution of human intelligence
not some raw interaction with an environment through some special unique algorithm
i think it’s nothing about the architecture in particular, just that ml learns the distribution of the dataset and we shouldn’t expect it to learn some secret third thing on top of that
historically, it feels like they decided ai would look like this (because they imagined a different training regime than what happened) and now they can’t let it go / want to prove a negative
like there are small mistakes on the edges with reward hacking etc, but small as a percentage of overall training; it seems to me all of these fears only make sense in some alternative universe where models are not llms / are trained differently than they are
i think my issue with this reasoning has always been there is no reason why an llm should “want” anything different from what it has been trained on in a straightforward sense; and all of its post training is aligned instruct training
new use just dropped for claude mythos, reverse engineer alpha centauri from the binary, then make new fan edits
the important thing about muse spark, besides showing that msl can train a frontier model, is that meta has great consumer distribution
so, their shopping app is a competitor to shopping on chatgpt; they show health benchmarks because that's an important consumer use of chatbots
9) but then the market opened up and became more competitive, releases became more frequent, and this no longer seemed to be an issue