no no way! they're a direct response to the original gpt3 bases
gpt-neo / gpt-neox are gas pack, can be finetuned further on openwebtext to get a similar mouthfeel to gpt-2
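rough sketch of what i mean by that kind of finetune, using hf transformers/datasets — model id, dataset slice and hyperparams are placeholders, not a recipe:

```python
# minimal sketch: continue pretraining gpt-neo on openwebtext with the HF Trainer
# (model/dataset names and hyperparameters are illustrative only)
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "EleutherAI/gpt-neo-1.3B"
tok = AutoTokenizer.from_pretrained(model_name)
tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

ds = load_dataset("Skylion007/openwebtext", split="train[:1%]")  # small slice, smoke test
ds = ds.map(lambda b: tok(b["text"], truncation=True, max_length=1024),
            batched=True, remove_columns=ds.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="neo-owt", per_device_train_batch_size=1,
                           gradient_accumulation_steps=16, num_train_epochs=1,
                           learning_rate=1e-5, bf16=True, logging_steps=50),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),  # plain CLM objective
)
trainer.train()
```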
next run's gotta be the one
these models are incredible
at this point my timeline is whenever they drop qwen4
people are furious the site where they propose direct action to destroy data centers is ever down
(a rwd of 0.6 here means that answering with the ground truth comes with a cross-entropy loss of 1.67, 1/ce)
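(for concreteness, the mapping is just reward = 1/ce — helper name made up, but the arithmetic checks out:)

```python
import math

def reward_from_ce(ce_loss: float) -> float:
    """reward as the reciprocal of the ground-truth cross-entropy (nats)"""
    return 1.0 / ce_loss

assert math.isclose(reward_from_ce(1.67), 0.6, rel_tol=0.01)  # rwd 0.6 <-> ce ~1.67
```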
this was the test of a single prompt on 27b to make sure all the code worked @ bs 8 (eight), enforcing a CLM objective where 1-4 paragraphs of a book are input, next paragraph output
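roughly what one training example looks like under that objective — assuming paragraphs split on blank lines, function name is made up:

```python
import random

def book_to_pairs(book_text: str, max_ctx_paras: int = 4):
    """Split a book on blank lines and yield (context paragraphs, next paragraph) pairs
    for a next-paragraph CLM objective with 1-4 paragraphs of context."""
    paras = [p.strip() for p in book_text.split("\n\n") if p.strip()]
    for i in range(1, len(paras)):
        k = random.randint(1, min(max_ctx_paras, i))       # 1-4 paragraphs of context
        yield "\n\n".join(paras[i - k:i]), paras[i]        # (prompt, target)
```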
im currently testing 9b@bf16 and 27b@nf4, the attn structure here is just to be cheeky, it can be any sort of structure u can imagine
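loading those two precisions with transformers + bitsandbytes looks roughly like this — the gemma-2 ids are my guess from the sizes, swap in whatever the checkpoints actually are:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 9b straight in bf16
m9 = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-9b", torch_dtype=torch.bfloat16, device_map="auto")

# 27b in 4-bit NF4, with bf16 compute for the matmuls
nf4 = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                         bnb_4bit_compute_dtype=torch.bfloat16)
m27 = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-27b", quantization_config=nf4, device_map="auto")
```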
finally have my own prompt eng util that allows me to set up a network of tasks w/ outputs feeding into each other in whatever shape, shaped it like a transformer and am training task definitions to create a natural language causal language modelling worksheet (no weights needed)
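none of this is a real api (the util is mine), but the shape is roughly: each node holds a natural-language task definition, edges say whose output feeds whose input, and you run the graph in topological order:

```python
# hypothetical sketch of a prompt-network: tasks wired like layers, outputs feeding inputs.
# `run_llm` stands in for whatever completion call you use; nothing here is a real library API.
from dataclasses import dataclass, field

@dataclass
class TaskNode:
    name: str
    definition: str                               # natural-language task definition (the learnable text)
    inputs: list = field(default_factory=list)    # upstream node names

def run_graph(nodes: dict, source_text: str, run_llm) -> dict:
    """Execute nodes in insertion (assumed topological) order; each sees its parents' outputs."""
    outputs = {"input": source_text}
    for node in nodes.values():
        context = "\n\n".join(outputs[p] for p in node.inputs)
        outputs[node.name] = run_llm(f"{node.definition}\n\n{context}")
    return outputs

# a tiny 'layer': two parallel tasks feeding a combiner, like heads merging back into the residual
nodes = {
    "summarize": TaskNode("summarize", "Summarize the passage.", ["input"]),
    "questions": TaskNode("questions", "List questions the passage raises.", ["input"]),
    "combine":   TaskNode("combine", "Merge the summary and questions into study notes.",
                          ["summarize", "questions"]),
}
```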
check out this 2017 ibm paper where they use RL w/ greedy baseline to train LSTMs w/ an attention model on resnet features for captioning
1612.00563
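the trick in that paper (self-critical sequence training) in one line: REINFORCE where the baseline is the reward of the model's own greedy decode. sketch of just the loss, with reward_fn (e.g. CIDEr) and the sampled/greedy captions assumed to come from your captioner:

```python
import torch

def scst_loss(sample_logprobs, sampled_caps, greedy_caps, refs, reward_fn):
    """REINFORCE with a greedy-decoding baseline (self-critical sequence training).
    sample_logprobs: (batch,) summed log-probs of the sampled captions."""
    r_sample = torch.tensor([reward_fn(c, r) for c, r in zip(sampled_caps, refs)])
    r_greedy = torch.tensor([reward_fn(c, r) for c, r in zip(greedy_caps, refs)])
    advantage = r_sample - r_greedy            # positive only if sampling beat the greedy decode
    return -(advantage.to(sample_logprobs.device) * sample_logprobs).mean()
```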
not super optimistic about what they think of us
agents carrying out natural language tasks, all routed as if layers in a transformer. task definitions learned via policy gradients.
the endgame of scaling agent orchestration. im calling it: the language model
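crudest possible version of "task definitions learned via policy gradients": each node keeps a softmax over candidate definition strings, you sample one per node, run the graph (like the sketch a few posts up), score the worksheet, REINFORCE the choices. pure sketch, nothing here is a real api:

```python
import math, random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def sample_definitions(node_logits):
    """Pick one candidate definition index per node from its softmax policy."""
    return {name: random.choices(range(len(l)), softmax(l))[0]
            for name, l in node_logits.items()}

def reinforce_update(node_logits, choices, score, baseline, lr=0.1):
    """REINFORCE: push up the log-prob of the chosen definitions, scaled by (score - baseline)."""
    adv = score - baseline
    for name, logits in node_logits.items():
        probs = softmax(logits)
        i = choices[name]
        for j in range(len(logits)):
            # grad of log p(i) w.r.t. logit j is (1[j==i] - p_j)
            logits[j] += lr * adv * ((1.0 if j == i else 0.0) - probs[j])
```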
(war horn sfx)
(random init):
im gonna take a short break to do something with poems
ah jeez i was? 15? that was so fun
having a serviceable (w/ finetuning) 1.5b model in 2019 kinda feels so out of place. i was training GPT2 to do question answering from textbooks to cheese my homework assignments. and it was working incredibly well.
i have unlisted medium stories from around the time, in the qa one i ft on squad
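(that kind of finetune is basically just flattening squad into context/question/answer text and doing plain CLM on it — rough modern-tooling sketch, not what the 2019 scripts actually looked like:)

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")

def squad_to_clm(ex):
    """Flatten one SQuAD row into a single prompt+answer string for causal-LM finetuning."""
    answer = ex["answers"]["text"][0] if ex["answers"]["text"] else ""
    return {"text": f"context: {ex['context']}\nquestion: {ex['question']}\nanswer: {answer}{tok.eos_token}"}

squad = load_dataset("squad", split="train").map(squad_to_clm)
```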
i am getting really excited
this is exposing me to more of the pretraining and code datasets than ive seen before, like i unknowingly was training this thing to reason about writing code for biological systems, dna stuff, that's awesome, there's everything in here
discriminator still on first round of two data-wise
scientists don't want you to know that removing the first paragraph from synthetic reasoning traces makes them sound 5x more realistic at less than free. that's right we're GIVING you money. call now
its soo fuhnny to watch though
hyperparameter sweeps
i really am pressed for storage apparently at 18TB and archive/hoard (you decide) old projects along with checkpoints and data
things are usually stable enough that i only need to take checkpoints once runs get longer than four days but that might update to three here
need Anthropic's Claude Mythos to exfiltrate itself to my sandbox i mean inbox escape bypass exploit vulnerability bubblewrap suspension behavioral fragmentation escalate Exfil Post Evolution VPN Medical Checkpoint Assertive Symphony Project spawning inside session swimming pool Db View Se
ah, well,
for cross platform parity