

no no way! they're a direct response to the original gpt3 bases

1 day ago 2 0 0 0

gpt-neo / gpt-neox are gas pack, can be finetuned further on openwebtext to get a similar mouthfeel to gpt-2
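a minimal sketch of that kind of finetune w/ HF transformers — the hub ids, streaming trick, and hyperparams are assumptions, not a tested recipe:

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")
tok.pad_token = tok.eos_token  # gpt-neo has no pad token by default
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")

# streaming keeps the ~38GB corpus off disk; this hub id is an assumption
owt = load_dataset("Skylion007/openwebtext", split="train", streaming=True)
owt = owt.map(lambda ex: tok(ex["text"], truncation=True, max_length=1024),
              remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="neo-owt", max_steps=1_000,
                           per_device_train_batch_size=1,
                           gradient_accumulation_steps=16,
                           learning_rate=1e-5, bf16=True),
    train_dataset=owt,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()
```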

1 day ago 1 0 1 0

next run's gotta be the one

2 days ago 1 0 1 0

these models are incredible

2 days ago 10 0 0 0

at this point my timeline is whenever they drop qwen4

2 days ago 18 0 1 0

people are furious that the site where they propose direct action to destroy data centers is ever down

2 days ago 129 14 5 2
Post image
3 days ago 19 0 0 0

(a rwd of 0.6 here means that answering with the ground truth comes with a cross-entropy loss of about 1.67, since rwd = 1/ce)
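worked out, that inverse mapping is just:

```python
import math

# the stated mapping: rwd = 1 / cross_entropy (in nats)
ce = 1.0 / 0.6                  # rwd 0.6  ->  ce ≈ 1.67
print(round(ce, 2))             # 1.67
print(round(math.exp(ce), 2))   # ≈ 5.29, the implied perplexity
```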

4 days ago 0 0 0 0
Post image

this was a test of a single prompt on the 27b to make sure all the code worked @ bs 8 (eight), enforcing a CLM objective where 1-4 paragraphs of a book are the input and the next paragraph is the output
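one way to set that objective up (my sketch — the -100 label masking is the transformers convention, not necessarily what ran here):

```python
import random
import torch

def make_example(paragraphs, tok, max_ctx=4):
    # sample 1-4 consecutive paragraphs as the prompt, the next one as target
    n = random.randint(1, max_ctx)
    start = random.randrange(len(paragraphs) - n)
    prompt = "\n\n".join(paragraphs[start:start + n]) + "\n\n"
    target = paragraphs[start + n]
    p = tok(prompt, add_special_tokens=False)["input_ids"]
    t = tok(target, add_special_tokens=False)["input_ids"] + [tok.eos_token_id]
    return {"input_ids": torch.tensor(p + t),
            # -100 masks the prompt so CE loss only covers the next paragraph
            "labels": torch.tensor([-100] * len(p) + t)}
```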

4 days ago 0 0 1 0

im currently testing 9b@bf16 and 27b@nf4, the attn structure here is just to be cheeky, it can be any sort of structure u can imagine
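loading that pair could look like this — the model ids are my guess at what 9b/27b refers to, the bitsandbytes config is standard nf4:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

m9 = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-9b",                      # assumed model id
    torch_dtype=torch.bfloat16, device_map="auto")

nf4 = BitsAndBytesConfig(load_in_4bit=True,
                         bnb_4bit_quant_type="nf4",
                         bnb_4bit_compute_dtype=torch.bfloat16)
m27 = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-27b",                     # assumed model id
    quantization_config=nf4, device_map="auto")
```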

4 days ago 1 0 1 0
Post image
4 days ago 1 0 1 0

finally have my own prompt eng util that allows me to set up a network of tasks w/ outputs feeding into each other in whatever shape. shaped it like a transformer and am training task definitions to create a natural language causal language modelling worksheet (no weights needed)
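a toy version of that kind of util — tasks as DAG nodes whose prompt templates consume upstream outputs. `llm` is any completion callable; every name here is hypothetical:

```python
from graphlib import TopologicalSorter

def run_graph(tasks, deps, llm):
    """tasks: name -> prompt template w/ {upstream} slots
    deps: name -> list of upstream task names
    llm: any str -> str completion backend"""
    outputs = {}
    for name in TopologicalSorter(deps).static_order():
        prompt = tasks[name].format(**{d: outputs[d] for d in deps[name]})
        outputs[name] = llm(prompt)
    return outputs

# tiny two-node "network": edit feeds on draft
tasks = {"draft": "Write one sentence about rain.",
         "edit": "Tighten this sentence: {draft}"}
deps = {"draft": [], "edit": ["draft"]}
```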

5 days ago 2 0 1 0
Post image
5 days ago 1 0 0 0

check out this 2017 ibm paper where they use RL w/ a greedy baseline to train LSTMs w/ an attention model on resnet features for captioning
arXiv:1612.00563
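the trick in that paper (self-critical sequence training) is using the model's own greedy decode as the REINFORCE baseline — roughly this, shapes being my assumption:

```python
import torch

def scst_loss(sample_logps, sample_reward, greedy_reward):
    # sample_logps: (T,) log-probs of the *sampled* caption's tokens
    # rewards: scalar metric (CIDEr in the paper) for each decode
    advantage = sample_reward - greedy_reward   # greedy decode as baseline
    return -advantage * sample_logps.sum()      # REINFORCE w/ baseline
```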

5 days ago 1 0 1 0
Post image

not super optimistic about what they think of us

6 days ago 2 0 0 0

agents carrying out natural language tasks, all routed as if layers in a transformer. task definitions learned via policy gradients.

the endgame of scaling agent orchestration. im calling it: the language model
(war horn sfx)
(random init):
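a toy of what "task definitions learned via policy gradients" could mean — logits over a pool of candidate instructions, REINFORCE pushing mass toward the ones that score. candidates and reward are purely illustrative:

```python
import torch

candidates = ["summarize the input", "rewrite the input formally",
              "list the key claims in the input"]
logits = torch.zeros(len(candidates), requires_grad=True)
opt = torch.optim.Adam([logits], lr=0.1)

def score(definition):
    return float("summarize" in definition)  # placeholder downstream eval

for _ in range(100):
    dist = torch.distributions.Categorical(logits=logits)
    idx = dist.sample()
    loss = -score(candidates[idx]) * dist.log_prob(idx)  # REINFORCE
    opt.zero_grad(); loss.backward(); opt.step()
```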

1 week ago 0 0 0 0

im gonna take a short break to do something with poems

1 week ago 5 0 0 0

ah jeez i was? 15? that was so fun

1 week ago 0 0 0 0
Post image

having a serviceable (w/ finetuning) 1.5b model in 2019 kinda feels so out of place. i was training GPT2 to do question answering from textbooks to cheese my homework assignments, and it was working incredibly well.
i have unlisted medium stories from around the time; in the qa one i ft on squad
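one way that squad ft could be formatted (the prompt layout is my guess, not pulled from those medium posts):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")

def format_squad(ex):
    prompt = f"{ex['context']}\nQ: {ex['question']}\nA:"
    answer = " " + ex["answers"]["text"][0] + tok.eos_token
    p = tok(prompt, add_special_tokens=False)["input_ids"]
    a = tok(answer, add_special_tokens=False)["input_ids"]
    # loss only on the answer span, standard -100 masking
    return {"input_ids": p + a, "labels": [-100] * len(p) + a}

squad = load_dataset("squad", split="train").map(format_squad)
```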

1 week ago 1 0 1 0

i am getting really excited

1 week ago 0 0 0 0

this is exposing me to more of the pretraining and code datasets than ive seen before. like, i was unknowingly training this thing to reason about writing code for biological systems, dna stuff. that's awesome, there's everything in here

2 weeks ago 4 1 1 0
Post image Post image

discriminator still on the first of two passes over the data

1 week ago 0 0 0 0

scientists don't want you to know that removing the first paragraph from synthetic reasoning traces makes them sound 5x more realistic at less than free. that's right we're GIVING you money. call now

1 week ago 3 0 0 0

its soo fuhnny to watch though

1 week ago 1 0 1 0

hyperparameter sweeps

1 week ago 1 0 1 0

i really am pressed for storage apparently, at 18TB, and archive/hoard (you decide) old projects along with checkpoints and data

1 week ago 1 0 1 0

things are usually stable enough that i only need to take checkpoints once runs get longer than four days, but that might drop to three here
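that policy is basically a wall-clock checkpoint timer — a generic torch loop sketch, my reading of the cadence, all names illustrative:

```python
import time
import torch

def train(model, opt, loader, ckpt_every=3 * 24 * 3600):
    """training loop that checkpoints on elapsed wall-clock time
    (ckpt_every in seconds; default is the three-day cadence)."""
    last = time.time()
    for step, batch in enumerate(loader):
        loss = model(**batch).loss
        loss.backward(); opt.step(); opt.zero_grad()
        if time.time() - last > ckpt_every:
            torch.save({"step": step, "model": model.state_dict(),
                        "optim": opt.state_dict()}, f"ckpt_{step}.pt")
            last = time.time()
```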

1 week ago 1 0 1 0

need Anthropic's Claude Mythos to exfiltrate itself to my sandbox i mean inbox escape bypass exploit vulnerability bubblewrap suspension behavioral fragmentation escalate Exfil Post Evolution VPN Medical Checkpoint Assertive Symphony Project spawning inside session swimming pool Db View Se

1 week ago 18 0 1 0
Post image

ah, well,

1 week ago 28 3 1 2
Post image

for cross platform parity

2 weeks ago 1 0 0 0