Posts by Mozhdeh Gheini

Are there any good pointers on when/why one would decide to run pre-training from scratch (followed by post-training, of course) to create a fresh LLM? Is it simply about shifting the knowledge cutoff, or more than that? Do we know how/if that happens nowadays? What are the deciding factors?

I should also add that I’m assuming there’s no breakthrough architecture/pre-training/post-training method that forces starting everything from scratch. I’m simply asking about the factors that go into greenlighting such a full restart under the current status quo.
i was annoyed at having many chrome tabs with PDF papers having uninformative titles, so i created a small chrome extension to fix it.
i've been using it for a while now; works well.
today i put it on github. enjoy.
github.com/yoavg/pdf-ta...
Given how bad I am at it, it’s out of my league too; still fun though 😅
Were you doing the NYT’s crossword? That’s how it happened for me. Also, if you want a bonus one, “doe” :)
f’ as in fine-tuned from f, not the derivative of f 😅
I got confused there too. Maybe something like “further condition the model’s output” (instead of “update the model”)?
So if the model is f(x), before the dashed line it’s f’(x), and after that it’s f(x|prompt/context).
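To make the notation concrete, here is a toy sketch of the distinction, with plain Python functions standing in for models; the names (`f`, `finetune`) and the string outputs are purely illustrative assumptions, not any real API:

```python
# Toy illustration: fine-tuning produces a new function f' (weights change),
# while prompting keeps f fixed and only conditions its input.

def f(x, context=""):
    """Stand-in for the pretrained model: maps input (plus optional context) to output."""
    return f"output({context}{x})"

def finetune(model):
    """Weight update: returns a *new* function f' with changed parameters."""
    def f_prime(x, context=""):
        return f"output'({context}{x})"
    return f_prime

f_prime = finetune(f)              # f -> f': the parameters themselves change
y_ft = f_prime("x")                # fine-tuned model applied to x
y_ctx = f("x", context="prompt:")  # same f, conditioned on a prompt: f(x | prompt)
```

So before the dashed line the function itself is replaced (f′), and after it the same f is merely evaluated with extra conditioning input.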
USC NLP folks are on Bluesky!
Follow my amazing colleagues here
go.bsky.app/KUwSZ6W