Can you train a performant language model using only openly licensed text?
We are thrilled to announce the Common Pile v0.1, an 8TB dataset of openly licensed and public domain text. We train 7B models for 1T and 2T tokens and match the performance of similar models like LLaMA 1 & 2.
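For anyone wanting to poke at the data, here's a minimal sketch assuming the release lands on the Hugging Face Hub; the repo id and the `text` field below are placeholders I made up for illustration, not confirmed by this post.

```python
# Minimal sketch of streaming a few samples from the corpus.
# "common-pile/common-pile-v0.1" is a hypothetical repo id; streaming avoids
# downloading the full 8TB dataset.
from datasets import load_dataset

ds = load_dataset("common-pile/common-pile-v0.1", split="train", streaming=True)

for i, example in enumerate(ds):
    print(example["text"][:200])  # assumes a conventional "text" field
    if i == 2:
        break
```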
Posts by Lintang Sutawika
Damn, where are these parties I'm missing?
Technically, we do, but a lot of that goes toward paying tuition. Not unlike the 20k for these agents going towards GPU compute 🤪
Maybe he thought it was “locker room talk” 🤪
They're future-proofing the design
So `decision model n` is directed by mission control and then just forwards a signal to `big data`?? I guess no decision was ever made
Maybe. But more likely, they're using QwQ or DeepSeek.
Transformers demonstrated how to attend over an entire sequence at once, which at the time differed from approaches like LSTMs that processed tokens sequentially. Attention spanning the whole sequence really does parallel the aliens from Arrival.
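Here's a minimal PyTorch sketch of that contrast (the shapes and modules are my own illustrative choices, not from any particular paper): the LSTM has to walk the sequence one step at a time, while self-attention sees every position in one shot.

```python
import torch
import torch.nn as nn

seq_len, d_model = 16, 32
x = torch.randn(1, seq_len, d_model)  # (batch, time, features)

# LSTM: each hidden state depends on the previous one, so the
# recurrence is inherently sequential.
lstm = nn.LSTM(d_model, d_model, batch_first=True)
h = None
outputs = []
for t in range(seq_len):
    out, h = lstm(x[:, t : t + 1, :], h)  # one token per step
    outputs.append(out)
lstm_out = torch.cat(outputs, dim=1)

# Self-attention: every position attends to every other position,
# so the whole sequence is processed in parallel.
attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
attn_out, _ = attn(x, x, x)  # queries, keys, values all come from x

print(lstm_out.shape, attn_out.shape)  # both (1, 16, 32)
```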
Attended two different lectures (one class and one invited guest lecture) on the same topic of inference-time scaling. Maybe the Matrix is trying to tell me something.
I'm seeing lectures in #nlp that use Taylor Swift to illustrate concepts.
@eleutherai.bsky.social is our official account. Will be posting here and on Twitter from now on.
LTI PhDs seeking refuge on Bluesky
go.bsky.app/NhTwCVb
Hi, I would also like to be included in this list!