
Posts by hailey schoelkopf

so academic twitter is like actually-actually migrating this time huh?

i still don’t know if i have it in me to actively use another social network yet 😖

1 year ago 48 0 7 0

thank you for the kind words!! :)

1 year ago 2 0 0 0

introducing the new Vacuum Use (beta)

1 year ago 1 0 0 0

👋

2 years ago 1 0 1 0
Dolma: 3 Trillion Token Open Corpus for Language Model Pretraining. We released Dolma, OLMo’s pretraining dataset. Dolma open dataset of 3 trillion tokens. Available on HuggingFace under the ImpACT license

We released Dolma, the dataset for OLMo, AI2's LLM. It's 3+ trillion tokens. We hope it will help w study of language models!

Available on HuggingFace w/ ImpACT license huggingface.co/datasets/allenai/dolma

Overview+datasheet blog.allenai.org/dolma-3-trillion-tokens-open-llm-corpus-9a0ff4b8da64

2 years ago 23 10 1 1