“They said it could not be done”. We’re releasing Pleias 1.0, the first suite of models trained on open data (either permissively licensed or uncopyrighted): Pleias-3b, Pleias-1b and Pleias-350m, all based on the two-trillion-token set from Common Corpus.
Posts by Jordan Meyer
███░░░░░░░░░ ~25% trained
"A painting of a mountain lake with a boat in the foreground, surrounded by lush green grass, trees, and rocks. The sky is filled with white, fluffy clouds, creating a peaceful atmosphere."
Last week we submitted to the #EU AI Office our comments on the 1st draft of the #AI Code of Practice, focusing on #copyright. ©️
On our blog, Teresa Nobre explains our responses and also the concerns expressed by other stakeholders:
communia-association.org/2024/12/04/o...
Great study on misinformation. Just want to point out that this kind of work is impossible without the fair use doctrine. Massive copying, computational analysis, ...
Hi, so I've spent the past almost-decade studying research uses of public social media data, e.g. ML researchers using content from Twitter, Reddit, and Mastodon.
Anyway, buckle up this is about to be a VERY long thread with lots of thoughts and links to papers. 🧵
Making a bsky dataset is a bit like breaking glaze. It's in users' best interests to know how easy it is, but they'll hate you for it.
Sincerely do not tell anyone in the replies what the fire hose is lmao
Monkeys! 😀 source.plus/item/2d8d3be976f5fd753d5...
I found a few fun 'I's using source.plus. You can use the "more like this" panel on the right side to explore others. source.plus/item/8fd37f5...
I hadn't tried this before, thank you for the fun idea!
100%. And I think the challenge is real not because it requires complicated technology, but because both AI orgs and rights holders see opt-outs as a compromise that they'd need to be forced into.