Jordan Meyer (@jordanmeyer) Bsky

“They said it could not be done”. We’re releasing Pleias 1.0, the first suite of models trained on open data (either permissibly licensed or uncopyrighted): Pleias-3b, Pleias-1b and Pleias-350m, all based on the two trillion tokens set from Common Corpus.

1 year ago 248 85 11 19

███░░░░░░░░░ ~25% trained

"A painting of a mountain lake with a boat in the foreground, surrounded by lush green grass, trees, and rocks. The sky is filled with white, fluffy clouds, creating a peaceful atmosphere."

1 year ago 13 3 2 0

Our analysis of the 1st draft of the General-Purpose AI Code of Practice
In this blogpost, we highlight some of COMMUNIA's responses to the EU survey on the first draft of the GPAI Code of Practice, as well as some of the concerns expressed by other stakeholders at the mee...

Last week we submitted to the #EU AI Office our comments on the 1st draft of the #AI Code of Practice, focusing on #copyright. ©️

On our blog, Teresa Nobre explains our responses and also the concerns expressed by other stakeholders:
communia-association.org/2024/12/04/o...

1 year ago 3 2 0 0

Great study on misinformation. Just want to point out that this kind of work is impossible without the fair use doctrine. Massive copying, computational analysis, ...

1 year ago 33 12 2 1

Hi, so I've spent the past almost-decade studying research uses of public social media data, like e.g. ML researchers using content from Twitter, Reddit, and Mastodon.

Anyway, buckle up this is about to be a VERY long thread with lots of thoughts and links to papers. 🧵

1 year ago 963 452 59 123

Making a bsky dataset is a bit like breaking glaze. It's in users' best interests to know how easy it is, but they'll hate you for it.

1 year ago 2 0 0 0

Sincerely do not tell anyone in the replies what the fire hose is lmao

1 year ago 18 6 3 0

Source.Plus | Print, book-illustration (BM 1...
Search, curate, and enrich media collections for AI training using the Source.Plus marketplace. Safe, consenting, high-quality training data. Public domain datasets for model fine-tuning. Source Descr...

Monkeys! 😀 source.plus/item/2d8d3be976f5fd753d5...

1 year ago 1 0 1 1

Source.Plus | Grondige Onderrichtinge in de ...
Search, curate, and enrich media collections for AI training using the Source.Plus marketplace. Safe, consenting, high-quality training data. Public domain datasets for model fine-tuning. Source Descr...

I found a few fun 'I's using source.plus. You can use the "more like this" panel on the right side to explore others. source.plus/item/8fd37f5...

I hadn't tried this before, thank you for the fun idea!

1 year ago 1 0 2 0

100%. And I think the challenge is real not because it requires complicated technology, but because both AI orgs and rights holders see opt-outs as a compromise that they'd need to be forced into.

1 year ago 2 0 1 0
