I think a lot of federal money is tied to accreditation like Pell grants and research funds and stuff. So while Harvard has lots of money in the endowment, it would still be a pretty big hit to the budget.
Posts by David Hall
Many thanks to the Google TPU Research Cloud program for providing the much needed compute for this project, and to all the other great open efforts: @ai2.bsky.social @eleutherai.bsky.social and more!
You can read more in our:
- Website: marin.community
- GitHub: github.com/marin-commun...
- Discord: discord.gg/J9CTk7pqcM
- Documentation: marin.readthedocs.io
- Announcement: marin.community/blog/2025/05/1
Explanation of Datashop: a prompt or sample data comes in, an LLM finds more data, a cheap model is trained to find even more, then train --> LLM
Have a specific use case? Come to our Datashop to curate data and train models.
Here’s how we curated more math data:
github.com/marin-commun...
Check out the data:
marin.community/data-browser/
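A minimal sketch of the loop described above (an LLM labels a small sample, then a cheap model extends those labels to a larger pool). This is not Marin's Datashop code: `llm_label`, `train_cheap_model`, `score`, and all the example texts are hypothetical stand-ins, and the "cheap model" here is just a token-count scorer chosen to keep the example self-contained.

```python
from collections import Counter


def llm_label(text: str) -> bool:
    """Hypothetical stand-in for an LLM judging whether text is math-related."""
    return any(word in text.lower() for word in ("theorem", "equation", "integral"))


def tokenize(text: str) -> list[str]:
    return text.lower().split()


def train_cheap_model(labeled: list[tuple[str, bool]]) -> dict[str, int]:
    """Weight each token by (positive docs containing it) - (negative docs containing it)."""
    pos, neg = Counter(), Counter()
    for text, is_relevant in labeled:
        (pos if is_relevant else neg).update(set(tokenize(text)))
    return {tok: pos[tok] - neg[tok] for tok in pos.keys() | neg.keys()}


def score(model: dict[str, int], text: str) -> float:
    """Average token weight; unseen tokens count as zero."""
    toks = tokenize(text)
    return sum(model.get(t, 0) for t in toks) / max(len(toks), 1)


# Step 1: the (stand-in) LLM labels a small sample.
sample = [
    "solve the equation for x",
    "proof of the theorem by induction",
    "celebrity gossip of the week",
    "best pizza in the city",
]
labeled = [(text, llm_label(text)) for text in sample]

# Step 2: train the cheap model on those labels.
model = train_cheap_model(labeled)

# Step 3: use the cheap model to pull more data out of a larger pool.
pool = ["solve for x by induction", "pizza gossip in the city", "weather report"]
selected = [text for text in pool if score(model, text) > 0]
print(selected)
```

The first pool entry contains none of the labeler's keywords, so it is selected only because the cheap model picked up co-occurring tokens from the labeled sample. That is the point of the second stage: it is far cheaper to run over a large corpus than the LLM.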
Pareto frontier of FLOPs vs bits-per-byte
Have a new algorithm for training? Choose your compute budget and get on the speedrun leaderboard: how fast can you drive down validation loss?
marin.community/speedrun/
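For readers unfamiliar with the plot, here is a rough sketch of how a Pareto frontier over (FLOPs, bits-per-byte) could be computed. The run names and numbers are made up for illustration, and nothing here is taken from the actual leaderboard code. The bits-per-byte helper assumes the loss is a summed cross-entropy in nats over text of a known byte length.

```python
import math


def bits_per_byte(total_loss_nats: float, total_bytes: int) -> float:
    """Convert a summed cross-entropy loss in nats into bits per byte of text."""
    return total_loss_nats / (math.log(2) * total_bytes)


def pareto_frontier(runs):
    """Return runs not dominated in (flops, bpb); lower is better on both axes."""
    frontier = []
    best_bpb = math.inf
    # Sort by compute (ties broken by bpb), then keep each run that
    # beats the best bits-per-byte seen at lower or equal compute.
    for name, flops, bpb in sorted(runs, key=lambda r: (r[1], r[2])):
        if bpb < best_bpb:
            frontier.append((name, flops, bpb))
            best_bpb = bpb
    return frontier


# Hypothetical runs: (name, training FLOPs, validation bits-per-byte).
runs = [
    ("baseline", 1e18, 1.10),
    ("tweak-a", 1e18, 1.05),
    ("tweak-b", 2e18, 1.07),
    ("tweak-c", 4e18, 0.98),
]

for name, flops, bpb in pareto_frontier(runs):
    print(f"{name}: {flops:.1e} FLOPs, {bpb:.2f} bpb")
```

A run is on the frontier only if no other run reaches a lower bits-per-byte with the same or less compute. In this toy data, `tweak-b` drops out because `tweak-a` is both cheaper and better.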
Flowchart showing GitHub issue (preregistration) -> pull request (experiment.py) -> execution (watch it live) -> WandB report (analysis)
Marin (marin.community) repurposes GitHub, which has been successful for open-source *software*, for AI:
1. Preregister an experiment as a GitHub issue
2. Submit a PR, which implements the experiment in code
3. PR is reviewed by experts in the community
4. Watch the execution of the experiment live!
open weights vs open source (weights + code + recipe) vs open development (+ process, anyone can contribute)
Marin is a new "open lab" for developing foundation models. More than open weights, and even open source, with Marin we're committing to "open development": everything is documented and traceable, and anyone can contribute.
Learn more about the project in Percy's blog post: marin.community/blog/2025/05...
And about the models we are releasing in @dlwh.bsky.social's training retro: marin.readthedocs.io/en/latest/re...
Super excited Marin is finally out! Come see what we've been building! Code/platform for training fully reproducible models end-to-end, from data to evals. Plus a new high-quality 8B base model. Percy did a good job explaining it on the other place. marin.community
x.com/percyliang/s...