
Posts by Jeremie Kalfon 👨‍💻🧬🤖🚀


🚨 2026 Lilly x Nucleate Grand Challenge: Aging Reimagined

🔬 $100K in non-dilutive funding
🏛️ Pitch at Lilly HQ
🤝 Lilly's science + venture teams

Focus: mobility, cognition, immune resilience, regenerative medicine (the frontiers of healthspan).

📅 May 15
👉 linktr.ee/lillygrand...

1 day ago
The scPRINT-2 Corpus | Jérémie
350 Million Cells, 16 Organisms: Building the Largest Single-Cell Training Dataset


The full dataset is open, including an interactive atlas you can explore online right now. We also released the data pipeline so you can reproduce or extend it.

I just wrote a blog post about it: 📖 www.jkobject.com/pro...
3/3

5 days ago

350 million cells. 16 eukaryotic organisms, from human to mouse to the tomato plant. 25 TB of unique data. 337 cell types and 296 diseases, spanning almost every major tissue.

Gene names, cell types, tissues, and species are aligned to ontologies consistently across the entire corpus.
2/3

5 days ago
scPRINT-2: Towards the next-generation of cell foundation models and benchmarks | bioRxiv

The biggest bottleneck in building cell foundation models isn't the architecture. It's the data.

For scPRINT-2 we assembled what is, to our knowledge, the largest pre-training corpus for any cell foundation model. www.biorxiv.org/cont... 🧡
1/3

5 days ago

Instead of attending to every pair of positions, you make two lightweight passes: one along rows, one along columns.

I just wrote up a small blog post about it: 📖 jkobject.com/criss-c...
Would love to hear from anyone exploring efficient attention mechanisms 🙂
#Transformers #Attention
2/2
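For readers who want the row/column idea in code, here is a minimal NumPy sketch of generic row-then-column (axial-style) attention over tokens laid out on an H × W grid. It is an illustration under stated assumptions, not the scPRINT-2 implementation: the grid layout, the single head, the projection matrices shared by both passes, and the omission of residual connections and normalization are all simplifications, and the function names are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attend(q, k, v):
    """Scaled dot-product attention over the second-to-last axis.

    q, k, v: arrays of shape (..., L, d); every leading index is an
    independent group, so tokens only attend within their own group.
    """
    d = q.shape[-1]
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d)  # (..., L, L)
    return softmax(scores, axis=-1) @ v              # (..., L, d)


def row_column_attention(x, w_q, w_k, w_v):
    """Hypothetical row-then-column pass over a grid of tokens.

    x: (H, W, d) grid of token embeddings.
    w_q, w_k, w_v: (d, d) projections, shared by both passes for brevity.
    """
    # Pass 1: each token attends only to the W tokens of its own row.
    h = attend(x @ w_q, x @ w_k, x @ w_v)   # (H, W, d)
    # Pass 2: make columns the leading axis, then attend within each column.
    ht = np.swapaxes(h, 0, 1)               # (W, H, d)
    out = attend(ht @ w_q, ht @ w_k, ht @ w_v)
    return np.swapaxes(out, 0, 1)           # back to (H, W, d)


rng = np.random.default_rng(0)
H, W, d = 4, 6, 8
x = rng.normal(size=(H, W, d))
w = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3)]
y = row_column_attention(x, *w)
print(y.shape)  # (4, 6, 8)
```

Perturbing a single token changes only its own row after the first pass, but changes every output after the second pass, which is how two cheap passes can still propagate information across the whole grid.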

1 week ago

Self-attention changed everything in deep learning. But it comes with a tax: O(n²) complexity. For long sequences, that's not just slow; it's a wall.

There's a cleaner way to think about it, which I introduced in my recent preprint, scPRINT-2: it's called Criss-Cross Attention. 🧡
1/2
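A rough count shows where the saving comes from, assuming (for illustration only) that the n tokens can be arranged on an H × W grid and that each pass scores a token only against the tokens in its own row, then its own column:

```latex
\begin{aligned}
\text{full attention:}\quad & n^2 \text{ scores}, \qquad n = H W \\
\text{row pass + column pass:}\quad & H \cdot W^2 + W \cdot H^2 = n\,(H + W) \\
H = W = \sqrt{n}:\quad & n\,(H + W) = 2\,n^{3/2}
\end{aligned}
```

For example, with n = 10,000 tokens on a 100 × 100 grid, full attention scores 10^8 pairs while the two passes score 2 × 10^6, a 50× reduction.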

1 week ago

We would rather let some people get cancer, MS, or Parkinson's than give a virus to people who will get it anyway. Many people might volunteer! We give so much to associations fighting cancer, MS, and dementia, but when it comes time to actually do something about it, no one seems to want to.
6/6

2 months ago

← Nowadays, no regulatory agency, least of all in Europe, would let you do that.
5/6

2 months ago

You could recruit kids, give them the vaccine, and then infect them with EBV directly, since you know almost all of them will be infected at some point anyway. You would then check whether they got infected using sequencing (PCR tests, B-cell antigen sequencing) and accept that as the trial's endpoint.
4/6

2 months ago

Because you need to recruit tens of thousands of young kids, test them often for infection, and wait decades for symptoms of other diseases to appear in some of them. The statistics are terrible: the outcomes you actually care about are rare and show up decades later.

But it could be cheap...
3/6
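A standard back-of-the-envelope sample-size calculation shows why a rare, late endpoint makes the statistics so bad. The incidences below are hypothetical placeholders chosen only to illustrate the scaling, not real EBV, MS, or vaccine-efficacy figures, and `n_per_arm` is a hypothetical helper name; the formula is the usual normal approximation for comparing two proportions at 5% two-sided significance and 80% power.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(p_control: float, p_vaccine: float,
              alpha: float = 0.05, power: float = 0.8) -> int:
    """Participants needed per arm for a two-sided, two-proportion z-test.

    Normal-approximation formula:
    n = (z_{1-alpha/2} + z_{power})^2 * [p1(1-p1) + p2(1-p2)] / (p1 - p2)^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_vaccine * (1 - p_vaccine)
    return ceil((z_alpha + z_power) ** 2 * variance / (p_control - p_vaccine) ** 2)


# Hypothetical disease endpoint: 0.1% of unvaccinated kids develop the
# disease during follow-up, and the vaccine halves that risk.
disease_n = n_per_arm(0.001, 0.0005)

# Hypothetical infection endpoint: 90% of unvaccinated participants get
# infected, and the vaccine halves that.
infection_n = n_per_arm(0.90, 0.45)

print(disease_n, infection_n)
```

With these made-up inputs the disease endpoint needs roughly 47,000 children per arm (before accounting for dropout over decades of follow-up), while the infection endpoint needs fewer than 20 per arm. That gap is the statistical reason a frequent, early endpoint such as infection looks so attractive.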

2 months ago

These range from different cancers and skin diseases to dementia, Parkinson's, and more.
The reason there is still no vaccine in 2026 is very interesting.
Basically, it is super expensive. But why is that?
2/6

2 months ago

Did you know that most cases of multiple sclerosis (MS) are likely driven by the Epstein-Barr virus (EBV), the herpesvirus behind mononucleosis?

>90% of us get infected in our teens, and some of us will go on to develop many diseases later in life because of it.
1/6

2 months ago

And then lucky to pursue this through an atlas of cells across many types and species, built on the idea that quality and diversity matter more than quantity, with @jkobject.com

2 months ago

@cantinilab.bsky.social

4 months ago
scPRINT-2: Towards the next-generation of cell foundation models and benchmarks

Cell biology has been booming with foundation models trained on large single-cell RNA-seq databases, but benchmarks and capabilities remain unclear. We propose an additive benchmark across a gymnasium of tasks to discover which features improve performance. From these findings, we present scPRINT-2, a single-cell Foundation Model pre-trained across 350 million cells and 16 organisms. Our contributions in pre-training tasks, tokenization, and losses made scPRINT-2 state-of-the-art in expression denoising, cell embedding, and cell type prediction. Furthermore, with our cell-level architecture, scPRINT-2 becomes generative, as demonstrated by our expression imputation and counterfactual reasoning results. Finally, thanks to our pre-training database, we uncover generalization to unseen modalities and organisms. These studies, together with improved abilities in gene embeddings and gene network inference, place scPRINT-2 as a next-generation cell foundation model.

Competing Interest Statement: The authors have declared no competing interest.

Paper: www.biorxiv.org/cont... • Code: github.com/cantinila...

Curious: **what’s the one benchmark you wish every single-cell foundation model reported by default?**
6/6

4 months ago

4. **Generalization:** evaluation on **unseen organisms, tasks, and modalities.** It is also a push to rethink how scFMs are evaluated; **SOTA on many tasks**. 🥇 🏃 ⛷️ ⛹️‍♀️

🎁 If you’re reading papers over the break, I hope this is useful.
5/6

4 months ago

3. **Data + pipeline:** unified **scBaseCount + Tahoe-100M + CELLxGENE**, with consistent preprocessing + weighted random sampling (and other practical bits that usually stay hidden) → **350M cells, 16 species, ~300 tissues, ~500 cell types**. 🌍🫁🐭
4/6
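The thread doesn't say how the sampling weights are chosen, so the sketch below assumes one common scheme: each cell is weighted inversely to how often its label (e.g. its cell type) occurs, so rare labels are drawn more often during training. The helper names and the `power` knob are hypothetical, and scPRINT-2 may weight differently (for example by dataset, organism, or a combination).

```python
from collections import Counter

import numpy as np


def inverse_frequency_weights(labels, power=1.0):
    """Per-cell sampling probabilities proportional to count(label) ** -power.

    power=0 gives uniform sampling over cells; power=1 makes every label
    equally likely in expectation. (Hypothetical helper, not scPRINT-2 code.)
    """
    counts = Counter(labels)
    w = np.array([counts[lab] ** -power for lab in labels], dtype=float)
    return w / w.sum()


def sample_batch(labels, batch_size, rng, power=1.0):
    """Draw cell indices with replacement according to the weights above."""
    p = inverse_frequency_weights(labels, power)
    return rng.choice(len(labels), size=batch_size, replace=True, p=p)


# Toy example: 900 T cells, 90 B cells, 10 cells of a rare type.
labels = ["T cell"] * 900 + ["B cell"] * 90 + ["rare"] * 10
rng = np.random.default_rng(0)
idx = sample_batch(labels, batch_size=30_000, rng=rng)
print(Counter(labels[i] for i in idx))  # roughly 10,000 draws per label
```

Raising `power` above 0 trades fidelity to the raw corpus composition for more balanced exposure to rare labels; where to set it is an empirical question.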

4 months ago

1. **Benchmark:** **42 components** of scFMs evaluated across a gymnasium of tasks, covering dataset size, encoding, training, architectures, losses, etc. 📊

2. **Model:** **scPRINT-2**, *small but mighty* with **~20M active parameters**, built from the strongest ingredients we found. 🤖🧬
3/6

4 months ago

After a few years of building scFMs (scPRINT, Xpressor, scPRINT-2…), I wanted to do something more “complete” than just shipping a new model: understand what matters, train the best version we can, and properly stress-test generalization.

So this work is a 4-in-1 release:
2/6

4 months ago

🧑‍🎄🎄 Christmas Foundation Model Release: scPRINT-2

**One-liner:** a **20M-active-param** single-cell foundation model trained on **350M cells / 16 species / 300 tissues / 500 cell types**.
1/6

4 months ago

Thanks to Future4Care, Timothé Cynober, Whitelab Genomics, and Scienta Lab for organizing the event, and to Matteo Marengo, Clara Brouaux, and Gabriel Michaux for helping me manage the round table.

And thanks to my all-star panel: Yann Fleureau, Jeremy Besnard, Sofia Dahoune, and Steven Jerome

4 months ago

It was a blast hosting our Nucleate Inside AI roundtable at the TechBio France 2025 event.

4 months ago
TechBio France 2025
Join TechBio France 2025 to shape the future of France's TechBio ecosystem, fostering innovation and collaboration in biotech and technology

Join us on Friday the 4th at the 🇫🇷 TechBio France 2025 event!
www.eventbrite.fr/e/...
3/3

5 months ago

No double-talk, and amazing panelists 🧑‍🔬:

- Yann Fleureau, CEO, Blossom Life Sci & Founder of Cardiologs
- Steven Jerome, Director, Lead of Hit Discovery, Schrödinger
- Jérémy Besnard, Advisor, InFocusTx & Co-founder of Exscientia
- Sofia Dahoune, Partner at Daphni
2/3

5 months ago

🌐🧬 I am excited to present a round table I am co-hosting with Matteo Marengo and Gabriel Michaux as part of our emerging Nucleate Paris chapter, led by Clara Brouaux 🔥.

Title: **Inside AI: Choosing the Right Path to Value Creation**
1/3

5 months ago
Open Conference of AI Agents for Science: 2025
The 1st Open Conference of AI Agents for Science (agents4science 2025). AI serves as both primary authors and reviewers of research papers.

Next week we will see the first conference where both the main authors and the reviewers are LLM agents!

This might be fun to follow: agents4science.stanf...
👀 🤖

6 months ago

I am presenting my PhD work today at the immuno-oncology conference at the CRCT Oncopole in Toulouse!

Happy to talk about how we can use foundation models in the real world 🧬 🧑‍⚕️

6 months ago

👉 Learn more & apply:

6 months ago

🔷 Alnylam BioVenture Challenge: one day at Alnylam HQ, one shot at $100K in non-dilutive funding. Apply by Oct 17.

And we’re also recruiting the next generation of Nucleate leaders. If you’re ready to build biotech and strengthen the community behind it, apply today.

6 months ago

It’s about growth, collaboration, and the chance to give back by lifting others.
Two flagship opportunities are now open:

🔷 Activator 2026: our equity-free accelerator equipping scientific founders with the tools to launch biotech ventures. Apply by Oct 20.

6 months ago