Really amazing work and a big effort by Olga, Alex, Julie, Koen and many others. Check it out for sure!
Posts by Niklas Kempynck
New preprint @cxqiu.bsky.social @jshendure.bsky.social ! Can we learn regulatory grammars of human cell types — by training on mouse development and transferring across 241 mammalian genomes? Introducing STEAM & a whole-organism scATAC-seq atlas from E10 to birth.
www.biorxiv.org/content/10.6...
Fun fact: CREsted is named after the great crested newt, which has a crested back resembling scATAC peaks. This was inspired by the (alpine) newts I occasionally encounter in my parents' garden 🤗
... and to @steinaerts.bsky.social for his guidance throughout the project.
This work was done together with @seppedewinter.bsky.social, and we’d like to thank @casblaauw.bsky.social, @lukasmahieu.bsky.social, Vasilis, @erencaneksi.bsky.social, @samdieltiens.bsky.social, @darinaabaffy.bsky.social and all the amazing co-authors for their help...
Compared to the preprint, we added robustness analyses and more benchmarking, both of options within CREsted and of CREsted features (like motif identification) against traditional methods. We also aimed to position it well in the landscape of sequence-based modeling methods.
CREsted is finally out! You can find the article, together with a summarizing Research Briefing, in thread. 🦎
Big thanks to Nelson & Trygve for guiding me, and to all the other people in the group for the nice collab. Thanks to @steinaerts.bsky.social for supporting me on this endeavor and to @fwovlaanderen.bsky.social for funding it. Also, Pacific Northwest nature is quite insane 😁
These studies have many more interesting analyses, so I would highly recommend checking out these big efforts from all the people involved! It was great to work together with all the people in Trygve’s group on our shared interest in understanding gene regulation in the brain.
Finally, in a study led by Yuanyuan & Nelson we dove deep into astrocyte subgroups in the BG, and pushed CREsted models to their resolution limit to learn how these subgroups differ in enhancer logic. A very fun adventure with great data and many modalities, and a nice set of enhancer tools.
Next, another big atlas release led by Nelson and Yuanyuan on the primate basal ganglia (BG), where again we described enhancer codes of the strongly conserved groups across species and checked how well the models could predict enhancer tool function.
First, a study led by @mtvector.bsky.social and Nelson generated a cross-species multiome atlas of the spinal cord, where we described enhancer codes of identified groups with strong conservation across species. We used our models to study enhancer tools for targeting specific cell types.
Last summer I spent 4 months working at the @alleninstitute.org as a Visiting Scientist. Recently we released some preprints about the work we collaborated on, in which we used new multiome atlases of CNS regions to decipher the underlying enhancer logic with CREsted (among many other things). (1/n)
Paper alert! 💻 How many cells do you need to train reliable deep learning models in regulatory genomics? We asked how data quality, sequencing depth, and dataset size affect training of sequence-to-function models from scATAC-seq. Out now www.nature.com/articles/s41...
(details below)
We are thrilled to share our new pre-print: “System-wide extraction of cis-regulatory rules from sequence-to-function models in human neural development”. S2F deep learning models can accurately encode enhancers, yet decoding these models into human-interpretable rules remains a major challenge.
Relieved to finally post my whole developing brain evolutionary "theory of everything" preprint!
www.biorxiv.org/content/10.1...
We have two open positions, for an ML engineer and an LLM engineer, to launch a machine learning expertise unit in our center @vibai.bsky.social, see vib.ai/en/opportuni...
We will have our next community meeting on Tuesday, 2025-09-16 at 18:00 CEST! Niklas Kempynck will be presenting on CREsted, a package for training enhancer models on scATAC-seq data.
(Zoom registration link and more information in thread!)
🧵
I wrote a quick application note on Tomtom-lite, a Python implementation of the Tomtom algorithm for comparing PWMs against each other. This implementation can be 10-1000x faster and, as a Python function, can be integrated into your workflows more easily.
www.biorxiv.org/content/10.1...
One thousand candidate enhancers tested in vivo in the mouse brain! A massive resource and oh so useful as a validation set for genome-wide enhancer prediction methods. Super fun to be involved in one of the papers: ‘the prediction challenge paper’ by Nelson & Niklas et al. www.cell.com/cell-genomic...
Make sure to also check out the other studies that are part of the larger effort on identifying and validating enhancer tools.
This study was done together with Nelson Johansen and supervised by Trygve Bakken at the @alleninstitute.org. Thanks to all co-authors for the great inter-lab collaboration! Also a personal shoutout to the members of the @steinaerts.bsky.social lab for a nice team effort and to Stein for guidance.
Check out our work on evaluating methods for predicting in vivo cell enhancer activity in the mouse cortex! Combining scATAC peak specificity with sequence-based CREsted predictions gave the best predictive performance. The aim is to advance genetic tool design for cell targeting in the brain.
Calling someone bird-brained is, in fact, a way of calling someone highly intelligent. @yaseminsaplakoglu.bsky.social reports: www.quantamagazine.org/intelligence...
Very proud of two new preprints from the lab:
1) CREsted: to train sequence-to-function deep learning models on scATAC-seq atlases, and use them to decipher enhancer logic and design synthetic enhancers. This has been a wonderful lab-wide collaborative effort. www.biorxiv.org/content/10.1...
Also check out Hannah’s thread on our latest preprint on HyDrop v2, an open-source platform for scATAC-sequencing, and a great, cost-efficient way of generating data for S2F models. 🙌
CREsted is available at github.com/aertslab/CRE.... Analysis notebooks can be found at github.com/aertslab/CRE.... All models developed for this preprint and in previous work are available in CREsted through crested.get_model(). We look forward to your feedback!
This was a big collaborative effort, together with @seppedewinter.bsky.social, and with great contributions from @casblaauw.bsky.social, Vasilis and many others. A special shoutout to @lukasmahieu.bsky.social, who professionalized the package, and to @steinaerts.bsky.social for supervising.
Finally, we train a model on a full-development zebrafish scATAC-seq atlas, and use it to design and in vivo validate cell type- and timepoint-specific enhancers with a high success rate. We also attempt to modulate reporter strength over two cell types.
As a new functionality in CREsted, we explore fine-tuning Borzoi on mouse motor cortex scATAC-seq data. We show that fine-tuned models and smaller models trained from scratch have near-identical performance.