🌟 Applications for the 2026 Leena Peltonen School of Human Genetics are open!
Back after a great 2025 edition: ~20 global leaders and ~20 PhD students shaping the future of genomics.
📅 July 26–30, 2026
📍 Wellcome Genome Campus, UK
📝 Apply by March 6 → www.lpshg.com
Posts by Pia Rautenstrauch
Thrilled to announce that I’ve just opened my research lab at @helmholtz-hiri.bsky.social !
A huge thank you to my mentor Roi Avraham (and lab members!) for the incredible training and support that made this possible.
There is the obvious fraud potential, but I think something much more fundamental is going to happen, too. It will change the meaning of what a "research publication" is. The reason why people read papers is to learn something that they couldn't reproduce in seconds by pushing a few buttons.
How many high-impact developmental variants are we missing by relying only on adult splicing annotations?
We address this in our preprint “Aberrant splicing prediction during human organ development”: www.biorxiv.org/content/10.1...
I'm excited to announce we're recruiting a PhD student in Machine Learning for Immunology within the Einstein Center for Early Disease Interception, together with Simon Haas!
🧠 The Lipid #Brain Atlas is out now! If you think #lipids are boring and membranes are all the same, prepare to be surprised. Led by @lucafusarbassini.bsky.social with Giovanni D'Angelo's lab, we mapped membrane lipids in the mouse brain at high resolution.
www.biorxiv.org/cgi/content/...
We are excited to share GPN-Star, a cost-effective, biologically grounded genomic language modeling framework that achieves state-of-the-art performance across a wide range of variant effect prediction tasks relevant to human genetics.
www.biorxiv.org/content/10.1...
(1/n)
I did not know Taylor Swift was moonlighting in soliciting contributions for fake journals!
Check out my talented colleagues' study, profiling hundreds of CRISPRa-responsive regulatory elements surrounding PHOX2B, a key player in neuroblastoma, using a targeted scRNA-seq screen in a neuroblastoma cell line.
Meeting agenda Sep 10, 2025
Attendees:
Links:
Agenda (feel free to add your items):
• Blog almost ready for R blogger linkage thanks to @Izabela Mamede, @Mengyuan Shen and @Maria Doyle
• New posts from many including @Juan Henao and myself
• Ideas for other posts?
• There is tidybulk v2 ready to be submitted. Some feedback would be nice there.
• Stefano's new speedy code in tidySE
• https://github.com/tidyomics/genomics-todos/issues/19#issuecomment-3239791713
• https://github.com/tidyomics/tidySummarizedExperiment/issues/106
• Report back from tidyomics workshop at useR! (Justin and Mike)
• Other projects in the works?
• Ideas for engaging new users? New developers?
These are the corresponding times for your meeting:
Location: Local Time
• Durham (USA - North Carolina): Wednesday, September 10, 2025 at 6:00:00 am
• Adelaide (Australia - South Australia): Wednesday, September 10, 2025 at 7:30:00 pm
• Paris (France - Paris): Wednesday, September 10, 2025 at 12:00:00 noon
• Corresponding UTC (GMT): Wednesday, September 10, 2025 at 10:00:00
Our first Fall #tidyomics meeting will be this Wed 10 September, early in US / noon in Europe / late in Australia. Feel free to join if you're interested in what we are doing to make omics data more amenable to tidy data analysis.
Organized with Stefano @stemang.bsky.social
The Matilda effect is not fiction.
It is written into the history of science.
It eclipsed women such as Marthe Gautier, born a hundred years ago, a forgotten pioneer of trisomy 21 research.
➡️ https://l.franceculture.fr/1LI
Are electronic health records (EHR) more predictive of disease onset than polygenic scores? Can we transfer EHR-based prediction models between countries? Our study on these questions, using 3 biobank-based studies with N>845K, is out in @natgenet.nature.com today:
www.nature.com/articles/s41...
The participants of Dagstuhl Seminar 24122 standing on steps outside (from https://www.dagstuhl.de/24122)
Multiple types of embeddings (UMAP, t-SNE, Laplacian Eigenmaps, PHATE, PCA, MDS) of Wikipedia text data labelled by text summaries generated by an LLM. Methods like UMAP and t-SNE show cluster structure that reflects shared subject matter in the text, while other methods show more continuous structure.
Multiple embedding methods (PCA, Laplacian Eigenmaps, t-SNE, MDS, PHATE, UMAP) of primate brain organoids at different time periods. Different methods highlight different aspects of development, such as clusters of similar cell types or time courses of cell development.
Multiple embedding methods (PCA, Laplacian Eigenmaps, t-SNE, MDS, PHATE, UMAP) of 1000 Genomes Project genotypes. Different methods reflect different aspects of demographic history of populations.
Last year I met a bunch of great researchers who work with high-dimensional data at a Dagstuhl seminar. This week we put out a preprint about the history and philosophy of low-dimensional embedding methods, their applications, their challenges, and their possible future arxiv.org/abs/2508.15929
We spent a year writing this review of low-dim embeddings and arguing about things like epistemic roles and best practices :-) 20+ authors are all participants of the Dagstuhl seminar we held last year: www.dagstuhl.de/24122. Led by @alexandr.bsky.social and Cyril de Bodt.
arxiv.org/abs/2508.15929
We're committed to supporting as many attendees as possible in joining us at #scverse2025 - feel free to reach out if you have questions!
Antibodies are highly diverse, but most possible sequences are unstable or polyreactive. In this work, just published in Cell Syst., we propose a new source of data for modeling constraints from these properties. Our models show clear improvements in predicting Ab dysfunction. (1/n)
t.co/qCZERPUMPF
Thanks, @paubadiam.bsky.social! That makes sense. Excited for the results 🔎.
Very well set up benchmark and informative comparisons! I might have missed it, but did you also compare the performance of the same methods using either truly paired vs synthetically paired multimodal data as input in terms of your performance evaluation metrics, in addition to network consistency?
By now, I’ve heard from many people who’ve noticed inconsistencies when using silhouette-based metrics for horizontal data integration evaluation. I hope we’ve helped shed light on why these metrics fall short and that our recommendations prove useful to you!
Excited to share our latest paper @natmethods.nature.com
We present a high-throughput framework to map cellular interactions at ultra-high scale – broadly applicable from whole-organism immune response mapping to personalized therapy response prediction (1/4).
www.nature.com/articles/s41...
This preprint from Helen Sakharova is one of the coolest things to come out of my lab: “Protein language models reveal evolutionary constraints on synonymous codon choice.” Codon choice is a big puzzle in how information is encoded in genomes, and we have a new angle. www.biorxiv.org/content/10.1...
Lucky to have inspiring and supportive mentors by my side! @mikelove.bsky.social
Evaluating something like batch correction requires looking at the data, and picking metrics that capture what you care about. Great work @prauten.bsky.social and @uweohler.bsky.social
Shortcomings of silhouette in single-cell integration benchmarking - @uweohler.bsky.social @prauten.bsky.social @mdc-bimsb.bsky.social @mdc-berlin.bsky.social @humboldtuni.bsky.social go.nature.com/4fcQzZr
Truly grateful for the exceptional opportunity to participate in #LPSHG2025 last week, featuring a stellar ✨ lineup of leading researchers who doubled as tutors, alongside inspiring fellow PhD students. Excited to apply my learnings and see where this collaborative spirit takes genomics next!
*Easter egg alert* NOT in the published paper. We also benchmarked Evo 2, and while it did better than other gLMs (consistent with the idea that scale can improve gLMs), it still falls short of a basic CNN trained using one-hot sequences and far short of supervised SOTA
The deadline for the VIB.AI group leader positions is approaching - send in your CV and short research plan before 14th June to start your BioML research lab in Leuven or Ghent
Excited to share my first contribution here at Illumina! We developed PromoterAI, a deep neural network that accurately identifies non-coding promoter variants that disrupt gene expression.🧵 (1/)
We finally concluded the meeting. Thanks to all attendees for their scientific contributions and for traveling (near or far) to the meeting! Thanks to the local organizers for the infrastructure and catering, and thanks to the co-organizers @yaronorenstein.bsky.social @camillemrcht.bsky.social!