Dmitry Penzar (@pensarata) Bsky

New paper showing that much of the apparent success of protein language models in predicting mutational effects is a mirage: These models mostly memorize sites. 1/
www.biorxiv.org/content/10.6...

1 month ago 180 72 6 5

The key ingredient of our solution was MPRA-LegNet, but we also incorporated a large number of new ideas to master the challenge.

It’s inspiring that the second-place team also used LegNet as the basis for their solution.

More details to come

4 months ago 0 0 0 0

Our team achieved first place in the CAGI7 lentiMPRA challenge on predicting the effects of single-nucleotide mutations in regulatory elements, surpassing the nearest competitors by a significant margin.

4 months ago 0 0 1 0

GitHub - autosome-ru/ibis-challenge: Repository with source code and metadata for IBIS challenge

(13/13) In turn, the wider set of data for Final TFs remains suitable for offline benchmarking with the open-source bibis framework (github.com/autosome-ru/...). The whole story can be found on bioRxiv: doi.org/10.1101/2025....

5 months ago 2 0 0 0

IBIS Challenge

(12/13) The online Leaderboard benchmarking platform, including the preprocessed data, benchmarking protocols, and rich documentation, remains fully functional and accessible online (ibis.autosome.org) to facilitate the development of future TFBS models.

5 months ago 1 0 1 0

(11/13) However, those changes did not translate into better prediction of SNP effects. Additionally, pre-initialization of the first convolutional layers with the best available PWMs for the corresponding TFs didn't yield any notable performance gain.

5 months ago 1 0 1 0

(10/13) We conducted ablation studies on LegNet. Minor modifications, such as replacing global average pooling with global max pooling in the SE block, led to substantial performance gains, making the resulting model the best in the post-challenge assessment.

5 months ago 1 0 1 0
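To make the pooling swap in post 10/13 concrete, here is a minimal pure-Python sketch of a generic squeeze-and-excitation (SE) block in which the squeeze step can use either global average or global max pooling. It is not LegNet's code: the function name `se_block`, the bias-free two-layer gate, and the toy weights are assumptions for illustration only.

```python
import math

def se_block(x, w1, w2, pool="avg"):
    """Generic SE block over a feature map x (list of channels, each a list of positions).

    w1 has shape hidden x channels, w2 has shape channels x hidden.
    Biases are omitted to keep the sketch short (an assumption, not LegNet's design).
    """
    # Squeeze: collapse every channel to a single number.
    if pool == "avg":
        squeezed = [sum(ch) / len(ch) for ch in x]
    elif pool == "max":
        squeezed = [max(ch) for ch in x]
    else:
        raise ValueError("pool must be 'avg' or 'max'")
    # Excitation: bottleneck layer with ReLU, then one sigmoid gate per channel.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every position of a channel by that channel's gate.
    return [[g * v for v in ch] for g, ch in zip(gates, x)]

# Toy input: channel 0 has one sharp peak, channel 1 is flat.
toy_x = [[0.0, 0.0, 4.0], [1.0, 1.0, 1.0]]
toy_w1 = [[1.0, -1.0]]        # hypothetical weights, hidden size 1
toy_w2 = [[1.0], [-1.0]]
print(se_block(toy_x, toy_w1, toy_w2, pool="avg"))
print(se_block(toy_x, toy_w1, toy_w2, pool="max"))
```

With average pooling, a single strong activation is diluted by the length of the sequence, whereas max pooling passes it to the gate unchanged. That may matter when a short motif match is the main signal, though the post itself does not state why the change helped.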

(9/13) Post-challenge analysis added extra DL models: top models from the DREAM challenge and popular architectures unused in IBIS, including Malinois and DNA language models. Fine-tuned DNA LMs performed far worse than fully supervised approaches.

5 months ago 1 0 1 0

(8/13) TF-binding models can be used to predict the effect of single-nucleotide variants. In A2G, PWMs performed unexpectedly well, e.g. MEX secured 2nd place. In G2A, the original top triple-A models dominated, followed by MEX and RSAT — the strongest PWM-based approach.

5 months ago 1 0 1 0
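As a rough illustration of how a TF-binding model can score a single-nucleotide variant (post 8/13), the sketch below compares the best PWM score of the reference and the alternative sequence. It is not the procedure used by MEX, RSAT, or any IBIS team: the log-odds conversion with a uniform background, the pseudocount, the forward-strand-only scan, and all function names are assumptions made for this example.

```python
import math

BASES = "ACGT"

def counts_to_pwm(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts (columns A, C, G, T) into log-odds weights."""
    pwm = []
    for row in counts:
        total = sum(row) + 4 * pseudocount
        pwm.append([math.log2((c + pseudocount) / total / background) for c in row])
    return pwm

def best_score(pwm, seq):
    """Best log-odds score over all forward-strand windows of the sequence."""
    width = len(pwm)
    if len(seq) < width:
        raise ValueError("sequence is shorter than the motif")
    return max(
        sum(pwm[i][BASES.index(seq[start + i])] for i in range(width))
        for start in range(len(seq) - width + 1)
    )

def snv_effect(pwm, seq, pos, alt):
    """Alt-minus-ref change in the best PWM score for a substitution at 0-based pos."""
    mutated = seq[:pos] + alt + seq[pos + 1:]
    return best_score(pwm, mutated) - best_score(pwm, seq)

# Toy counts for a hypothetical motif TGAC (columns: A, C, G, T).
toy_counts = [[0, 0, 0, 10], [0, 0, 10, 0], [10, 0, 0, 0], [0, 10, 0, 0]]
toy_pwm = counts_to_pwm(toy_counts)
print(snv_effect(toy_pwm, "AATGACAA", 3, "C"))  # negative: the variant breaks the motif
```

A negative value means the variant weakens the best predicted site, a positive value means it strengthens or creates one. A real pipeline would typically also scan the reverse complement and use a calibrated background model.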

(7/13) Yet, several deep learning (DL) approaches failed substantially in cross-experiment validation – in some cases performing far worse than PWMs. Unlocking the full potential of DL clearly requires careful architectural and training design.

5 months ago 2 0 1 0

Ilya Vorontsov on X: "Our paper on LARGE-scale benchmarking of motif discovery tools is published! https://t.co/jIvipjvqxq It was a long, 7 years long journey, which coordinated efforts of 50+ researchers, proud to be on of them. More results from Codebook about poorly studied TFs are coming soon." / X

(6/13) Performance of the solutions varied substantially across TFs and experimental platforms. The top-scoring ML models outperformed PWM-based IBIS solutions from the competition and our PWM baseline from Codebook MEX (x.com/VorontsovIE/...).

5 months ago 1 0 1 0

(5/13) Once again, we congratulate the runner-up teams (Medici, Salimov & Frolov lab, callitmagic), and the winners (Bench Pressers, mj, and Biology Impostor) (x.com/halfacrocodi...)

5 months ago 1 0 1 0

(4/13) Participants employed a wide range of methods from classic motif discovery with position-specific weight matrices (PWMs) to arbitrary advanced approaches (triple-As), including CNNs, RNNs, gradient boosting, and even more exotic approaches.

5 months ago 1 0 1 0

(3/13) For the first time, the IBIS Challenge assessed in depth the transferability of DNA motif models from artificial to genomic sequences (A2G), and vice versa (G2A), with rigorous test-train splits, multiple performance metrics, and a transparent ranking system.

5 months ago 1 0 1 0

Vanja (Ivan Kulakovskiy) on X: "Join the IBIS Challenge: an open competition focused on the computational prediction of transcription factor binding motifs. IBIS aims to advance state-of-the-art methods for Inferring Binding Specificities of human transcription factors from diverse experimental data. (1/12) https://t.co/5DUhweEOy9" / X

(2/13) TFs orchestrate transcriptional programs by recognizing short DNA motifs. The long-standing goal is to develop reliable models of TFs' DNA binding specificities and avoid biases of particular experimental assays (x.com/halfacrocodi...).

5 months ago 1 0 1 0

(1/13) Excited to share the outcome of the IBIS Challenge! The IBIS Challenge united dozens of teams across the world in tackling the problem of modeling transcription factor (TF) binding specificity using a diverse collection of experimental datasets for understudied human TFs.

5 months ago 10 7 1 1

De-novo promoters emerge more readily from random DNA than from genomic DNA Promoters are DNA sequences that help to initiate transcription. Point mutations can create de-novo promoters, which can consequently transcribe inactive genes or create novel transcripts. We know lit...

Excited / nervous to share the “magnum opus” of my postdoc in Andreas Wagner’s lab!

"De-novo promoters emerge more readily from random DNA than from genomic DNA"

This project is the accumulation of 4 years of work, and lays the foundation for my future group. In short, we… (1/4)

7 months ago 170 59 4 1

Design principles of cell-state-specific enhancers in hematopoiesis Screen of minimalistic enhancers in blood progenitor cells demonstrates widespread dual activator-repressor function of transcription factors (TFs) and enables the model-guided design of cell-state-sp...

Out in Cell @cp-cell.bsky.social: Design principles of cell-state-specific enhancers in hematopoiesis
🧬🩸 screen of fully synthetic enhancers in blood progenitors
🤖 AI that creates new cell state specific enhancers
🔍 negative synergies between TFs lead to specificity!
www.cell.com/cell/fulltex...
🧵

11 months ago 142 58 4 9

Large-scale discovery of potent, compact and erythroid specific enhancers for gene therapy vectors - Nature Communications This study presents a large-scale enhancer screening approach to optimize gene therapy vectors. A compact, potent, erythroid-specific enhancer used in a therapeutic vector, improved viral titers, tran...

Finally published! We developed an epigenomics to therapeutics screening approach that identifies naturally occurring elements that can titrate expression of transgenes at various levels including single elements stronger than the B-globin LCR. www.nature.com/articles/s41...

11 months ago 15 3 2 0

Programmatic design and editing of cis-regulatory elements The development of modern genome editing tools has enabled researchers to make such edits with high precision but has left unsolved the problem of designing these edits. As a solution, we propose Ledi...

Our preprint on designing and editing cis-regulatory elements using Ledidi is out! Ledidi turns *any* ML model (or set of models) into a designer of edits to DNA sequences that induce desired characteristics.

Preprint: www.biorxiv.org/content/10.1...
GitHub: github.com/jmschrei/led...

11 months ago 115 37 2 3

We share a lot of our ideas, code, datasets (that we spend years sanitizing) early. Often way before we release preprints. We do this so that others can use, build on, improve & even "beat" our approaches. But I want to say a few things about some simple expectations 1/

1 year ago 90 25 1 5

Modelling and design of transcriptional enhancers - Nature Reviews Bioengineering Enhancers are genomic elements critical for regulating gene expression. In this Review, the authors discuss how sequence-to-function models can be used to unravel the rules underlying enhancer activit...

We wrote a review article on modelling and design of transcriptional enhancers using sequence-to-function models.

From conventional machine learning methods to CNNs and using models as oracles/generative AI for synthetic enhancer design!

@natrevbioeng.bsky.social

www.nature.com/articles/s44...

1 year ago 57 32 1 1

Massively parallel characterization of transcriptional regulatory elements - Nature Lentivirus-based reporter assays for 680,000 regulatory sequences from three cell lines coupled to machine-learning models lead to insights into the grammar of cis-regulatory elements.

Super excited to announce our latest work. On a personal note, it's not an exaggeration to say that blood, sweat, and tears got us to the finish line on this: working w/ an outstanding global team of scientists in Germany, Japan, Russia, and USA responding in >100 pages of complex reviewer comments.

1 year ago 36 10 2 0

EXTRA-seq: a genome-integrated extended massively parallel reporter assay to quantify enhancer-promoter communication Precise control of gene expression is essential for cellular function, but the mechanisms by which enhancers communicate with promoters to coordinate this process are not fully understood. While seque...

Finally out! We present EXTRA-seq, a new EXTended Reporter Assay to quantify endogenous enhancer-promoter communication at kb scale!
www.biorxiv.org/content/10.1...
A 🧵about what it can do:
#SynBio #DeepLearning #GeneRegulation

1 year ago 83 34 5 6

Wonderful.
Just two weeks ago I was explaining to a junior colleague the problem of exaggerated claims in science. This paragraph is exactly what should be printed in place of a user agreement when anybody submits a paper.

1 year ago 3 0 0 0

autosome.org

Join us for our next Kipoi Seminar with Dmitry Penzar,
@pensarata.bsky.social @ autosome.org!
👉LegNet: parameter-efficient modeling of gene regulatory regions using modern convolutional neural network
📅Wed Dec 4, 5:30pm CET
🧬 kipoi.org/seminar/

1 year ago 3 2 0 0

(1/6) 🐦‍🔥 In IBIS #ibischallenge, we challenged teams from all over the world to decipher the DNA recognition code of human transcription factors. The IBIS Final Conference took place on November 27, 2024. Recordings and slides: disk.yandex.ru/d/82FEnwPn15...

1 year ago 10 5 1 1

Single-cell gene expression prediction from DNA sequence at large contexts Human genetic variants impacting traits such as disease susceptibility frequently act through modulation of gene expression in a highly cell-type-specific manner. Computational models capable of predi...

Maybe I've got your idea wrong, but there are plenty of seq2activity models trained or fine-tuned using sc data
www.biorxiv.org/content/10.1...

www.biorxiv.org/content/10.1...

www.biorxiv.org/content/10.1...

1 year ago 1 0 1 0

Mapping enhancer-gene regulatory interactions from single-cell data Mapping enhancers and their target genes in specific cell types is crucial for understanding gene regulation and human disease genetics. However, accurately predicting enhancer-gene regulatory interac...

Excited to share our latest preprint on scE2G – a new model to link enhancers to target genes using single-cell data – with state-of-the-art performance across multiple perturbation benchmarks.

biorxiv.org/cgi/content/...

Read more below!

1/12

1 year ago 41 20 1 4

Posts by Dmitry Penzar