This was a hugely collaborative project involving labs at the Wellcome Sanger Institute, the Cambridge Stem Cell Institute, and others.
Congratulations to the whole team!
Posts by Open Targets
Figure describing self-supervised learning of gene programmes (GPs) using Tripso. The first section details the base model for gene representation, then gene programme representation, and finally a global model for cell representation. Tokenized scRNA-seq count data are used as input to a transformer model, which produces contextualized gene embeddings at single-cell resolution. Curated GPs are initialized from public databases. GP-specific transformers learn latent representations for each GP based on their constituent genes. A global cell block integrates GP representations to learn a unified cell-level embedding. Optionally, an additional GP discovery transformer is defined using highly variable genes or a user-specified gene set to infer data-driven GPs. The next section describes applications and downstream tasks, for example GP-centric representation learning, optimal transport in GP latent space to map populations across conditions, gene-GP importance analysis, GP importance scores across groups, and identifying novel condition-specific GPs. Conditions may represent health and disease, in vivo and in vitro, cell types, age groups...
Tripso is able to learn multiple representations for each cell, each corresponding to a pre-defined gene programme
This enables us to interpret cell states going beyond canonical cell type markers and annotations, and pick out biologically interesting differences in experimental conditions
Gene programmes help make sense of complex gene expression data by grouping the expression of individual genes into coherent, biologically meaningful units.
However, existing methods are limited, since they usually compress gene expression information into a single representation
🧬🖥️ Introducing TRIPSO: a transformer network to help interpret single cell data analyses by learning multiple predefined or data-driven gene programme-specific embeddings
Tripso was developed by a team across the Lotfollahi, Haniffa, Göttgens and Vento labs
www.biorxiv.org/content/10.6...
The reveal is here! Meet your experts:
🎤 Presentations:
Dr. David Hulcoop @opentargets.org
Dr. Benjamin White @knowledgerights21.org
Dr. Summer Rosonovski #EuropePMC
💬 Panel:
Melissa Harrison #EuropePMC @ebi.embl.org
Dr. Kiera McNeice Research Transparency Manager*
Dr. Bastian Drees @embl.org
The Open Targets Platform spring 🌼 release is out — and this one marks the beginning of something we've been building towards for a while.
26.03 Release highlights from the Open Targets Platform. Not covered by the next photos: update to the official Open Targets MCP.
285,000 clinical reports across 13 stages. Context: Clinical trial data feeds into our entity profile pages, and informs target-disease associations and target prioritisation. In this release, our new clinical mining pipeline brings in data from: ClinicalTrials.gov via AACT, ChEMBL, Therapeutic Target Database, the European Medicines Agency, and Japan's Pharmaceuticals and Medical Devices Agency
E2G predictive features added to L2G. Context: the ENCORE-rE2G model integrates molecular features and large-scale perturbations to predict and score links between variants and transcriptional regulatory elements in a given cell. In this release, to add regulatory context to our Locus-to-Gene (L2G) machine learning model, we have used the scores to add two features: e2gMean, and e2gNeighbourhoodMean. There is a corresponding new column in the L2G Shapley visualisations on the Platform.
New data from the GWAS Catalog. In this release, we have 710 new studies from 97 publications, resulting in over 5,000 new credible sets, including the biggest study for hypothyroidism we have ingested. We have also updated STRING and IntAct, as well as ClinVar and PGx through the European Variation Archive. Find all the details of this release on the Open Targets blog: blog.opentargets.org
The Open Targets Platform spring (26.03) release is out now! 🌸
Find all the details of the updates on the blog: blog.opentargets.org/open-targets...
Closed licences, closed doors.
We're hosting a webinar with @kr21.bsky.social exploring how licensing shapes real-world research impact, from @opentargets.org to pandemic response.
Speaker panel reveal soon 👀
🗓️ 14 April
⏰ 10:00 BST
🔗 embl-org.zoom.us/meeting/regi...
#OpenScience #AcademicSky 🧪
This was a hugely collaborative project involving teams across the Wellcome Sanger Institute, Cambridge Stem Cell Institute, and University of Cambridge.
Congratulations to the whole team!
The team demonstrated the applicability of PerturbGen in three scenarios—human single-cell datasets for immune responses, hematopoiesis, and skin development—creating virtual, trajectory-aware perturbation atlases that are available to browse: cellatlas.io/perturbgen
By predicting which genes promote or inhibit specific cell states, and how context impacts the effect of a gene, the model could be applied to support the experimental design of large scale screens, the optimisation of disease models, and the prioritisation of potential new therapies.
A team in the Lotfollahi lab pretrained an encoder-decoder transformer model on 107 million cells, including developmental datasets
PerturbGen can then predict how cells transition between states, and how gene knockouts alter those transitions.
Screenshot of the HSPC perturbation atlas, showing genes clustered by gene programme. Clicking on one brings up predicted perturbation results for that gene.
Introducing PerturbGen: a generative AI foundation model that predicts how genetic perturbations affect cellular trajectories
www.biorxiv.org/content/10.6...
Understanding the downstream effects of regulatory elements associated with immune disease will help clarify how T cells function, how that function might be dysregulated in disease, and which molecules would make good drug targets
Read the research: www.biorxiv.org/content/10.6...
Map of regulatory cascades as determined with TAP-seq and Perturb-seq. The top row shows 85 enhancer-like CREs with significant target genes, the second row shows TAP-seq tested genes, and the third and fourth rows show genes involved in Perturb-seq. Lines between the layers represent significant connections.
They then undertook a genome-wide Perturb-seq screen targeting all expressed coding genes in the same cell type, to map the regulatory programmes of these targets.
The researchers coupled genome-scale CRISPRi screens with targeted perturbation screening to identify downstream targets for over 600 cis-regulatory elements overlapping with variants for 14 immune diseases.
CD4+ T cells are linchpins in the immune system, and most variants associated with immune diseases colocalise with regulatory elements that are specifically active during T cell stimulation
Out now on bioRxiv! 🧬🖥️
Dewi Moonen, @anniquec.bsky.social and a team led by Lars Steinmetz and Daniel Schraivogel at EMBL have mapped the downstream regulatory networks of thousands of immune disease-relevant variants in CD4+ T cells
www.biorxiv.org/content/10.6...
mcp-ui demo showing a description of the aim of the demo
Finally I can properly explore the mcp-ui pattern. Pretty interesting how the development of new components (especially complex ones) will change dramatically in terms of how data is loaded into them. Exciting times 🔥 Building a demo using @opentargets.org components.
Ref: mcpui.dev
Locus-to-Gene widget in the Open Targets Platform showing the Shapley values for features contributing to the prioritisation of two genes in a credible set.
In cooperative game theory, Shapley values help determine how to distribute rewards among players who contribute unequally to a team’s success
By implementing Shapley value-based explanations for L2G scores in the Platform, we provide a measure of which features contribute most to a given score
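To make the idea concrete, here is a minimal sketch of exact Shapley values for a toy three-player game. It is not the Platform's implementation: the feature names (`distance`, `coloc`, `vep`) and the scoring function `toy_score` are hypothetical, chosen only to show how a score is split among contributors.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values for a small cooperative game.

    `value` maps a frozenset of players to the payoff of that coalition.
    The cost grows exponentially with the number of players, so this is
    only practical for a handful of them.
    """
    n = len(players)
    result = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that p then joins
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p to coalition s
                total += weight * (value(s | {p}) - value(s))
        result[p] = total
    return result


def toy_score(coalition):
    """Hypothetical score: two features contribute, with a small
    bonus when both are present; the third contributes nothing."""
    score = 0.0
    if "distance" in coalition:
        score += 0.3
    if "coloc" in coalition:
        score += 0.2
    if "distance" in coalition and "coloc" in coalition:
        score += 0.1
    return score


values = shapley_values(["distance", "coloc", "vep"], toy_score)
for name, contribution in values.items():
    print(f"{name}: {contribution:.2f}")
```

In this toy game the interaction bonus is shared equally between the two features that create it, the unused feature receives zero, and the contributions sum to the score of the full coalition.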
In the Open Targets Platform, the Locus-to-Gene (L2G) algorithm uses a number of predictive features to prioritise potentially causal genes for disease-associated GWAS signals
But how can we determine which of these features most influenced the final score?
A poker player lifts the corner of their cards on a table strewn with chips and playing cards, revealing a pair of kings.
What does game theory have to do with gene predictions?
Irene Lopez explains how we apply Shapley value-based explanations for Locus-to-Gene scores in the Open Targets Platform, addressing a common concern about machine learning models: explainability 🧬🖥️
blog.opentargets.org/how-we-impro...
Prior to 2015, this evidence would often become available after approval
This could be attributed to the increasing availability of relevant data, and intentional changes in how it's used in the pharmaceutical industry, with an increasing reliance on public data for target validation
Do you agree?
🧬🖥️ Our latest publication reveals a shift in strategies to discover novel drug targets over the past 20 years:
after 2015, biomedical evidence supporting a target-disease indication increasingly appears before the approval year
www.nature.com/articles/s41...
Where to access Open Targets Platform data? Web interface, API and API playgrounds, Data downloads, Google BigQuery public dataset, Microsoft Azure Open Dataset, and new: AWS Open Data
Open Targets Platform data is now available on @amazonwebservices.bsky.social through the Open Data Program!
registry.opendata.aws/opentargets/
🔬New Perturbation Catalogue update!
With a redesigned interface, richer datasets and powerful analysis workflows, it’s now easier than ever to explore genetic perturbation data. www.ebi.ac.uk/perturbation...
#Bioinformatics #FunctionalGenomics
@training.ebi.embl.org @opentargets.org
The team have already curated and ingested 11 Perturb-seq, 1197 CRISPR and 4 MAVE datasets. Following extensive user testing of the beta version, the portal design has been completely rethought and rebuilt, with improved search functionality and analysis workflows.
Perturbation Catalogue home page showing key metrics
The Perturbation Catalogue is live! 🧬🔎🖥️
It aims to bring genetic perturbation data into one curated, harmonised, and discoverable platform.
Take a look! www.ebi.ac.uk/perturbation...
🧬🖥️ "This integration is a great example of bringing leading AI to platforms and data that are already widely used across the industry." - Jonah Cool, Head of Life Sciences Partnerships, Anthropic
Find out more about the Open Targets Platform MCP server: blog.opentargets.org/official-ope...
The needle in the haystack problem: spotting novel drug targets among redundant evidence.
Our solution? A time-series novelty metric for Open Targets Platform associations.
Thanks to Coté Falaguera & @opentargets.org partners for making this happen.