Bin Shao (@binshaophy) Bsky

We are excited to share GPN-Star, a cost-effective, biologically grounded genomic language modeling framework that achieves state-of-the-art performance across a wide range of variant effect prediction tasks relevant to human genetics.
www.biorxiv.org/content/10.1...
(1/n)

6 months ago 174 91 4 5

Big congrats, Yunha!

11 months ago 1 0 0 0

Interesting work on plasmid engineering.

1 year ago 5 0 0 0

All NIH study sections canceled indefinitely. This will halt science and devastate research budgets in universities.

1 year ago 12244 4979 586 1165

This gives me such hope for biodiversity conservation, mammals and future mammalogists! Go young people!! 🧪

1 year ago 180 37 3 0

Population-level amplification of gene regulation by programmable gene transfer - Nature Chemical Biology Gene regulation in engineered microbial populations is often tuned at individual cell levels. Now, a population-wide amplification system has been devised that expands the dynamic range of plasmid tra...

A new paper from Lingchong You's group develops a cool amplification circuit that expands the dynamic range of plasmid transfer #ChemBio #synbio #microsky

www.nature.com/articles/s41...

1 year ago 16 6 0 0

Recruiting PhD students: our research covers language models + genomics + systems biology: scholar.google.com/citations?us...
1. Four-year PhD program in Beijing
2. Master's degree required
3. Start date: Sep 2025
Please DM if you are interested.

1 year ago 1 0 0 0

Two decades of bacterial ecology and evolution in a freshwater lake Nature Microbiology - A 471-metagenome time series from Lake Mendota in Wisconsin, USA, reveals seasonal and decadal shifts in bacterial functional and ecological dynamics, especially in response...

After 24 years of work, I’m thrilled to announce the TYMEFLIES dataset, which comprises metagenomes from Lake Mendota (Madison, WI), collected roughly every 10 days (471 samples) for 20 years! @quendi.bsky.social @robinrohwer.bsky.social

rdcu.be/d5put

A thread…

1 year ago 245 102 3 3

We deeply appreciate the experimental studies that have made this work possible! Please check our github for more details: github.com/lingxusb/TXp...

1 year ago 0 0 0 0

Light-dependent modulation of protein localization and function in living bacteria cells - Nature Communications Bacterial proteins are often recruited to specific subcellular locations to carry out their functions. Here, the authors use the optogenetic CRY2-CIB1 system to re-direct proteins to different subcell...

www.nature.com/articles/s41...

1 year ago 4 3 0 0


We hope this work will be a useful tool. Feedback is welcome! Please feel free to try our Colab notebook to predict transcriptomes at (almost) zero cost! It takes about 20 minutes for a genome with 4k genes: colab.research.google.com/drive/1Kd-QI...

1 year ago 0 0 1 0

TXpredict captures variations in gene expression both across different protein functional groups and within the same functional group.

1 year ago 0 0 1 0

We further used TXpredict to predict the expression of 3.1M genes across a collection of 900 microbial genomes. Small clusters of ribosomal genes were located at the periphery of the tSNE plot of all genes and showed high predicted expression.

1 year ago 1 0 1 0

Our model leverages information learned from the ESM2 model and basic protein statistics to predict genome-wide gene expression. It achieves an average Spearman correlation of 0.53 in predicting gene expression for bacterial genomes that are not in the training dataset.

1 year ago 1 0 1 0
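For readers unfamiliar with the metric quoted above, here is a minimal sketch of how a Spearman correlation between predicted and measured expression could be computed for one genome. It is not taken from the TXpredict code: the function names and the idea of scoring per genome and then averaging are assumptions made for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

Because the score depends only on rank order, it rewards a model for ordering genes correctly from low to high expression even if the predicted absolute levels are off.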

Predicting microbial transcriptome using genome sequence We present TXpredict, a transformer-based framework for predicting microbial transcriptomes using annotated genome sequences. By leveraging information learned from a large protein language model, TXp...

Is it possible to get the transcriptome of any sequenced microbe without doing the experiments? Happy to introduce TXpredict, a transcriptome prediction tool that generalizes to novel microbial genomes: www.biorxiv.org/content/10.1...

1 year ago 7 3 1 0

Predicting microbial transcriptome using genome sequence www.biorxiv.org/content/10.1101/2024.12....

1 year ago 3 1 0 0

GitHub - lingxusb/EcoVAE

9/n We envision that EcoVAE will advance biodiversity investigations, especially in under-sampled regions, and ultimately support global biodiversity monitoring efforts🙏

💻Code is publicly available: github.com/lingxusb/Eco...

1 year ago 0 0 0 0

8/n 🧩 EcoVAE can also interpolate missing occurrences. For example, in North America, EcoVAE predictions for Sassafras largely overlapped with iNaturalist records, and in South Asia, EcoVAE highlighted a wider distribution of Desmodium, consistent with field surveys.

1 year ago 0 0 1 0

7/n 🌍Where is biodiversity under-sampled? We found that regions with high prediction error overlap with known "darkspots" of biodiversity collection. For example, the highest prediction errors for plants were observed in South Asia, Southeast Asia, the Middle East, and Central Africa.

1 year ago 0 0 1 0

6/n 🦋EcoVAE isn’t limited to plants. The model generalizes well to other taxa, including butterflies and mammals, showcasing its versatility across ecosystems.

1 year ago 0 0 1 0

5/n🖥️Remarkably, EcoVAE can predict species distributions even with sparse inputs. With just 20% of input data, it achieved an AUROC of 0.78, effectively identifying the locations of missing genera.

1 year ago 0 0 1 0
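As a reminder of what an AUROC such as the 0.78 above measures, the sketch below computes it from scratch. It is an illustration only, not the EcoVAE evaluation code: the function name, and the reading of labels as "this held-out genus is truly present here" (1) versus "absent" (0), are assumptions.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect separation of present from absent records.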

4/n🌍 We withheld data from three independent regions to test its generalization. The model reconstructed species distributions effectively—even for withheld test regions—and predicted the location of missing records at genus and species levels.

1 year ago 0 0 1 0

3/n 🚀We leverage a VAE structure that enables fast and scalable modeling of species distribution patterns. In training, we masked 50% of species records and tasked the model with reconstructing the full species distribution, mimicking real-world biodiversity sampling.

1 year ago 0 0 1 0
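The masking step described above can be sketched roughly as follows. This is a hypothetical illustration, not the EcoVAE implementation: it assumes each training example is a binary presence vector for one grid cell, and the function name `mask_records`, the `seed` parameter, and the example vector are all made up for the sketch.

```python
import random


def mask_records(presence, mask_fraction=0.5, seed=0):
    """Hide a fraction of the observed (value 1) entries in a presence vector.

    Returns the masked vector and the sorted indices that were hidden.
    """
    rng = random.Random(seed)
    observed = [i for i, v in enumerate(presence) if v == 1]
    n_hidden = int(len(observed) * mask_fraction)
    hidden = set(rng.sample(observed, n_hidden))
    masked = [0 if i in hidden else v for i, v in enumerate(presence)]
    return masked, sorted(hidden)


# Hypothetical presence/absence vector for one grid cell (1 = taxon recorded).
full = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
masked_input, hidden_idx = mask_records(full)
# Training pair: the model sees `masked_input` and is scored against `full`.
```

Hiding only entries that were actually observed mimics incomplete sampling: the model never sees false presences, only missing ones.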

2/n 🌿Biodiversity is under immense pressure. Predicting global species distributions at scale is critical, but traditional species distribution models struggle with massive datasets and interspecies interactions (e.g., >33M records and >127K species of plants).

1 year ago 0 0 1 0

A generative deep learning approach for global species distribution prediction Anthropogenic pressures on biodiversity necessitate efficient and highly scalable methods to predict global species distributions. Current species distribution models (SDMs) face limitations with larg...

🌏What happens when generative AI meets ecology? How can we use AI to advance biodiversity exploration and monitoring?

Excited to introduce EcoVAE, a generative approach trained on over 100 million high-quality vouchered records to model global biodiversity.

www.biorxiv.org/content/10.1...
1/n🧵

1 year ago 1 0 1 0

Preprint alert! A thread is coming soon.

1 year ago 0 0 0 0

book cover and first page of the preface

The third edition of my textbook, Nonlinear Dynamics and Chaos, was published today. You can preview the first 68 pages on Google Books, or take a look at the preface below to see what's new. The main new thing is a chapter on the Kuramoto model! Hope you enjoy it.

2 years ago 172 30 6 7

Two BioML starter packs now:

Pack 1: go.bsky.app/2VWBcCd
Pack 2: go.bsky.app/Bw84Hmc

DM if you want to be included (or nominate people who should be!)

1 year ago 119 56 10 11
