Thanks to my amazing collaborators: @samsja19.bsky.social , Johannes Hagemann, @shangshang-wang.bsky.social , Jason Wiemels, Jeff Kaufman, and @willieneis.bsky.social
Special shout out to the Nucleic Acid Observatory for the sequencing data, and @PrimeIntellect for compute support.
Posts by Ollie Liu
We’re sharing METAGENE-1’s:
📄Paper: metagene.ai/metagene-1-p...
🌐Website: metagene.ai
🤗Model weights: huggingface.co/metagene-ai
🧵7/
🛡Tailored for detection, not design. We scoped METAGENE-1 to minimize risks while maximizing potential for public health and biosurveillance. Responsible open-sourcing matters. With open weights, we aim to drive progress in interpretability and safe genomics research.
🧵6/
📈METAGENE-1 achieves state-of-the-art results in:
- Pathogen detection
- Genomic embedding benchmarks
- Generalization to multi-species tasks
It already shows promise in public health and biosurveillance, and we are collaborating with experts to unlock its full impact.
🧵5/
The METAGENE-1 model is a 7B-parameter Llama-style transformer 🦙, pretrained and optimized for anomaly detection, embedding, and multi-species genomics. Fully compatible with 🤗Hugging Face (huggingface.co/metagene-ai) – ready to use like any of your favorite LLMs!
🧵4/
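The post above mentions using METAGENE-1 for embeddings; the thread doesn't spell out the pooling recipe, so here is a hedged, dependency-free sketch of one common approach (masked mean pooling over per-token hidden states to get a fixed-size vector per read). The shapes and the `mean_pool` name are illustrative assumptions, not the project's actual API:

```python
def mean_pool(hidden_states, attention_mask):
    """Masked mean pooling: average per-token hidden states over real
    (non-padding) positions to get one fixed-size embedding per read.

    hidden_states:  batch x seq_len x dim, nested lists of floats
    attention_mask: batch x seq_len, 1 for real tokens, 0 for padding
    """
    embeddings = []
    for tokens, mask in zip(hidden_states, attention_mask):
        # Keep only the vectors at unmasked (real-token) positions.
        kept = [vec for vec, m in zip(tokens, mask) if m]
        n = max(len(kept), 1)  # avoid divide-by-zero on all-padding rows
        dim = len(tokens[0])
        embeddings.append([sum(vec[d] for vec in kept) / n for d in range(dim)])
    return embeddings

# Hypothetical batch: 2 reads, 4 token positions, hidden dim 3.
h = [[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]],
     [[12, 13, 14], [15, 16, 17], [18, 19, 20], [21, 22, 23]]]
m = [[1, 1, 1, 0], [1, 1, 0, 0]]
emb = mean_pool(h, m)  # two fixed-size vectors, one per read
```

With real model outputs you would feed the final-layer hidden states and the tokenizer's attention mask into the same pooling step.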
📊The data behind METAGENE-1:
- Brand-new dataset collected with experts from Southern California & Missouri
- 1.5 trillion base pairs from diverse wastewater samples
- Short reads (100–300 BPs), deep sequencing at scale
- Byte-Pair Encoding customized for genomic sequences
🧵3/
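The data post above mentions Byte-Pair Encoding customized for genomic sequences. As a toy illustration only (the actual METAGENE-1 vocabulary and merge rules are not reproduced here), this sketch learns BPE merges from scratch over short A/C/G/T reads:

```python
# Toy BPE over DNA reads: repeatedly merge the most frequent adjacent
# token pair, starting from single-base tokens. Illustrative only.
from collections import Counter

def learn_bpe_merges(reads, num_merges):
    """Learn merge rules from a corpus of reads (strings over A/C/G/T)."""
    corpus = [list(read) for read in reads]  # start from single bases
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for toks in corpus:
            for a, b in zip(toks, toks[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        corpus = [_apply_merge(toks, best) for toks in corpus]
    return merges

def _apply_merge(toks, pair):
    """Replace every occurrence of `pair` with the fused token."""
    out, j = [], 0
    while j < len(toks):
        if j + 1 < len(toks) and (toks[j], toks[j + 1]) == pair:
            out.append(toks[j] + toks[j + 1])
            j += 2
        else:
            out.append(toks[j])
            j += 1
    return out

def bpe_tokenize(read, merges):
    """Apply learned merges, in order, to a single read."""
    toks = list(read)
    for pair in merges:
        toks = _apply_merge(toks, pair)
    return toks
```

Learned merges turn frequent motifs into single tokens, so a 100–300 bp read compresses into far fewer tokens than one-token-per-base.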
Why is METAGENE-1 special? 🤔We trained it on wastewater metagenomics, capturing the human-adjacent microbiome across the US over the past 12 months. This unlocks powerful capabilities for early pathogen detection and for understanding microbial ecosystems. 🌱🦠
🌐Website: metagene.ai
🧵2/
Introducing METAGENE-1🧬, an open-source 7B-parameter metagenomics foundation model pretrained on 1.5 trillion base pairs. Built for pandemic monitoring, pathogen detection, and biosurveillance, with SOTA results across many genomics tasks.
🧵1/
Landed in Vancouver to attend #NeurIPS :-) Excited to chat about multimodal models, AI4Science, decision making, and more!
Let's go! We are releasing SmolVLM, a smol 2B VLM built for on-device inference that outperforms all models at similar GPU RAM usage and token throughput.
SmolVLM can be fine-tuned in a Google Colab and run on a laptop! Or process millions of documents with a consumer GPU!
👋 nlp@usc student. thanks!
tfw you realize that this isn't an alt twitter for academic posting but an alt insta for cute doggos.
this is doodle, our border collie pup who is often used as an adversarial attack on image classification models (they classify him as a corgi :-)
yes please if there's still space left :-P
our border collie pup doodle absolutely wants nothing from that plate of banana :-P