Posts by Willie Neiswanger

Hubble Suite logo (cloth patch with names of key organizations involved: USC, MPI, NVIDIA)

Announcing 🔭Hubble, a suite of open-source LLMs to advance the study of memorization!

Pretrained 1B/8B param models, with controlled insertion of texts designed to emulate key memorization risks: copyright (e.g., book passages), privacy (e.g., synthetic biographies), and test set contamination

5 months ago 8 4 1 3

😃 Want strong LLM reasoning without breaking the bank? We explored just how cost-effectively RL can enhance reasoning using LoRA!

[1/9] Introducing Tina: A family of tiny reasoning models with strong performance at low cost, providing an accessible testbed for RL reasoning. 🧵

11 months ago 8 3 1 0

🔍 Diving deep into LLM reasoning?

From OpenAI's o-series to DeepSeek R1, from post-training to test-time compute — we break it down into structured spreadsheets. 🧵

1 year ago 5 2 1 0

Added! (bsky.app/profile/will...)

1 year ago 1 0 0 0

Our paper also contains an in-depth discussion on safety when releasing metagenomic models.

Looking for collaborators to build on this with us — please reach out!

metagene.ai

1 year ago 6 0 0 0

We leverage the ecosystem of modern LLM tooling (tokenization, model architecture, training, infrastructure, etc.) for performance and extensibility. METAGENE-1 is standardized and easy to use.

Hugging Face: huggingface.co/metagene-ai
Github: github.com/metagene-ai

1 year ago 6 0 1 0
A subset of results on our Genomic Embedding Benchmark and Pathogen Detection Benchmark.

METAGENE-1 shows state-of-the-art results on pathogen detection, metagenomic embedding, and other genomic tasks.

We also release new benchmarks for genomic detection and embedding (e.g., Gene-MTEB, based on MTEB for LLMs).

See our paper for details: arxiv.org/abs/2501.02045

1 year ago 4 0 1 0
Overview of the metagenomic data collection and sequencing pipeline for model pretraining.

Our data pipeline is: human microbiome > wastewater > metagenomic sequences > tokens > training data.

Wastewater provides a rich source of data from tens of thousands of species across the human-adjacent microbiome. In total we pretrain on over 1.5T base pairs of DNA/RNA.

1 year ago 1 0 1 0
Overview of METAGENE-1 and applications.

Metagenomic sequencing of wastewater produces vast amounts of data that can capture public health trends at a societal scale. Our goal is to train a model on this data to help in large-scale wastewater monitoring & detection of novel bio threats.

1 year ago 1 0 1 0
Metagenomic Foundation Model for Pandemic Monitoring

Excited to release METAGENE-1, a 7B parameter metagenomic foundation model, built to aid in pathogen detection & pandemic monitoring. Pretrained on 1.5 trillion base pairs of DNA/RNA sequenced from wastewater.

A collab w/ USC, PrimeIntellect, & the Nucleic Acid Observatory.

metagene.ai

1 year ago 21 1 1 0

Entropy is one of those formulas that many of us learn, swallow whole, and even use regularly without really understanding.

(E.g., where does that “log” come from? Are there other possible formulas?)

Yet there's an intuitive & almost inevitable way to arrive at this expression.
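
For reference, the expression in question is the Shannon entropy. The sketch below is the standard textbook argument for where the log comes from, not necessarily the route the video takes; the symbols $I$, $H$, and $p_i$ are notation assumed here, not taken from the post.

```latex
% Assumption: the "surprise" I(p) of an event with probability p is continuous,
% decreasing in p, zero for certain events, and additive over independent events.
\begin{align}
  I(pq) &= I(p) + I(q), \qquad I(1) = 0
  && \text{(additivity for independent events)} \\
  I(p) &= -\log p
  && \text{(the only continuous solutions, up to the base of the log)} \\
  H(X) &= \sum_{i=1}^{n} p_i \, I(p_i) = -\sum_{i=1}^{n} p_i \log p_i
  && \text{(entropy as expected surprise)}
\end{align}
```

Changing the base of the logarithm only rescales $H$ (bits for base 2, nats for base $e$).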

1 year ago 543 128 22 12

Added!

1 year ago 0 0 0 0

Added! (bsky.app/profile/will...)

1 year ago 1 0 1 0

Added! (bsky.app/profile/will...)

1 year ago 1 0 0 0

hi everyone!! let's try this optimal transport again 🙃

1 year ago 329 31 2 0
DNA Break Repair by Homologous Recombination (2024) Drew Berry wehi.tv YouTube video by WEHImovies

Delighted to publish my new molecular animation:

DNA Break Repair by Homologous Recombination

youtu.be/Xe-83tBcxhs

1 year ago 273 114 38 41

Added! (bsky.app/profile/will...)

1 year ago 1 0 0 0

Added!

1 year ago 0 0 1 0

Added! (bsky.app/profile/will...)

1 year ago 0 0 0 0

Added!

1 year ago 1 0 0 0

Added! (bsky.app/profile/will...)

1 year ago 1 0 0 0

Added!

1 year ago 0 0 0 0

Added!

1 year ago 2 0 0 0

Added!

1 year ago 0 0 0 0

Added! (bsky.app/profile/will...)

1 year ago 1 0 0 0

Added!

1 year ago 1 0 0 0

Added!

1 year ago 1 0 0 0

Anne Gagneux, Ségolène Martin, @quentinbertrand.bsky.social Remi Emonet and I wrote a tutorial blog post on flow matching: dl.heeere.com/conditional-... with lots of illustrations and intuition!

We got this idea after their cool work on improving Plug and Play with FM: arxiv.org/abs/2410.02423

1 year ago 356 102 12 11

Added!

1 year ago 0 0 0 0

Added! (bsky.app/profile/will...)

1 year ago 1 0 0 0