
Posts by Cong Lu

Are you interested in Open-Endedness and AI for Science? 🧪

I'm hiring a Student Researcher at Google DeepMind for a 6-month role. Join us to work on building agents capable of novel scientific discoveries! 🔬

Reach out if this sounds like you, and apply here 👇

docs.google.com/forms/d/e/1F...

5 months ago
StochasTok: Improving Fine-Grained Subword Understanding in LLMs. Subword-level understanding is integral to numerous tasks, including understanding multi-digit numbers, spelling mistakes, abbreviations, rhyming, and wordplay. Despite this, current large language mo...

📄 Paper: arxiv.org/abs/2506.01687
💻 Code: github.com/anyasims/sto...
A massive 🙏 to my incredible co-authors: Anya Sims, Thom Foster, @klarakaleb.bsky.social, Tuan-Duy H. Nguyen, Joseph Lee, @jfoerst.bsky.social, @yeewhye.bsky.social!

[8/8]

10 months ago

The significant gains from this minimal change are super exciting, and we see huge potential for larger models and more complex tasks like coding, scientific reasoning, and beyond! We invite you to explore the paper and code!

[7/8]

10 months ago

More major advantages! 🌟

COST-EFFECTIVE: StochasTok allows enhanced subword skills to be seamlessly 'retrofitted' into existing pretrained models - thus avoiding costly pretraining!
ENHANCED ROBUSTNESS: Improves resilience to alternative tokenizations! (see examples)

[6/8]

10 months ago

Empirically, we find:
LANGUAGE: As hoped, StochasTok unlocks language manipulation ability! (see task examples below)
MATH: Furthermore, StochasTok dramatically changes how models learn multi-digit addition, enabling grokking and even generalization to UNSEEN TOKENIZERS!🤯

[5/8]

10 months ago

Practically, StochasTok is:
✅Computationally lightweight🪶
✅A simple dataset preprocessing step — No training loop or inference time changes required!🛠️
✅Compatible with ANY base tokenizer — Allows us to retrofit pretrained models!💰
✅Robust to hyperparameter choice!🔥

[4/8]

10 months ago

The underlying StochasTok algorithm is extremely simple!

1️⃣ Simply tokenize text with ANY base tokenizer,
2️⃣ Then, stochastically split some of those tokens into equivalent token pairs.

That’s basically it! Repeat step 2 for the desired granularity.
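The two steps can be sketched in a few lines of Python. Everything here is an illustrative toy (the vocabulary, the split table, and the function names are assumptions, not the paper's implementation; see the linked repo for the actual code):

```python
import random

# Toy vocabulary: token id -> string (stands in for any base tokenizer).
VOCAB = {0: "bo", 1: "ok", 2: "book", 3: "b", 4: "o"}

# For each splittable token, one equivalent pair whose strings
# concatenate back to the original token's string.
SPLITS = {2: (0, 1), 0: (3, 4)}  # "book" -> "bo"+"ok", "bo" -> "b"+"o"

def stochastok(ids, splits=SPLITS, p=0.5, rng=None):
    """One pass of StochasTok-style splitting: each token that has an
    equivalent pair is split with probability p. Repeat the pass for
    finer granularity."""
    rng = rng or random.Random(0)
    out = []
    for tid in ids:
        if tid in splits and rng.random() < p:
            out.extend(splits[tid])  # swap token for an equivalent pair
        else:
            out.append(tid)
    return out

def detok(ids):
    return "".join(VOCAB[t] for t in ids)

base = [2]  # base tokenizer output for "book"
once = stochastok(base, p=1.0)          # [0, 1]  ("bo", "ok")
twice = stochastok(once, p=1.0)         # [3, 4, 1]  ("b", "o", "ok")
assert detok(once) == detok(twice) == "book"  # text never changes
```

The key invariant is that splitting only changes the tokenization, never the underlying text, so it can be applied as a pure dataset preprocessing step.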

[3/8]

10 months ago

🤔The problem: Standard tokenization gives distinct token IDs for each token - making subword structure unnecessarily hard to learn, e.g., ‘book’=3092 and ‘cook’=171691 differ by a single letter but share nothing as IDs.

🎉The solution: Allow LLMs to naturally 'see inside' tokens via alternative tokenizations!
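A toy illustration of the point (the IDs for 'book' and 'cook' are from the post; the alternative splits are made-up examples, not any real tokenizer's output):

```python
# Standard tokenization: one opaque ID per word; nothing is shared.
standard = {"book": [3092], "cook": [171691]}
assert not set(standard["book"]) & set(standard["cook"])

# Alternative tokenizations let the model 'see inside' the tokens:
# now the shared suffix is an explicit, common piece.
alternative = {"book": ["b", "ook"], "cook": ["c", "ook"]}
assert set(alternative["book"]) & set(alternative["cook"]) == {"ook"}
```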

[2/8]

10 months ago

🚀Introducing “StochasTok: Improving Fine-Grained Subword Understanding in LLMs”!🚀

LLMs are incredible but still struggle disproportionately with subword tasks, e.g., character counts, wordplay, multi-digit numbers, fixing typos… Enter StochasTok, led by Anya Sims!

[1/8]

10 months ago

It was an honor to be on Quirks and Quarks (the CBC science show) with @cong-ml.bsky.social talking about The AI Scientist and the impact of AI on science.

Science is being transformed by the AI revolution
cbc.ca/listen/live-...

1 year ago

Introducing Automated Capability Discovery!

ACD automatically identifies surprising new capabilities and failure modes in foundation models, via "self-exploration" (models exploring their own abilities).

Led by @cong-ml.bsky.social & @shengranhu.bsky.social
🔬🤖🧠🔎 [1/9]

1 year ago

It's an honor that The AI Scientist is #1 on this list!

www.linkedin.com/feed/update/...

Congrats @chris-lu.bsky.social @cong-ml.bsky.social @RobertTLange @hardmaru.bsky.social @jfoerst.bsky.social

1 year ago

Lots of interest in ADAS! Thanks everyone, and congrats Shengran Hu and @cong-ml.bsky.social! 🚀🚀🚀

1 year ago

Honored to receive this award for ADAS!!

1 year ago

Our in-progress work Quality-Diversity Self-Play (w/ @cong-ml.bsky.social and @jeffclune.com) will have a poster presentation at #NeurIPS2024 workshops (@IMOLNeurIPS2024 Sunday West Meeting Room 217-219 and OpenworldAgents Sunday East Meeting Room 1-3, Foyer). Please come visit us!

1 year ago

Our work Automated Design of Agentic Systems (w/ Shengran Hu & @cong-ml.bsky.social) will have ✨two orals✨ @ #NeurIPS2024 workshops (LanGame Sat 10:20, OWA Sun 4:50). Please come visit us😃

We would also love to chat about open-endedness, LLM agents, etc. Come by if you want to meet!

1 year ago

Interested in robust model-based offline RL algorithms? Come check out Anya Sims presenting our new paper investigating the edge-of-reach problem in offline MBRL!

📍East Exhibit Hall A-C #4603

#NeurIPS2024

1 year ago
A new golden age of discovery. In this essay, we take a tour of how AI is transforming scientific disciplines from genomics to computer science to weather forecasting. Some scientists are training their own AI models, while...

A great new essay on AI for Science from our colleagues here:

deepmind.google/public-polic...

1 year ago

The RL (and some non-RL folks) starter pack is almost full. Pretty clear that the academic move here has succeeded
go.bsky.app/3WPHcHg

1 year ago

Now that @jeffclune.bsky.social and @joelbot3000.bsky.social are here, time for an Open-Endedness starter pack.

go.bsky.app/MdVxrtD

1 year ago