Are you interested in Open-Endedness and AI for Science? 🧪
I'm hiring a Student Researcher at Google DeepMind for a 6-month role. Join us to work on building agents capable of novel scientific discoveries! 🔬
Reach out if this sounds like you, and apply here 👇
docs.google.com/forms/d/e/1F...
Posts by Cong Lu
📄 Paper: arxiv.org/abs/2506.01687
💻 Code: github.com/anyasims/sto...
A massive 🙏 to my incredible co-authors: Anya Sims, Thom Foster, @klarakaleb.bsky.social, Tuan-Duy H. Nguyen, Joseph Lee, @jfoerst.bsky.social, @yeewhye.bsky.social!
[8/8]
The significant gains from this minimal change are super exciting, and we see huge potential for larger models and more complex tasks like coding, scientific reasoning, and beyond! We invite you to explore the paper and code!
[7/]
More major advantages! 🌟
COST-EFFECTIVE: StochasTok allows enhanced subword skills to be seamlessly 'retrofitted' into existing pretrained models - thus avoiding costly pretraining!
ENHANCED ROBUSTNESS: Improves resilience to alternative tokenizations! (see examples)
[6/]
Empirically, we find:
LANGUAGE: As hoped, StochasTok unlocks language manipulation ability! (see task examples below)
MATH: Furthermore, StochasTok dramatically changes how models learn multi-digit addition, enabling grokking and even generalization to UNSEEN TOKENIZERS!🤯
[5/]
Practically, StochasTok is:
✅Computationally lightweight🪶
✅A simple dataset preprocessing step — No training loop or inference time changes required!🛠️
✅Compatible with ANY base tokenizer — Allows us to retrofit pretrained models!💰
✅Robust to hyperparameter choice!🔥
[4/]
The underlying StochasTok algorithm is extremely simple!
1️⃣ Simply tokenize text with ANY base tokenizer,
2️⃣ Then, stochastically split some of those tokens into equivalent token pairs.
That’s basically it! Repeat step 2 for the desired granularity.
[3/]
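A minimal sketch of the two steps above, in Python. Everything named here is an assumption for illustration: the toy vocabulary `VOCAB`, the greedy stand-in tokenizer `greedy_tokenize`, the helper `possible_splits`, and the `n_rounds` parameter are hypothetical and are not taken from the paper or its code. It works on token strings rather than token IDs to stay readable.

```python
import random

# Hypothetical toy vocabulary (an assumption; a real base tokenizer has its own).
VOCAB = {"the", " ", "book", "cook", "b", "c", "o", "k",
         "bo", "co", "oo", "ok", "boo", "coo", "ook"}


def greedy_tokenize(text, vocab):
    """Stand-in base tokenizer: greedy longest-match over the toy vocabulary."""
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"no token in the vocabulary covers {text[i]!r}")
    return tokens


def possible_splits(token, vocab):
    """All (left, right) pairs of vocabulary tokens that concatenate to `token`."""
    return [(token[:k], token[k:])
            for k in range(1, len(token))
            if token[:k] in vocab and token[k:] in vocab]


def stochastic_split(tokens, vocab, n_rounds, rng):
    """Step 2: repeatedly pick a random token and, if it can be written as an
    equivalent pair of vocabulary tokens, replace it with a random such pair."""
    tokens = list(tokens)
    if not tokens:
        return tokens
    for _ in range(n_rounds):
        idx = rng.randrange(len(tokens))
        splits = possible_splits(tokens[idx], vocab)
        if splits:  # tokens with no valid split are left unchanged
            tokens[idx:idx + 1] = rng.choice(splits)
    return tokens


if __name__ == "__main__":
    base = greedy_tokenize("the book", VOCAB)            # step 1
    print(base)
    print(stochastic_split(base, VOCAB, n_rounds=4, rng=random.Random(0)))
```

Because each split replaces a token with two pieces that concatenate back to it, the text is unchanged; only the tokenization varies. More rounds give a finer-grained result, which is one way to read "repeat step 2 for the desired granularity".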
🤔The problem: Standard tokenization assigns each token an arbitrary, unrelated ID, which makes subword structure unnecessarily hard to learn. E.g., ‘book’=3092 and ‘cook’=171691 differ by a single letter, yet nothing in their IDs reflects that.
🎉The solution: Allow LLMs to naturally 'see inside' tokens via alternative tokenizations!
[2/]
🚀Introducing “StochasTok: Improving Fine-Grained Subword Understanding in LLMs”!🚀
LLMs are incredible but still struggle disproportionately with subword tasks, e.g., counting characters, wordplay, multi-digit numbers, fixing typos… Enter StochasTok, led by Anya Sims!
[1/]
It was an honor to be on Quirks and Quarks (the CBC science show) with @cong-ml.bsky.social talking about The AI Scientist and the impact of AI on science.
Science is being transformed by the AI revolution
cbc.ca/listen/live-...
Introducing Automated Capability Discovery!
ACD automatically identifies surprising new capabilities and failure modes in foundation models, via "self-exploration" (models exploring their own abilities).
Led by @cong-ml.bsky.social & @shengranhu.bsky.social
🔬🤖🧠🔎 [1/9]
It's an honor that The AI Scientist is #1 on this list!
www.linkedin.com/feed/update/...
Congrats @chris-lu.bsky.social @cong-ml.bsky.social @RobertTLange @hardmaru.bsky.social @jfoerst.bsky.social
Lots of interest in ADAS! Thanks everyone, and congrats Shengran Hu and @cong-ml.bsky.social! 🚀🚀🚀
Honored to receive this award for ADAS!!
Our in-progress work Quality-Diversity Self-Play (w/ @cong-ml.bsky.social and @jeffclune.com) will have a poster presentation at #NeurIPS2024 workshops (@IMOLNeurIPS2024 Sunday West meeting room 217 - 219 and OpenworldAgents Sunday East Meeting Room 1-3, Foyer). Please come visit us!
Our work Automated Design of Agentic Systems (w/ Shengran Hu & @cong-ml.bsky.social) will have ✨two orals✨ @ #NeurIPS2024 workshops (LanGame Sat 10:20, OWA Sun 4:50). Please come visit us😃
We would also love to chat about open-endedness, LLM agents, etc. Come by if you want to meet!
Interested in robust model-based offline RL algorithms? Come check out Anya Sims presenting our new paper investigating the edge of reach problem in offline MBRL!
📍East Exhibit Hall A-C #4603
#NeurIPS2024
The RL (and some non-RL folks) starter pack is almost full. Pretty clear that the academic move here has succeeded.
go.bsky.app/3WPHcHg
Now that @jeffclune.bsky.social and @joelbot3000.bsky.social are here, time for an Open-Endedness starter pack.
go.bsky.app/MdVxrtD