
Posts by Khai Loong Aw

Using AI to improve (not automate away) academic research
Blog about fatherhood, language, developmental psychology, and cognitive science.

Just wrote a new blog post trying to summarize my thoughts on the question of how and whether to use AI for research in psychology and cognitive science: babieslearninglanguage.blogspot.com/2026/04/usin...

1 day ago 51 24 4 3

There's a lot to like here!
- Very smart way to use a masked autoencoder (unsupervised technique!) to build a world model from visual data. This makes other visually based world models I've seen seem clumsy in comparison.
1/4

6 days ago 1 1 1 0

Yeah, many animals are even more data-efficient than humans (in some ways). That work on honey bees sounds super interesting.

6 days ago 1 0 0 0

Beautiful use of the BabyView dataset to train a visual learning model!

1 week ago 4 1 0 0

I think we finally made really significant progress on the biggest unsolved "developmental AI" problem: learning from human-scale data. Key idea: zero-shot world models that support concept extraction via approximate causal inference. Amazing collab w/ @mcxfrank.bsky.social @khaiaw.bsky.social

1 week ago 41 11 1 1

So excited about this work using our data of children’s first-person experiences to train efficient, flexible visual learning models!

1 week ago 7 1 0 0

Huge thanks to coauthors Klemen Kotar, Wanhee Lee, Seungwoo Kim, Khaled Jedoui, Rahul Venkatesh, Lilian Naing Chen, @mcxfrank.bsky.social, @dyamins.bsky.social

Grateful to Cameron Ellis, Hyowon Gweon, James (Jay) McClelland, Cliona O'Doherty, and Alison Gopnik for helpful feedback on our manuscript

1 week ago 5 0 0 0
Preview
Zero-shot World Models Are Developmentally Efficient Learners
Young children demonstrate early abilities to understand their physical world, estimating depth, motion, object coherence, interactions, and many other aspects of physical scene understanding. Childre...

Paper: arxiv.org/abs/2604.10333
Models: huggingface.co/awwkl/models
Code: github.com/awwkl/ZWM (soon)

1 week ago 5 0 1 0

A broader implication is that with the right learning machinery, “developmental efficiency” and rich capabilities emerge from ecological experience. In sum, BabyZWM provides concrete computational principles for how general-purpose visual cognition can emerge from limited natural experience.

1 week ago 3 1 2 0

BabyZWM has implications for long-standing debates over innateness. It points to a hybrid hypothesis where humans likely have a small set of innate machinery – architecture, learning algorithm, task-specific readout programs – while the representational content and network parameters are learned.

1 week ago 3 0 1 1

ZWM represents a shift from the dominant paradigm of representation learning + task-specific readouts to unified world models (i.e., predicting consequences of actions).

This mirrors the shift when LLMs replaced task-specific fine-tuned models, except in vision with orders of magnitude less data.

1 week ago 3 1 1 0

The generality and simplicity of ZWM's approach allow it to be applied to other domains: predictive modeling + approx. causal inference + chaining extractors toward increasing abstraction. The ecological data gap shows up wherever web-scale data is not available: robotics, medical imaging, and the sciences.

1 week ago 3 0 1 0
Post image

BabyZWM develops brain-like internal representations that align with neural activity from human fMRI and macaque electrophysiology. Neural predictivity yields an “early-first” developmental pattern: primary visual regions (V1) reach noise ceiling early, higher-order visual areas improve gradually.

1 week ago 2 0 1 0
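
For context on how numbers like these are usually produced, here's a minimal sketch of the generic neural-predictivity recipe: ridge-regress model features onto recorded responses, then score held-out correlation against a noise ceiling. This is an assumption about the standard approach, not necessarily the paper's exact pipeline; all names are illustrative.

```python
# Hedged sketch of a generic neural-predictivity analysis (not
# necessarily this paper's exact pipeline). Fit a ridge regression
# from model features to voxel responses, then measure held-out
# correlation per voxel, normalized by the noise ceiling.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

def neural_predictivity(features, voxels, noise_ceiling, alpha=1.0):
    """features: (n_stimuli, n_units); voxels: (n_stimuli, n_voxels);
    noise_ceiling: (n_voxels,) per-voxel split-half reliability."""
    Xtr, Xte, Ytr, Yte = train_test_split(
        features, voxels, test_size=0.2, random_state=0)
    pred = Ridge(alpha=alpha).fit(Xtr, Ytr).predict(Xte)
    # Per-voxel Pearson r on held-out stimuli, normalized by ceiling.
    r = np.array([np.corrcoef(pred[:, v], Yte[:, v])[0, 1]
                  for v in range(Yte.shape[1])])
    return float(np.nanmean(r / noise_ceiling))
```
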
Post image

BabyZWM’s developmental trajectories broadly recapitulate behavioral signatures of children’s learning. However, interpret these cautiously; they partly reflect benchmark-specific design choices. We need more systematic, like-for-like benchmarking of early visual abilities in humans and machines.

1 week ago 2 0 1 0
Post image

Strikingly, BabyZWM matches SOTA supervised models, despite no task-specific examples or labels. It outperforms representation-based systems (DINOv3, V-JEPA2, and BabyView-trained versions) on several tasks, zero-shot. All models receive the same inputs for fair comparison. (Image: optical flow)

1 week ago 2 0 1 0
Post image

To test ZWM as a developmental hypothesis, we train it on BabyView (Long et al., 2025), a recent dataset of children's first-person experience. We also train Single-Child BabyZWM on one child’s video clips, ordered by their age – important tests of “developmental efficiency” and continual learning.

1 week ago 3 0 1 0

Principle 3: Compositions of prompts allow the extraction of interpretable and increasingly abstract visual structures, building a computational graph of visual abstractions.

1 week ago 2 0 1 0
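
As a rough illustration (not the actual ZWM code), such a graph could be wired up like this, with each hypothetical extractor consuming the world model plus earlier, more concrete outputs:

```python
# Minimal sketch of chaining prompt-based extractors into a graph of
# increasingly abstract visual structures. Extractor names and the
# wiring are illustrative assumptions, not the paper's implementation.
from typing import Any, Callable, Dict, Tuple

Extractor = Callable[..., Any]
Graph = Dict[str, Tuple[Extractor, Tuple[str, ...]]]

def run_graph(world_model, video, graph: Graph) -> Dict[str, Any]:
    """graph maps an output name to (extractor, names of its inputs);
    entries are assumed to be listed in topological order."""
    outputs: Dict[str, Any] = {"video": video}
    for name, (fn, deps) in graph.items():
        outputs[name] = fn(world_model, *[outputs[d] for d in deps])
    return outputs

# Example wiring (hypothetical extractors), concrete -> abstract:
# graph = {
#     "flow":     (extract_flow,     ("video",)),
#     "segments": (extract_segments, ("video", "flow")),
#     "depth":    (extract_depth,    ("video", "segments")),
# }
```
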

Principle 2: ZWM's core idea is *zero-shot prompting* via “approximate causal inference”: perturb the input, make a prediction, compare against the unperturbed input. E.g., to segment an object, predict what happens if we move one patch, then compare against the unperturbed input to see which pixels move together.

1 week ago 2 0 1 0
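
A toy sketch of that perturb-predict-compare loop for segmentation, assuming a hypothetical `world_model.predict` that maps an input frame to a predicted scene; patch size, shift, and threshold are made-up values:

```python
# Toy sketch of zero-shot segmentation via "approximate causal
# inference". `world_model.predict` is a hypothetical stand-in for a
# trained ZWM predictor; constants are illustrative only.
import numpy as np

PATCH = 16

def segment_by_counterfactual(world_model, frame, patch_xy,
                              shift=(4, 0), thresh=0.1):
    """Boolean mask of pixels that move together with one patch."""
    y, x = patch_xy
    dy, dx = shift
    perturbed = frame.copy()
    # Counterfactual perturbation: translate a single patch by `shift`.
    perturbed[y + dy:y + dy + PATCH, x + dx:x + dx + PATCH] = \
        frame[y:y + PATCH, x:x + PATCH]
    # Ask the world model what scene the perturbed input implies.
    pred = world_model.predict(perturbed)
    # Pixels that changed relative to the unperturbed frame moved with
    # the patch -> they likely belong to the same object.
    delta = np.abs(pred - frame).mean(axis=-1)
    return delta > thresh
```
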
Post image

A unified model, with no labels, no finetuning, and no task-specific readouts. How does it work?

3 key principles.

Principle 1: ZWM trains a masked autoencoder with sparse “temporally-factored” masking, creating a predictor that flexibly separates visual appearance from underlying dynamics.

1 week ago 4 0 1 0
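
For concreteness, a minimal sketch of what sparse, temporally-factored masking could look like; the keep ratios and shapes are illustrative assumptions, not the paper's actual schedule:

```python
# Sketch of "temporally-factored" sparse masking for a video masked
# autoencoder: the first frame stays mostly visible (appearance),
# later frames are almost fully masked (dynamics must be predicted).
# Ratios and shapes are illustrative assumptions.
import torch

def temporally_factored_mask(n_frames: int, n_patches: int,
                             keep_first: float = 0.9,
                             keep_rest: float = 0.05) -> torch.Tensor:
    """Boolean mask (True = visible) over (n_frames, n_patches) tokens."""
    mask = torch.zeros(n_frames, n_patches, dtype=torch.bool)
    mask[0] = torch.rand(n_patches) < keep_first                 # dense frame 0
    mask[1:] = torch.rand(n_frames - 1, n_patches) < keep_rest   # sparse rest
    return mask

# Visible tokens feed the encoder; the decoder reconstructs the masked
# tokens of later frames, factoring appearance apart from dynamics.
mask = temporally_factored_mask(n_frames=4, n_patches=196)
```
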
Post image

Children exhibit visual understanding from limited experience, orders of magnitude less than our best models.

We introduce the Zero-shot World Model (ZWM). Trained on a single child's visual experience, BabyZWM rapidly develops competence across diverse benchmarks with no task-specific training. 🧵

1 week ago 55 24 1 4
Post image

LLMs can retrieve knowledge — but can they connect it in *creative* ways to solve problems?

Introducing CresOWLve 🦉, a new benchmark that evaluates creative problem-solving over real-world knowledge, using puzzles that require multiple creative thinking strategies.👇

1 week ago 3 2 1 0

Here is our best thinking about how to make world models. I would apologize for it being a massive 40-page behemoth, but it's worth reading. arxiv.org/pdf/2509.09737

7 months ago 71 18 2 2
Post image

Happy to share that our BBS target article has been accepted: “Core Perception”: Re-imagining Precocious Reasoning as Sophisticated Perceiving
With Alon Hafri, @veroniqueizard.bsky.social, @chazfirestone.bsky.social & Brent Strickland
Read it here: doi.org/10.1017/S014...
A short thread [1/5]👇

6 months ago 98 39 7 3