Children develop visual understanding from limited experience, orders of magnitude less than our best models receive.
We introduce BabyZWM, a Zero-shot World Model trained on a single child's visual experience. It rapidly develops competence across diverse benchmarks with no task-specific training. 🧵
Posts by Yash Shah
A retinotopic wiring principle of the human brain www.biorxiv.org/content/10.64898/2026.04...
Self-supervised learning yields representational signatures of category-selective cortex www.biorxiv.org/content/10.64898/2026.02...
Geometric constraints in the development of primate extrastriate visual cortex www.biorxiv.org/content/10.64898/2026.02...
Efficient task generalization and humanlike face perception in models that learn to discriminate face geometry www.biorxiv.org/content/10.64898/2026.01...
Early face deprivation leads to long-lasting deficits in cortical face processing www.biorxiv.org/content/10.64898/2026.02...
Retinal waves shape starburst amacrine cell dendrite development through a direction-selective dendritic computation www.biorxiv.org/content/10.64898/2026.02...
Human cortical networks trade communication efficiency for computational reliability www.biorxiv.org/content/10.64898/2025.12...
Functional architecture for speed tuning in primary visual cortex of carnivores www.biorxiv.org/content/10.1101/2025.11....
A semantotopic map in human hippocampus www.biorxiv.org/content/10.1101/2025.10....
Re-emergence of orientation coding in primate IT cortex and deep networks reveals functional hubs for visual processing www.biorxiv.org/content/10.1101/2025.10....
Representations in the hippocampal-entorhinal system emerge from learning sensory predictions www.biorxiv.org/content/10.1101/2025.10....
Very exciting preprint from Dan Yamins' NeuroAI lab, proposing Probabilistic Structure Integration (PSI), a way to bootstrap from pixels to higher-level visual abstractions through a kind of visual prompting. One of the deepest and most original ideas I've read in a while.
arxiv.org/abs/2509.09737
I've been arguing that #NeuroAI should model the brain in health *and* in disease -- very excited to share a first step from Melika Honarmand: inducing dyslexia in vision-language models via targeted perturbations of visual-word-form units (analogous to the human VWFA) 🧠🤖🧪 arxiv.org/abs/2509.24597
Haider Al-Tahan, Mayukh Deb, Jenelle Feather, N. Apurva Ratan Murty: End-to-end Topographic Auditory Models Replicate Signatures of Human Auditory Cortex https://arxiv.org/abs/2509.24039 https://arxiv.org/pdf/2509.24039 https://arxiv.org/html/2509.24039
Metabolic organization of macaque visual cortex reflects retinotopic eccentricity and category selectivity www.biorxiv.org/content/10.1101/2025.09....
Functional organization of the human visual system at birth and across late gestation www.biorxiv.org/content/10.1101/2025.09....
Unfolding spatiotemporal representations of 3D visual perception in the human brain www.biorxiv.org/content/10.1101/2025.08....
Visual Word Form Area demonstrates individual and task-agnostic consistency but inter-individual variability www.biorxiv.org/content/10.1101/2025.07....
Many-Two-One: Diverse Representations Across Visual Pathways Emerge from A Single Objective www.biorxiv.org/content/10.1101/2025.07....
And of course because this is my first ever post I forgot to include hashtags! #ICML2025
Check out the paper if interested, and come talk to me during the poster session (July 17, Thursday at 4:30pm) if you're in Vancouver! icml.cc/virtual/2025.... [11/n]
Finally, because R-MDN operates at the level of individual examples, it can be integrated into both convolutional neural networks and vision transformers, addressing one of the significant limitations of the MDN algorithm. [10/n]
And R-MDN makes equitable predictions across population groups, such as between boys and girls when performing sex classification on the ABCD (Casey et al., 2008) dataset with pubertal development scores as the confounder. [9/n]
R-MDN can also remove the influence of multiple confounding variables, as seen when testing on the ADNI (Mueller et al., 2005) dataset. [8/n]
Since R-MDN is a normalization layer, it can be added to a variety of previously proposed model architectures. [7/n]
R-MDN effectively removes confounder influence from learned DNN features, as rigorously verified in both synthetically controlled environments and real-world datasets. [6/n]
We propose Recursive Metadata Normalization (R-MDN), a normalization layer that leverages the statistical recursive least squares algorithm to iteratively update its internal parameters based on previously computed values whenever new data are received. [5/n]
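To make the idea above concrete, here is a minimal, hypothetical sketch of a recursive-least-squares update that residualizes a scalar feature against confounder variables online, one example at a time. The class and variable names are illustrative only, not the authors' implementation, and many details (vectorization over feature maps, gradient flow, forgetting factors) are omitted.

```python
import numpy as np

class RecursiveMDNSketch:
    """Illustrative sketch (not the paper's code) of recursive
    metadata normalization: regress a feature on confounders with
    recursive least squares (RLS) and return the residual, updating
    the regression coefficients each time a new example arrives."""

    def __init__(self, n_confounders, lam=1e3):
        d = n_confounders + 1            # confounders + intercept
        self.beta = np.zeros(d)          # regression coefficients
        self.P = lam * np.eye(d)         # running inverse-covariance estimate

    def update(self, confounders, feature):
        # Augment confounder vector with an intercept term.
        x = np.append(confounders, 1.0)
        Px = self.P @ x
        gain = Px / (1.0 + x @ Px)       # standard RLS gain vector
        resid = feature - x @ self.beta  # confounder-removed component
        self.beta = self.beta + gain * resid   # recursive coefficient update
        self.P = self.P - np.outer(gain, Px)   # rank-1 inverse-covariance update
        return resid
```

Because each update touches only the current example, a layer like this could in principle be applied in a continual-learning stream without revisiting past data, which is the property the thread highlights.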
However, in continual learning, data become available sequentially, often over the span of several years, as in longitudinal studies. [4/n]
Prior methods such as BR-Net (Adeli et al., 2020), MDN (Lu et al., 2021), and P-MDN (Vento et al., 2022), which learn confounder-invariant representations in DNNs, operate within a static learning setting and assume that the algorithm has access to all data at the outset of training. [3/n]