With some trepidation, I'm putting this out into the world:
gershmanlab.com/textbook.html
It's a textbook called Computational Foundations of Cognitive Neuroscience, which I wrote for my class.
My hope is that this will be a living document, continuously improved as I get feedback.
Posts by Valerio Pepe
It has been so so fun to think with some of my favorite scientists about what it means to understand!
Do AI agents ask good questions? We built “Collaborative Battleship” to find out—and discovered that weaker LMs + Bayesian inference can beat GPT-5 at 1% of the cost.
Paper, code & demos: gabegrand.github.io/battleship
Here's what we learned about building rational information-seeking agents... 🧵🔽
I'm really excited about this work (two years in the making!).
We look at how LLMs seek out and integrate information and find that even GPT-5-tier models are bad at this, meaning we can use Bayesian inference to uplift weak LMs and beat them... at 1% of the cost 👀
Title page of the paper: WUGNECTIVES: Novel Entity Inferences of Language Models from Discourse Connectives, with two figures at the bottom. Left: our Figure 1, comparing previous work, which usually predicted the connective given the arguments (grounded in the world), with our work, which flips this premise by getting models to use their knowledge of connectives to predict something about the world. Right: our main results across 7 types of connective senses. Models are especially bad at Concession connectives.
"Although I hate leafy vegetables, I prefer daxes to blickets." Can you tell if daxes are leafy vegetables? LMs can't seem to!
We investigate if LMs capture these inferences from connectives when they cannot rely on world knowledge.
New paper w/ Daniel, Will, @jessyjli.bsky.social
It’s week 4 and probably time to start doing machine learning in machine learning class. We begin with the only nice thing we have: the perceptron.
Out of curiosity (and my own ignorance), how are teachers aware of students' socioeconomic backgrounds when the students are this young?
I can think of clothing as an immediate signal, and, over time, getting to know parents (and thus their occupations). Are these the main ways this is inferred?
Can LLMs learn social skills by playing games?
A blogpost on human-model interaction, games, training and testing LLMs
research.ibm.com/blog/LLM-soc...
🤖📈🧠
> looking for a coffee
> have to judge if their coffee is burnt or flavorful
> "we have a Cimbali coffee machine"
> buy coffee
> it's burnt
We take this as evidence that while misalignment directions may exist, the narrative is probably quite nuanced, and EM is not governed by a single vector, as some hypothesized in the aftermath of the original paper.
See it for yourself at:
www.lesswrong.com/posts/qHudHZ...
However, the steered models are often more incoherent than the finetuned ones, suggesting that emergent misalignment is not entirely captured by a steering vector. The vectors themselves are also not very interpretable, so it is unclear what exactly they encode.
The answer is: yes (sort of).
Though the finetune itself seems to be learning more than a single steering vector, extracting steering vectors and applying them (with sufficient scaling) to the same layer in an un-finetuned version of the model *does* elicit misaligned behavior.
We finetune a single layer and show that, for certain layers, this renders the model nearly as misaligned as a full finetune. This lets us ask: can we capture this misalignment in a single steering vector extracted from that layer?
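The extraction step described here is, in its simplest form, a difference-of-means vector between the finetuned and base model's activations at one layer, added back (with scaling) to the base model. The numpy sketch below is schematic: the real experiments hook into a transformer's residual stream, and the array shapes, scale, and data here are illustrative assumptions, not the blog post's actual setup.

```python
import numpy as np

def extract_steering_vector(base_acts, finetuned_acts):
    """Difference-of-means steering vector at one layer:
    mean finetuned activation minus mean base activation."""
    return finetuned_acts.mean(axis=0) - base_acts.mean(axis=0)

def apply_steering(acts, vector, scale=1.0):
    """Add the (scaled) vector to the base model's activations
    at the same layer it was extracted from."""
    return acts + scale * vector

# Toy activations: hidden size 8, a batch of 16 "token" activations.
rng = np.random.default_rng(0)
base = rng.normal(size=(16, 8))
shift = np.array([1.0, 0, 0, 0, 0, 0, 0, 0])  # pretend misalignment direction
finetuned = base + shift                       # finetuning == a uniform shift here
vector = extract_steering_vector(base, finetuned)
steered = apply_steering(base, vector, scale=1.0)
```

In this toy case the difference-of-means recovers the planted direction exactly; in a real model the finetune changes activations in a more complicated way, which is why steering only partially reproduces the misalignment.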
An interpretation of the original paper was that EM is mediated by a “misalignment direction” within the model, which the finetuning process changes, rendering the model much more toxic/misaligned.
New blog post! www.lesswrong.com/posts/qHudHZ...
Following Emergent Misalignment, we show that finetuning even a single layer via LoRA on insecure code can induce toxic outputs in Qwen2.5-Coder-32B-Instruct, and that you can extract steering vectors to make the base model similarly misaligned 🧵
Sam is 100% correct on this. Indeed, human babies have essential cognitive priors such as object permanence, continuity, and boundedness, a 3D Euclidean understanding of space, etc.
We spent two years systematically examining and demonstrating the lack of such priors in MLLMs: arxiv.org/abs/2410.10855
I think the BabyLM Challenge is really interesting, but also feel that there is something fundamentally ill-posed about how it maps onto the challenge facing human children. It's true that babies only get a relatively limited amount of linguistic experience, but...
Word learning is usually about what a word does refer to. But can toddlers learn from what it doesn’t?
Our new Cognition paper shows 20-month-olds use negative evidence to infer novel word meanings, reshaping theories of language development.
www.sciencedirect.com/science/arti...
ai is truly revolutionary -- scientists hadn't previously considered what would happen if sally had simply eaten the marble instead, to know its location at all times
Let's go Lio!!!
fair, italy has some incredibly creative offensive slang -- fwiw my favorite usable roman insult ("porco dio" is too offensive for casual use) is "sei 'na pentola de facioli", "you are a pot of beans", i.e. you never stop muttering and talking
as someone from Rome I'm currently sitting at my laptop like the mentats from Dune trying to figure out what words this could be referring to
we also take pride in preparing gnocchi incorrectly because the rest of italy can't make a decent carbonara to save their lives (no cream and no parmesan!)
congratulations!!
the icml keynote will be jensen huang speaking to an empty room
Photoshop for text. In our #CHI2025 paper “Textoshop”, we explore how interactions inspired by drawing software can help edit text. We consider words as pixels, sentences as regions, and tones as colours. #HCI #NLProc #LLMs #AI Thread 🧵
(1/10)
(also holy loud sound effects batman)
saw a new pika model was out on twitter & robot-gasoline-bench does not disappoint
All the ACL chapters are here now: @aaclmeeting.bsky.social @emnlpmeeting.bsky.social @eaclmeeting.bsky.social @naaclmeeting.bsky.social #NLProc
(1/5) Very excited to announce the publication of Bayesian Models of Cognition: Reverse Engineering the Mind. More than a decade in the making, it's a big (600+ pages) beautiful book covering both the basics and recent work: mitpress.mit.edu/978026204941...
Hello bsky! I'm Valerio, an undergrad at Harvard studying computer science and cognitive science.
I'm interested in the inductive biases that make language learning and reasoning so easy for us humans, and what their analogues are in machines.
If you're around Boston, I would love to grab coffee!