The study's message extends beyond the lab. As we increasingly face externally provided, prefitted models—folk theories, political narratives, or purported expert wisdom—in modern life, a tendency to favor overly complex predictive models may profoundly influence our individual and social well-being!
Posts by Liu, Shuze
We found that humans are undersensitive to overfitting: they overrely on training loss as a proxy for test loss and penalize the other terms of Efron's decomposition suboptimally. Typical decision noise worsens the picture: in our task, people should have chosen the simple predictive model most of the time, yet few did.
So, do people intuitively understand overfitting? We generated noisy scatterplot data and asked humans to choose between predictive models fitted to the data. We compared humans to Efron's decomposition, which predicts test loss from training loss, model complexity, training sample size, and data noise.
There is abundant literature in statistics and ML on predictive-model overfitting. When data is scarce and noisy, an underparameterized model may be preferable to flexible models that match the ground truth, because the flexible models' fitted parameters tend to capture training noise and generalize poorly.
Human model selection studies often focus on generative model families, exemplified by the Bayesian Occam's razor that integrates over parameter values. Yet outside the lab, we often rely on prefitted models from others, and success is judged not by generative truth but by predictive accuracy.
Thanks Sam! Main takeaways:
1) Ground-truth and predictive model selection diverge under noisy, scarce data—for prediction, oversimplified models may work better by avoiding overfitting.
2) When humans decide between externally provided, prefitted predictive models, they're undersensitive to 1).
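The overfitting effect the thread describes is easy to reproduce. Below is a toy sketch (the sine ground truth, sample size of 10, and noise level are assumed for illustration; they are not the paper's stimuli): an underparameterized line generalizes better than a flexible polynomial fitted to scarce, noisy data.

```python
import numpy as np

# Toy illustration (assumed setup, not the paper's stimuli):
# scarce, noisy samples from a sine ground truth.
rng = np.random.default_rng(0)
true_fn = np.sin

x_train = rng.uniform(-3, 3, size=10)
y_train = true_fn(x_train) + rng.normal(0, 0.8, size=10)

# A large noiseless grid approximates the test (generalization) loss
x_test = np.linspace(-3, 3, 1000)
y_test = true_fn(x_test)

def losses(degree):
    coefs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    return train_mse, test_mse

simple_train, simple_test = losses(1)    # underparameterized line
complex_train, complex_test = losses(9)  # interpolates all 10 points

# The flexible model wins on training loss yet loses badly on test loss
print(simple_train, simple_test)
print(complex_train, complex_test)
```

A decision maker who judges models by training loss alone would pick the degree-9 fit here; that is precisely the overreliance the study measures.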
Overall, by jointly studying constraints on action sets and policy complexity, we provide a general picture of how humans adjust the two together. The results reveal lossy problem simplification beyond typical modeling assumptions, which may be an integral part of naturalistic human decision-making!
In a large-action-space contextual bandit experiment, we found that humans exploit the above interplay, enlarging action sets alongside policy complexity to mitigate suboptimality. Under time limits, they remain near-optimal for their chosen action set, indicating spontaneous problem simplification.
Beyond explaining past data, our framework prescribes a complex interaction. Enlarging the action set size uncaps policy complexity and enables greater reward. It also mitigates the increase in suboptimality following policy complexity increments, boosting the reward-efficiency of cognitive effort!
Using rate-distortion theory, we assess suboptimalities incurred by smaller action consideration set sizes at various levels of policy state-dependence. We rationalize empirical signatures of human option generation as adaptations to joint limitations on action set size and policy complexity.
In real-life decisions, vast action spaces often preclude our exhaustive consideration. Furthermore, cognitive constraints limit the state-dependence of policies that map world states to the actions considered. We build a resource-rational framework unifying both ecologically relevant constraints!
@gershbrain.bsky.social and I have a new paper in PLOS Comp Bio!
We study how two cognitive constraints—action consideration set size & policy complexity—interact in context-dependent decision making, and how humans exploit their synergy to reduce behavioral suboptimality.
osf.io/preprints/ps...
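The reward-complexity tradeoff underlying this framework can be sketched with a small Blahut-Arimoto iteration (a toy setup assumed for illustration: 3 states, 3 actions, one rewarding action per state; this is not the paper's task or its action-set-size manipulation).

```python
import numpy as np

# Toy setup (assumed, not the paper's task): 3 states, 3 actions,
# one rewarding action per state, uniform state distribution.
Q = np.eye(3)            # Q[s, a]: expected reward of action a in state s
p_s = np.ones(3) / 3

def optimal_policy(beta, n_iter=100):
    """Blahut-Arimoto iteration for the reward-complexity frontier."""
    p_a = np.ones(3) / 3
    for _ in range(n_iter):
        logits = beta * Q + np.log(p_a)
        pi = np.exp(logits - logits.max(axis=1, keepdims=True))
        pi /= pi.sum(axis=1, keepdims=True)  # pi[s, a]: optimal policy
        p_a = p_s @ pi                       # marginal action distribution
    return pi, p_a

def complexity_bits(pi, p_a):
    """Policy complexity: mutual information I(S; A) in bits."""
    return float(np.sum(p_s[:, None] * pi * np.log2(pi / p_a)))

def avg_reward(pi):
    return float(np.sum(p_s[:, None] * pi * Q))

# A larger beta trades more policy complexity for more reward
pi_lo, pa_lo = optimal_policy(0.1)
pi_hi, pa_hi = optimal_policy(10.0)
print(complexity_bits(pi_lo, pa_lo), avg_reward(pi_lo))
print(complexity_bits(pi_hi, pa_hi), avg_reward(pi_hi))
```

Restricting the consideration set amounts to zeroing out columns of the policy, which caps both the attainable complexity and the attainable reward, which is the interplay the experiment probes.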
Finally, a huge thanks to my mentors @gershbrain.bsky.social and Bilal Bari for their support, insight, and encouragement!
For those interested, we also have an earlier JEP:G paper, which inspired the mental cost modeling in our CogSci submission:
gershmanlab.com/pubs/Liu25.pdf
We found that humans meta-reason about their policy complexity according to both time and mental costs, exhibiting consistently supralinear mental cost functions across tasks. This overturns common assumptions and supports the construct validity of info-theoretic measures as a domain-general mental cost!
To fill this gap, we designed a series of contextual bandit experiments (spanning speed-accuracy tradeoffs, working-memory set-size manipulations, and reward magnitudes) to stress-test the mutual information formulation of mental costs and look for consistent relationships.
While rational inattention & policy compression have formulated mental costs via mutual information, there remains disagreement on whether it induces a capacity limit or a linear cost. Time costs further confound the picture, as they also incentivize low policy complexity to reduce decision time.
It is well known that context sensitivity incurs mental costs. However, it is unclear which domain-general cognitive resources underlie mental costs, and whether existing cost formulations have construct validity, which likely requires them to scale robustly with the assumed resource substrate.
Looking forward to sharing our work at #cogsci2025! We aim at getting one step closer to a domain-general formulation of mental costs via policy compression.
Come see my presentation at "Talks 35: Reasoning". It is scheduled for 16:44 PST, August 1 at Nob Hill C!
gershmanlab.com/pubs/LiuGers...
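As a concrete illustration of the quantity being stress-tested, policy complexity can be estimated directly from a session's state-action record with a plug-in mutual-information estimator (the data below are hypothetical, and the real analyses are more involved; note the plug-in estimate is biased upward for small samples).

```python
import numpy as np

# Hypothetical behavioral record: (state, action) pairs from one
# contextual bandit session (illustrative, not data from the paper).
states  = np.array([0, 0, 1, 1, 2, 2, 0, 1, 2, 0])
actions = np.array([0, 0, 1, 1, 2, 0, 0, 1, 2, 0])

def empirical_policy_complexity(states, actions):
    """Plug-in estimate of I(S; A) in bits from paired observations."""
    n = len(states)
    joint = np.zeros((states.max() + 1, actions.max() + 1))
    for s, a in zip(states, actions):
        joint[s, a] += 1
    joint /= n
    p_s = joint.sum(axis=1, keepdims=True)   # marginal over states
    p_a = joint.sum(axis=0, keepdims=True)   # marginal over actions
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (p_s @ p_a)[nz])))

bits = empirical_policy_complexity(states, actions)
print(bits)   # between 0 (state-independent) and log2(3) (fully determined)
```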
This is an unusual one: the project started in 2009, with Trevor Holland collecting data in an auditory-visual integration paradigm with many conditions within subjects. But modeling it remained challenging over the years until Luigi Acerbi and Shuze Liu took it on. osf.io/preprints/ps...
Overall, we stress-test the Bayesian account of multisensory perception by systematically traversing its full modeling space. Human behavior remains well-explained, but only under specific, often overlooked assumptions. A richer picture emerges when we let data guide our modeling assumptions!
Beyond core inference, other complex perceptual factors play a role too:
1) Sensory noise increases in multisensory trials, likely due to divided attention;
2) Auditory observations are stretched according to the visual range, suggesting spontaneous cross-modal recalibration in humans.
We found that key model choices drastically affect fits and are needed to explain the human central tendency:
1) Human priors are non-Gaussian;
2) Sensory noise is heteroskedastic, dipping centrally and plateauing peripherally;
3) Both model averaging (optimal) and probability matching fit behavior well.
We explore modeling choices in Bayesian cue integration—priors, sensory noise functions, and causal inference strategies—using a data-driven, semiparametric approach. With promising candidates identified, we enumerate them in a combinatorial model space and test them via model comparison.
@weijima.bsky.social, @lacerbi.bsky.social, Trevor, and I have a new preprint on Bayesian models of multisensory perception!
We systematically inspect underexplored degrees of freedom in Bayesian models, pushing their capacity to capture human behavior to its limit.
osf.io/preprints/ps...
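The causal-inference backbone of these models can be sketched numerically. Below is a toy Bayesian observer for one audio-visual trial (all parameter values and the zero-mean Gaussian prior are assumed for illustration, not fitted values from the paper; the grid integration sidesteps the closed-form algebra). It shows model averaging: nearby cues are fused toward the reliable visual cue, while discrepant cues are largely segregated.

```python
import numpy as np

# Toy causal-inference observer (illustrative parameters, not fitted values)
sigma_a, sigma_v, sigma_p = 2.0, 0.5, 10.0  # auditory/visual noise, prior SD
p_common = 0.5                              # prior probability of one cause
s = np.linspace(-30, 30, 2001)              # grid over source locations
ds = s[1] - s[0]

def normal(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

def auditory_estimate(x_a, x_v):
    prior = normal(s, 0.0, sigma_p)
    # C = 1: a single source generates both cues
    post_c1 = normal(x_a, s, sigma_a) * normal(x_v, s, sigma_v) * prior
    evid_c1 = np.sum(post_c1) * ds
    est_c1 = np.sum(s * post_c1) * ds / evid_c1
    # C = 2: independent sources; the auditory estimate ignores vision
    post_c2 = normal(x_a, s, sigma_a) * prior
    evid_c2_a = np.sum(post_c2) * ds
    est_c2 = np.sum(s * post_c2) * ds / evid_c2_a
    evid_c2 = evid_c2_a * np.sum(normal(x_v, s, sigma_v) * prior) * ds
    # Model averaging: weight each estimate by the posterior over C
    w = p_common * evid_c1 / (p_common * evid_c1 + (1 - p_common) * evid_c2)
    return w * est_c1 + (1 - w) * est_c2

est_near = auditory_estimate(2.0, 1.0)    # close cues: fused toward vision
est_far = auditory_estimate(10.0, -10.0)  # discrepant cues: segregated
print(est_near, est_far)
```

Swapping the final averaging line for a sample from the posterior over C gives the probability-matching strategy mentioned above; the paper's point is that both decision rules, plus the prior and noise-function choices, must be explored jointly.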
Overall, our study:
1) Connects seemingly disparate cognitive measures: state-dependence v. RT, goal-directed v. habitual behavior;
2) Prescribes task-general insight via normative principles;
3) Highlights the utility of incorporating multiple resource formulations in resource-rational studies!
Across three experiments, humans adaptively adjusted policy complexity in the predicted directions (though with a leftward bias, which we model via memory costs in our CogSci paper). LBA modeling revealed strong policy-compression-style perseveration in participants' behavior.
Given policy complexity-RT relations, we can derive policy complexity levels that maximize reward over time. This generates predictions across various task manipulations, including ITIs, reward regularities, and set sizes (reward magnitudes forthcoming in CogSci 2025), which we test in this paper.
Policy compression applies rate-distortion theory to action selection, specifying the attainable reward at every level of policy state-dependence/complexity. It prescribes a linear relationship b/w policy complexity and RTs, and rationalizes action perseveration as optimal use of limited resources.
Tailoring actions to states taxes cognitive resources. Two prominent resource formulations are time and memory, studied in speed-accuracy tradeoffs and set-size effects. We unify them under policy compression, prescribing how humans should adaptively adjust the state-dependence of their policies.
@lucylai.bsky.social, @gershbrain.bsky.social, Bilal Bari, and I have a new paper out in JEP:G!
We study the time and memory costs of policy compression—a resource-rational framework for decision making, focusing on how state-dependent our policies ought to be.
psycnet.apa.org/record/2026-...
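The reward-rate logic in the thread can be sketched in a few lines (the concave reward curve, linear RT coefficients, and ITI values below are assumed for illustration, not the paper's fitted quantities): with reward concave in policy complexity and RT linear in it, reward per unit time peaks at an interior complexity, and a longer ITI shifts that optimum upward.

```python
import numpy as np

# Toy reward-rate tradeoff (illustrative values, not fitted quantities)
I = np.linspace(0, np.log2(3), 500)   # policy complexity in bits
V = 1 - (2 / 3) * np.exp(-2 * I)      # assumed concave reward curve V(I)
rt = 0.5 + 0.4 * I                    # linear complexity-RT relation (s)

# Reward rate = expected reward per unit time, including the ITI
rate_short = V / (rt + 1.0)           # short inter-trial interval
rate_long = V / (rt + 3.0)            # long inter-trial interval

I_star = I[np.argmax(rate_short)]
I_star_long = I[np.argmax(rate_long)]

# A longer ITI makes decision time relatively cheaper,
# pushing the optimal policy complexity higher
print(I_star, I_star_long)
```

This is the kind of prediction (here, for the ITI manipulation) that the experiments test against humans' adaptive complexity adjustments.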