Co-led with @pkargupta.bsky.social ✨ We learned so much and couldn't have done it w/o our amazing collaborators and mentors: Ken Wang, Jinu Lee, @shan23chen.bsky.social, @orevaahia.bsky.social, Dean Light, Tom Griffiths, @maxkw.bsky.social, Jiawei Han, @asli-celikyilmaz.bsky.social, Yulia Tsvetkov🩵
More fun details, especially the extensive cognitive science background💫, in our 24-page paper!
📄Paper: arxiv.org/abs/2511.16660
💻Code: github.com/pkargupta/co...
🤗Data: huggingface.co/collections/...
🌐 Blogpost: tinyurl.com/cognitive-fo...
What our Cognitive Foundations framework enables:
🔍 Systematic diagnosis of reasoning failures
🎯 Predicting which training yields which capabilities
🧪 Testing cognitive theories at scale
🌉 Shared vocabulary bridging cognition & AI research
More on opportunities & challenges in📄
Test-time reasoning guidance: up to 66.7% improvement 💡
We scaffold cognitive structures from successful traces to guide reasoning.
Major gains on ill-structured problems🌟
Models possess latent capabilities—they just don't deploy them adaptively without explicit guidance.
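Roughly, the guidance amounts to this kind of scaffold (a minimal sketch, assuming a simple prompt-prepend; `query_model` and the scaffold wording are placeholders, not the paper's exact implementation):

```python
# Minimal sketch of test-time scaffolding (illustrative): prepend a
# cognitive structure mined from successful traces to the problem.
SCAFFOLD = (
    "Before answering, follow this reasoning structure:\n"
    "1. Selective attention: restate the constraints that matter.\n"
    "2. Knowledge alignment: connect them to what you know.\n"
    "3. Forward chaining: derive the answer step by step."
)

def guided_answer(query_model, problem):
    """Guide the model to deploy capabilities it has but doesn't use."""
    return query_model(f"{SCAFFOLD}\n\nProblem: {problem}")
```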
🧑🏻Humans reason differently‼️ More abstraction (54% vs 36%), more self-awareness (49% vs 19%), more conceptual processing. Less surface enumeration and rigid sequential chaining.
Even with correct answers—underlying mechanisms diverge fundamentally.
We analyzed 1,598 LLM reasoning papers:
Research concentrates on easily quantifiable behaviors: sequential organization (55%), decomposition (60%)
It neglects the meta-cognitive controls (8-16%) and alternative representations (10-27%) that correlate with success⚠️
Structure matters as much as presence📐
We introduce a method to extract reasoning structure from traces:
Successful: selective attention → knowledge alignment → forward chaining
Common: skip straight to forward chaining
LLMs prematurely seek solutions before understanding constraints‼️
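For intuition, a toy sketch of the contrast (our simplification; the paper extracts structure from annotated spans): count element transitions separately for successful and failed traces.

```python
from collections import Counter

def transition_counts(traces):
    """traces: iterable of (elements, success), elements in order."""
    ok, bad = Counter(), Counter()
    for elements, success in traces:
        for a, b in zip(elements, elements[1:]):
            (ok if success else bad)[(a, b)] += 1
    return ok, bad

traces = [
    (["selective_attention", "knowledge_alignment", "forward_chaining"], True),
    (["forward_chaining", "forward_chaining"], False),  # skips straight ahead
]
ok, bad = transition_counts(traces)
print(ok.most_common(3), bad.most_common(3))
```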
Model-specific patterns reveal training impact:
Olmo 3 exhibits more diverse cognitive elements (49%): its developers explicitly included meta-reasoning data during midtraining.
DeepHermes-3: only 12% avg presence.
Training methodology shapes cognitive profiles dramatically.
Meta-cognitive deficit is severe:
🤔Self-awareness: 16% in research design, 19% in LLM traces vs 49% in humans
🧐Self-evaluation on non-verifiable problems collapses (53.5% presence, 0.031 correlation)
Models can't self-assess without ground truth.
The presence-effectiveness paradox:
Logical coherence: 91% of traces, 0.091 corr. w/ success
Knowledge alignment: 20% of traces, 0.234 corr. w/ success (high)
Models frequently attempt core elements but fail to execute. Having the capability ≠ deploying it successfully😬
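The two numbers per element boil down to a presence rate and a presence-success correlation; a toy computation under that assumed formulation:

```python
import numpy as np

def presence_and_correlation(has_element, success):
    presence = has_element.mean()                      # share of traces
    corr = np.corrcoef(has_element, success)[0, 1]     # point-biserial
    return presence, corr

has_element = np.array([1, 1, 1, 0, 1, 1])  # e.g. logical coherence
success     = np.array([1, 0, 1, 1, 0, 1])
print(presence_and_correlation(has_element, success))
```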
Models deploy strategies inversely to what works 🚨
As problems become ill-structured, models narrow their repertoire, but successful traces show the need for greater diversity (successful = high PPMI in fig).
Sequential organization dominates. Meta-cognition disappears in LLMs.
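PPMI as we read the figure (assumed formulation; exact details are in the paper): how much more often an element co-occurs with success than chance predicts, floored at zero.

```python
import math

def ppmi(n_elem_and_success, n_elem, n_success, n_total):
    p_joint = n_elem_and_success / n_total
    p_elem, p_succ = n_elem / n_total, n_success / n_total
    if p_joint == 0:
        return 0.0
    return max(0.0, math.log2(p_joint / (p_elem * p_succ)))

print(ppmi(n_elem_and_success=40, n_elem=50, n_success=100, n_total=200))
```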
We analyze 192K reasoning traces from 18 LLMs (text, image, video) + 54 human think-aloud traces
We introduce a framework for fine-grained span-level cognitive evaluation: WHICH elements appear, WHERE, and HOW they're sequenced.
First analysis of its kind at this scale📊
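Concretely, a span-level annotation might look like this (hypothetical record format, not the released schema):

```python
annotation = {
    "trace_id": "trace_0421",      # placeholder ID
    "element": "self_evaluation",  # WHICH of the 28 elements
    "char_span": (512, 634),       # WHERE in the trace
    "order": 3,                    # HOW: position among the trace's spans
}
```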
Our taxonomy bridges cognitive science → LLM eval:
28 elements across 4 dimensions—reasoning invariants (compositionality, logical coherence), meta-cognitive controls (self-awareness), representations (hierarchical, causal), and operations (backtracking, verification)
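As a lookup table, the four dimensions look roughly like this (partial; only the elements named in this thread are listed, the paper defines all 28):

```python
TAXONOMY = {
    "reasoning_invariants": ["compositionality", "logical_coherence"],
    "meta_cognitive_controls": ["self_awareness", "self_evaluation"],
    "representations": ["hierarchical", "causal"],
    "operations": ["backtracking", "verification", "forward_chaining"],
}
```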
LLMs solve hard problems but fail on easy variants, and they exhibit reasoning patterns unlike humans'.
The issue: reasoning evaluation judges outcomes w/o understanding the cognitive processes that produce them. We can't diagnose failures or predict how training produces capabilities🚨
🤔💭What even is reasoning? It's time to answer the hard questions!
We built the first unified taxonomy of 28 cognitive elements underlying reasoning
Spoiler—LLMs commonly employ sequential reasoning, rarely self-awareness, and often fail to use correct reasoning structures🧠
Because Olmo 3 is fully open, we can decontaminate our evals against our pretraining and midtraining data. @stellali.bsky.social proves this with spurious rewards: RL trained on a random reward signal can't improve on our evals, unlike in some previous setups
Day 1 (Tue Oct 7) 4:30-6:30pm, Poster Session 2
Poster #77: ALFA: Aligning LLMs to Ask Good Questions: A Case Study in Clinical Reasoning; led by
@stellali.bsky.social & @jiminmun.bsky.social
This project was done as part of the Meta FAIR AIM mentorship program. Special thanks to my amazing collaborators and awesome mentors @melaniesclar.bsky.social @jcqln_h @hunterjlang @AnsongNi @andrew_e_cohen @jacoby_xu @chan_young_park @tsvetshop.bsky.social @asli-celikyilmaz.bsky.social 🫶🏻💙
✨PrefPalette🎨 bridges cognitive science, social psychology, and AI for explainable preference modeling✨
📖Paper: arxiv.org/abs/2507.13541
💻Code: github.com/stellalisy/P...
Join us in shaping interpretable AI that you can trust and control🚀Feedback welcome!
#AI #Transparency
🌍Bonus: PrefPalette🎨 is a computational social science goldmine!
📊 Quantify community values at scale
📈 Track how norms evolve over time
🔍 Understand group psychology
📋 Move beyond surveys to revealed preferences
💡Potential real-world applications:
🛡️Smart content moderation—explains why content is flagged/decisions are made
🎯Interpretable LM alignment—revealing prominent attributes
⚙️Controllable personalization—giving user agency to personalize select attributes
🔍More importantly‼️we can see WHY preferences differ:
r/AskHistorians:📚values verbosity
r/RoastMe:💥values directness
r/confession:❤️values empathy
We visualize each group’s unique preference decisions—no more one-size-fits-all. Understand your audience at a glance🏷️
🏆Results across 45 Reddit communities:
📈Performance boost: +46.6% vs GPT-4o
💪Outperforms other training-based baselines w/ statistical significance
🕰️Robust to temporal shifts: trained pref models can be used out of the box!
⚙️How it works (pt.2)
1: 🎛️Train compact, efficient detectors for every attribute
2: 🎯Learn community-specific attribute weights during preference training
3: 🔧Add attribute embeddings to preference model for accurate & explainable predictions
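In code, the attribute-mediated scoring idea reduces to something like this (heavily simplified sketch; the real PrefPalette learns attribute embeddings inside a trained preference model, and the toy detectors/weights below are made up):

```python
import numpy as np

def preference_score(text, detectors, weights):
    """Community-specific weighted sum of attribute detector scores."""
    attrs = np.array([d(text) for d in detectors])  # e.g. humor, verbosity
    return float(weights @ attrs)

def explain(text, detectors, weights, names):
    """Rank attributes by their contribution to the score: the WHY."""
    attrs = np.array([d(text) for d in detectors])
    return sorted(zip(names, weights * attrs), key=lambda p: -abs(p[1]))

detectors = [lambda t: t.count("!") / 10, lambda t: len(t) / 500]  # toys
weights = np.array([0.2, 0.8])  # e.g. a verbosity-loving community
print(explain("A long, carefully sourced answer...", detectors, weights,
              ["directness", "verbosity"]))
```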
⚙️How it works (prep stage)
📜Define 19 sociolinguistics & cultural attributes from literature
🏭Novel preference data generation pipeline to isolate attributes
Our pipeline generates pairwise data on *any* decomposed dimension, w/ applications beyond preference modeling
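A hedged sketch of the isolation idea (`llm` and the prompt wording are placeholders, not our released pipeline): rewrite a base response so exactly one attribute changes, yielding a pair that differs only on that dimension.

```python
def make_pair(llm, base_response, attribute):
    """Generate a pair of responses differing only in one attribute."""
    prompt = (
        f"Rewrite the response below to increase its {attribute}, "
        "changing nothing else about its content.\n\n"
        f"Response: {base_response}"
    )
    return {"high": llm(prompt), "low": base_response,
            "attribute": attribute}
```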
Meet PrefPalette🎨! Our approach:
🔍⚖️models preferences w/ 19 attribute detectors and dynamic, context-aware weights
🕶️👍uses unobtrusive signals from Reddit to avoid response bias
🧠mirrors attribute-mediated human judgment—so you know not just what it predicts, but *why*🧐
🔬Cognitive science reveals how humans break choices into attributes, e.g.:
😂 Humor
❤️ Empathy
💬 Conformity
...then weight them based on context (e.g. comedy vs counseling).
These traits shape every decision, from product picks to conversation tone. Your mind is a colorful palette🎨
🚨Current preference models only output a reward/score:
❌No transparency in decision-making
❌Personalization breaks easily, one-size-fits-all scores
❌Use explicit annotations (response bias)
They can’t adapt to individual tastes, can’t debug errors, and fail to build trust🙅
WHY do you prefer something over another?
Reward models treat preference as a black-box😶🌫️but human brains🧠decompose decisions into hidden attributes
We built the first system to mirror how people really make decisions in our recent COLM paper🎨PrefPalette✨
Why it matters👉🏻🧵
Want to quickly sample high-quality images from diffusion models, but can’t afford the time or compute to distill them? Introducing S4S, or Solving for the Solver, which learns the coefficients and discretization steps for a DM solver to improve few-NFE generation.
Thread 👇 1/
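A rough sketch of the idea as described above (our reading, not the actual S4S code): make the solver's mixing coefficients and timestep schedule learnable, then optimize them, e.g. so few-step samples match a many-step teacher. `model(x, sigma)` is a placeholder eps-predictor.

```python
import torch

class LearnableSolver(torch.nn.Module):
    def __init__(self, num_steps):
        super().__init__()
        # Per-step mixing coefficients over all eps predictions so far,
        # initialized to a plain Euler (identity) mix.
        self.coeffs = torch.nn.Parameter(torch.eye(num_steps))
        # Free (log-)noise levels, i.e. learnable discretization steps.
        self.log_sigmas = torch.nn.Parameter(
            torch.linspace(2.0, -2.0, num_steps + 1))

    def sample(self, model, x):
        sigmas = self.log_sigmas.exp()
        history = []
        for i in range(len(sigmas) - 1):
            history.append(model(x, sigmas[i]))
            # Learned mix of current and past predictions (multistep-style).
            mix = sum(self.coeffs[i, j] * e for j, e in enumerate(history))
            x = x + (sigmas[i + 1] - sigmas[i]) * mix  # Euler-style update
        return x
```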