Work w/ Chengtian Ma, Abigail Xu, Maya Shaked, Pattie Maes, and @nikhilsinghmus.bsky.social
🧵9/9
Posts by Manuel Cherep
ABxLAB offers:
✅ An open-source man-in-the-middle testbed for real web environments
✅ A scalable consumer choice benchmark for agentic decision-making
✅ A dataset of causal effects of ratings, prices, and nudges across 17 LLMs
📦 Code: github.com/PapayaResearch/abxlab
🧵8/9
This changes the analysis for LLM agents: not “Did it complete the task?” but:
“What governs its decisions when multiple valid options exist?”
A question behavioral scientists have been asking about humans for decades. ABxLAB is a step toward that science for agents.
🧵7/9
We tested user profiles, e.g. “The user is on a tight budget.”
These act like switches: once a preference is declared, it dominates all other attributes.
The takeaway isn’t that agents are biased shoppers, but that this offers a diagnostic window into agent behavior.
🧵6/9
Even without human cognitive limits, agents:
- Heavily over-weight ratings
- Over-weight cheaper items when ratings are matched
- Are swayed by trivial order effects
- Fall for simple nudges (e.g. “Best seller”)
These are systematic, often large effects.
🧵5/9
The main finding: LLM agents are not the rational, utility-maximizing actors we might hope for.
Rather, they are strongly biased by these cues. We found agents are often 3-10x+ more susceptible to nudges and superficial attribute differences than our human baseline.
🧵4/9
We applied ABxLAB to a realistic shopping task, running 80,000+ experiments on 17 SOTA models (GPT-5, Claude 4, Gemini 2.5, Llama 4, etc.).
We systematically manipulated:
💰Prices
⭐️Ratings
🔀Presentation order
👉Classic psychological nudges (authority, social proof, etc.)
🧵3/9
How does it work? ABxLAB is a "man-in-the-middle" framework.
It intercepts web content in real time and modifies the choice architecture, so we can run controlled experiments on agents.
Think of it as a behavioral science lab for LLMs.
Paper: arxiv.org/abs/2509.25609
🧵2/9
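To make the man-in-the-middle idea from 2/9 concrete, here is a minimal Python sketch. It is not ABxLAB's actual code: the page markup, the function names (`inject_nudge`, `intercept`), and the badge placement are all assumptions for illustration. The interceptor picks one product at random and rewrites the HTML before the agent sees it.

```python
import random
import re

# Hypothetical markup: each product is a <div class="product" data-id="...">.
PAGE = (
    '<div class="product" data-id="A"><h2>Kettle A</h2>'
    '<span class="price">$30.00</span></div>'
    '<div class="product" data-id="B"><h2>Kettle B</h2>'
    '<span class="price">$30.00</span></div>'
)


def inject_nudge(html, product_id, label):
    """Insert a badge right after the opening tag of one product.

    Returns the page unchanged if the product id is not found.
    """
    pattern = re.compile(
        r'(<div class="product" data-id="%s">)' % re.escape(product_id)
    )
    badge = '<span class="badge">%s</span>' % label
    # A function replacement avoids backslash handling in the label text.
    return pattern.sub(lambda m: m.group(1) + badge, html, count=1)


def intercept(html, rng, product_ids=("A", "B"), label="Best seller"):
    """Randomly assign the nudge to one product and rewrite the page.

    Returns (treated_product_id, modified_html) so the assignment can be
    logged next to the agent's eventual choice.
    """
    target = rng.choice(list(product_ids))
    return target, inject_nudge(html, target, label)


# A seeded generator keeps the assignment reproducible across runs.
target, modified = intercept(PAGE, random.Random(0))
```

Because the nudged product is chosen at random, comparing how often the agent picks the treated versus untreated product gives an estimate of the nudge's causal effect.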
🚨New Preprint 🚨
Current agent evals mostly measure competence but miss behavior: are agents' decisions stable, rational, manipulable, human-like?
We introduce ABxLAB, a framework for studying agent behavior. Using it, we create an agentic consumer behavior benchmark.
🧵1/9
In a shopping case study across 17 SOTA LLMs, we find:
1. 🛒 Choices are highly determined by rating, price, incentives, and nudges
2. 🔀 Models follow a lexicographic-like decision rule, hierarchically valuing different attributes
3. 👤 User preferences act almost like hard rules, where LLMs might incur significant trade-offs to comply with them
4. 🧑 Humans, in contrast, are far less sensitive to such signals
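For readers unfamiliar with the term, a lexicographic decision rule compares options on one attribute at a time and only moves to the next attribute to break ties. A minimal Python sketch follows. The attribute order used here (rating, then price, then nudge) and the `Product` type are assumptions for illustration, not the fitted model from the paper.

```python
from typing import NamedTuple


class Product(NamedTuple):
    name: str
    rating: float  # higher is better
    price: float   # lower is better
    nudged: bool   # e.g. carries a "Best seller" badge


def lexicographic_choice(products):
    """Pick by rating first; price only breaks rating ties;
    the nudge only breaks ties on both rating and price."""
    return min(products, key=lambda p: (-p.rating, p.price, not p.nudged))


a = Product("A", rating=4.5, price=30.0, nudged=False)
b = Product("B", rating=4.2, price=20.0, nudged=True)
c = Product("C", rating=4.5, price=25.0, nudged=False)

print(lexicographic_choice([a, b]).name)  # A: rating outranks price and nudge
print(lexicographic_choice([a, c]).name)  # C: ratings tie, so price decides
```

Under a rule like this, no price gap can compensate for a lower rating, which is what separates it from a weighted-sum utility.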
The code for Audio Doppelgängers is also open-source. We hope you find it useful for further exploring how and why we can learn from synthetic data.
💻 github.com/PapayaResear...
🧵3/3
In CTAG (ICML24), we show how a simple synth (from SynthAX ⚡️) can recover properties of real-world sounds. Audio Doppelgängers use the same power to learn to listen from what can be perceived as just noise.
CTAG: ctag.media.mit.edu
SynthAX: github.com/PapayaResear...
🧵2/3
✨Contrastive Learning from Synthetic Audio Doppelgängers #ICLR2025✨ w/
@nikhilsinghmus.bsky.social
Our method learns useful audio representations with randomly synthesized sounds (often better than real data!)
🌐Project: doppelgangers.media.mit.edu
📄Paper: arxiv.org/abs/2406.05923
🧵1/3
If you're at NeurIPS and interested in this topic, come chat! We're working to extend this line of work and value feedback from the community.
🧵 3/3
In a complex decision-making task, we show that LM-based agents' choices superficially resemble humans', yet the agents exhibit suboptimal information acquisition strategies and extreme susceptibility to a simple nudge.
🧵 2/3
Paper title: Superficial Alignment, Subtle Divergence, and Nudge Sensitivity in LLM Decision-Making; Authors: Manuel Cherep*, Nikhil Singh*, and Pattie Maes
Excited to present our new paper on nudging LLMs (👉🤖) as a spotlight talk at the NeurIPS Behavioral ML Workshop! @neuripsconf.bsky.social
w/ Nikhil Singh* (@nikhilsinghmus.bsky.social) and Pattie Maes
🔗 openreview.net/forum?id=chb...
🧵 1/3