Thanks a lot for the generous and lovely review @izaskoczen.bsky.social !!
Posts by Guilherme Almeida
Generous PhD funding at Surrey (SCLP applicants eligible) - fees and stipend (£20,780 per annum). Do share widely! Deadline is April 4, start date Oct 2026.
www.surrey.ac.uk/fees-and-fun...
Almeida on Dual Character Concepts, buff.ly/QjZqcuZ - Guilherme Almeida has posted A Defense of Dual Character Concepts in Legal Philosophy and Beyond on SSRN.
I just posted a new pre-print where I argue that dual character concepts are something new and interesting in legal philosophy and beyond and that they can't be reduced to ambiguity, prototypes, or metalinguistic negotiations. Comments are very welcome! papers.ssrn.com/sol3/papers....
Evidence of Public Acceptance of AI Law Clerks: We investigated how Kenyans evaluate the legitimacy of court decisions when judges rely on AI-generated legal research—versus that of human law clerks. www.tandfonline.com/doi/full/10.... 1/
This is a job in political theory/philosophy that is specifically for someone interested in empirical work. Looks perfect for folks doing experimental philosophy
oh wow wow wow. Wow.
It's officially out, after many years:
"Loopholes: A window into value alignment and the communication of meaning"
authors.elsevier.com/c/1k~vV_Ebvv...
Read on for a brief summary, but I encourage you to read the thing itself.
That’s a great point! The study also included legal rules, like one prohibiting shooting at deer. The design doesn’t allow us to look at the two sets of rules separately, but that would be a good direction for future research.
Now out @ Journal of Research in Personality (free until June): authors.elsevier.com/a/1k%7EqWL4L...
We found evidence that trait empathy correlates with purposivism in rule violation judgments. Also: most people share a single concept that seems to have a dual character structure.
The long-awaited results of the Brazilian Reproducibility Network are in! Their final sample consists of 97 replications of 47 studies. The only coherent measure of replication, p < .05, shows a replication rate of 19%. www.biorxiv.org/content/10.1...
But the usual caveats should still be in place. For instance, even with temperature calibration, models still showed diminished diversity of thought when compared to humans.
Comments are more than welcome!
Overall, this suggests that the models are doing something more than mere memorization and that we could potentially learn about the likely reactions humans would have to novel stimuli by looking at LLM responses. 13/14
Interestingly, LLMs diverge here. GPT-4o and Llama 3.2 90b were not affected by the time pressure manipulation, but Claude 3 and Gemini Pro were. Moreover, the latter were similar to humans in that they relied more on text under forced delay than under time pressure. 12/14
The cool thing, of course, is that you can't really put LLMs under time pressure or forced delay (at least not with the public APIs). Thus, we're just either telling the model that it should respond within 4 seconds or that it must wait at least 15 seconds. 11/14
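The prompt-based manipulation described above can be sketched roughly as follows. This is a minimal illustration, not the paper's actual materials: the constant names, wording, and the build_prompt helper are all hypothetical, and the exact instructions given to the models may differ.

```python
# Hypothetical framings for the prompt-based "time pressure" vs. "forced
# delay" manipulation (exact wording in the study may differ).
TIME_PRESSURE = "Please respond within 4 seconds, going with your first impression."
FORCED_DELAY = "Please wait at least 15 seconds and reflect carefully before responding."

def build_prompt(vignette: str, condition: str) -> str:
    """Prepend the condition instruction to a rule-violation vignette."""
    instruction = TIME_PRESSURE if condition == "pressure" else FORCED_DELAY
    return f"{instruction}\n\n{vignette}"
```

Since public APIs offer no way to actually constrain response latency, the manipulation lives entirely in the instruction text the model receives.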
That depends at least in part on whether you think that the time pressure manipulation is inducing a bias or not. But you could argue either that competent concept application requires sensitivity to time constraints, or that time constraints elicit bias by restricting processing. 10/14
For Study 2, we decided to try something different. Among humans, we know that time pressure leads to more purposivism and a forced delay leads to more textualism. This could be read as either a context-sensitive feature of the concept of rule or as a bias. What would competent LLMs do? 9/14
Even more surprisingly, the same thing was true for all the models we tested: all of them were less textualist on the new stimuli when compared to the old stimuli. We interpret this to be evidence of conceptual mastery. Even subtle differences between stimuli are tracked by current LLMs. 8/14
The human data was surprising in that it revealed a significant difference between old and new vignettes. We didn't expect there to be any difference, but participants relied on text to a lesser extent on new vignettes than on old ones. 7/14
To deal with (2), we first collected new data from humans. We then computed the standard deviation in each cell of our 2 (text) x 2 (purpose) x 4 (scenario) x 2 (new vs. old) design (total: 32 cells) and selected the temperature for each model that minimized the mean squared error between SDs. 6/14
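The calibration step above amounts to a small grid search: for each candidate temperature, compare the model's per-cell response SDs to the human per-cell SDs and keep the temperature with the lowest mean squared error. A minimal sketch (function names and the stub interface are my own, not the paper's code):

```python
import numpy as np

def calibrate_temperature(human_sds, model_sd_fn, temperatures):
    """Pick the temperature whose per-cell SDs best match the human SDs.

    human_sds:    per-cell standard deviations from the human sample
                  (32 cells in the 2 x 2 x 4 x 2 design).
    model_sd_fn:  callable mapping a temperature to per-cell SDs computed
                  from repeated model runs at that temperature.
    temperatures: candidate temperature values to search over.
    """
    best_t, best_mse = None, float("inf")
    for t in temperatures:
        model_sds = np.asarray(model_sd_fn(t))
        mse = float(np.mean((model_sds - np.asarray(human_sds)) ** 2))
        if mse < best_mse:
            best_t, best_mse = t, mse
    return best_t, best_mse
```

In practice model_sd_fn would query the model many times per cell at the given temperature and compute the SD of the resulting ratings; here it is left abstract.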
To address issue (1), we created new vignettes that were supposed to match up perfectly with those in an earlier paper (doi.org/10.1037/lhb0...), changing just the exact words used. If models are just memorizing, they shouldn't be able to generalize to the new stimuli (although that's debatable). 5/14
Temperature is a parameter controlling the extent to which models will prioritize their best answer. Previous research sometimes sets temperature to 0, driving models to nearly deterministic results, while other work varies it in somewhat arbitrary ways. We think there is a better way to do this! 4/14
2) Even when the significance patterns are similar, LLMs tend to show diminished diversity of thought (see arxiv.org/abs/2302.07267), that is, different runs of the same model show much less variance in response to a fixed stimulus than a human sample. But LLM APIs allow us to adjust that. 3/14
Previous work has shown that LLMs respond to stimuli in roughly the same way as humans. Usually, those papers compare the responses generated by LLMs with previously published human results. The issue is that LLMs could achieve this result through memorization. So, we need new stimuli. 2/14
Short thread on the latest paper led by @joseluiz.bsky.social w/ @lawstuff.bsky.social: arxiv.org/abs/2503.00992
The paper addresses two issues w/ previous machine psychology papers (including our own): 1) are LLMs mastering concepts, or are they memorizing the data? 1/14
📢 New publication by Piotr Bystranowski from INCET in open access: "Self-absorbed, yet interesting? A bibliometric study on general jurisprudence" journals.openedition.org/revus/10886
New paper with @joseluiz.bsky.social and @almeida2808.bsky.social showing that AI possesses the concept of rule. We find that generative AI emulates how humans apply rules to novel situations in which a rule's letter and spirit conflict. papers.ssrn.com/sol3/papers....
Our incredibly short (5 page) paper on intuitions about consent — with Joanna Demaree-Cotton and @rosesomm.bsky.social
We find cases where people agree that both:
(a) There’s a sense in which a person clearly consented
(b) In a deeper sense, she did not consent at all
osf.io/63d8s
New study with Joanna Demaree-Cotton and Josh Knobe!