A cool test of how much different #AI models #hallucinate: the #BullshitBenchmark
The #Claude and #Qwen models seem to push back more when confronted with nonsensical questions. #OpenAI models do poorly.
Blog post: adam.holter.com/bullshitbenc...
Results: petergpt.github.io/bullshit-ben...
#LLM