ReFACT Benchmark Shows AI Models Struggle with Scientific Confabulation
The ReFACT benchmark provides 1,001 scientific Q&A pairs with annotated confabulations; current LLMs, including GPT‑4o, achieve about 50% accuracy in detecting false answers. Read more: getnews.me/refact-benchmark-shows-a... #refact #ai