Figure 4 Overview of prompt engineering techniques at each step: ND, ZS, and CoT for Step 1, and CoT, SR, PD, and MoRE for Steps 2-4.
"Step 4 presents the greatest challenge. The model is tasked with inferring the underlying reasons behind each perception, based on indirect cues embedded in the conversation. Across all prompting strategies, performance remains low (Positive-F1 = 0.37–0.40), indicating that the model infrequently selects the same reasoning factors as humans. These low scores suggest that extracting latent intent from unstructured group dialogue may go beyond what the current LLM is able to interpret."

"These results suggest that the gap between LLM and human annotation widens progressively as the task shifts from surface-level extraction to the interpretation of expressed content, and finally to inference of unstated intent."
Can #AI reasoning models infer people's underlying reasons in unstructured chat data from group decisions?
Across multiple prompting steps, #GPT5 usually did NOT select the same underlying reason as a human rater: doi.org/10.48550/arX...
#AI #cogSci #textAnalysis #psychometrics