Figure 1: Illustration of our research on the LLM anchoring effect from three key aspects: (1) Existence: showing significant biases toward different anchor values for identical questions. (2) Mechanism: using causal tracing and statistics to explore underlying patterns. (3) Mitigation: evaluating a range of mitigation strategies. ‘Q: "...?"’ denotes asking the same question again.


Figure 2: Causal tracing on the attention (red) and FFN (green) modules of Llama-3.1-8B-Instruct for semantic anchoring questions. The X-axis is the model's layer index (32 layers); the Y-axis shows the ROI tokens.


Mitigation strategies.

Figure 3: Percentages of sufficient anchor-information mentions in DeepSeek-R1 reasoning contents. Legend: “Anchored” refers to the percentage of questions judged as anchored based on the metrics introduced in Section 4.1; “All” and “Non-anchored” indicate the percentages over all questions and those judged non-anchored, respectively. We employ an LLM-as-a-Judge approach to automatically detect explicit mentions of anchor-influenced features in reasoning contents, guided by detailed criteria defining the extent to which a mention counts as sufficient (see Appendix C).
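The per-group percentages in Figure 3 amount to tallying, for each question group, the fraction whose reasoning content the judge marked as a sufficient mention. A minimal sketch of that tally, assuming each question has already been labeled by the Section 4.1 metric and by the LLM judge (all names here are illustrative, not from the paper):

```python
# Hypothetical sketch: aggregating judge verdicts into the Figure 3 percentages.
# Each record pairs (anchored, mentions_anchor):
#   anchored        -- bool, question judged anchored by the Section 4.1 metric
#   mentions_anchor -- bool, LLM judge found a sufficient anchor mention

def mention_rates(records):
    """Return the sufficient-mention rate over all, anchored, and
    non-anchored questions (0.0 for an empty group)."""
    def rate(subset):
        return sum(m for _, m in subset) / len(subset) if subset else 0.0

    anchored = [r for r in records if r[0]]
    non_anchored = [r for r in records if not r[0]]
    return {
        "All": rate(records),
        "Anchored": rate(anchored),
        "Non-anchored": rate(non_anchored),
    }

example = [(True, True), (True, False), (False, True), (False, False)]
print(mention_rates(example))
```

The interesting comparison in the figure is between the "Anchored" and "Non-anchored" rates: if anchored answers mention the anchor-influenced feature more often, the reasoning trace is surfacing the bias rather than filtering it out.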


Table 2: Evaluation of mitigation strategies on semantic and numerical tasks. Green arrows (↓) indicate the degree of mitigation, with a deeper color representing stronger mitigation. ‘∗’ marks cases with ≤ 10% invalid results (if any). ‘⋄’ indicates results derived on test splits, which exclude the LoRA train splits.


Are #languageModels vulnerable to #anchoring #bias?

Huang et al. generated the #SynAnchors dataset to find out.

Anchoring was more common in shallower layers of models.

A reflective reasoning strategy was usually most helpful.

doi.org/10.48550/arX...

#CogSci #AI #tech #edu
