LLMs' response traits (rather than an #LLMpersona) drift during interactions, especially in emotionally charged contexts, which can lead to harmful behaviors. The authors propose activation capping along a learned "Assistant trait" axis, plus prompt steering, to stabilize response traits.
arxiv.org/pdf/2601.10387
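
A rough sketch of what "capping along an axis" could look like, assuming it means clamping a hidden state's projection onto a single direction. This is not taken from the paper: the function name `cap_along_axis`, the bounds `lo`/`hi`, and the toy vectors are all hypothetical, and the authors' actual procedure may differ.

```python
import math

def cap_along_axis(hidden, axis, lo, hi):
    """Clamp the component of `hidden` along `axis` to the range [lo, hi].

    Hypothetical illustration only: `hidden` stands in for one activation
    vector and `axis` for a learned trait direction.
    """
    # Normalize the axis so the projection is a plain scalar coordinate.
    norm = math.sqrt(sum(a * a for a in axis))
    unit = [a / norm for a in axis]

    # Scalar projection of the activation onto the axis.
    proj = sum(h * u for h, u in zip(hidden, unit))

    # Clamp the projection, then shift the activation by the difference.
    # Components orthogonal to the axis are left untouched.
    clamped = min(max(proj, lo), hi)
    delta = clamped - proj
    return [h + delta * u for h, u in zip(hidden, unit)]

# Toy usage: the first coordinate exceeds the cap and is pulled back.
capped = cap_along_axis([3.0, 4.0], axis=[2.0, 0.0], lo=-1.0, hi=1.0)
```

An activation whose projection already sits inside the bounds passes through unchanged, so under these assumptions the cap only intervenes when the state moves too far along the axis.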