Tracing Undesirable LLM Behavior with Representation Gradient Analysis
Representation Gradient Tracing uses gradients of model activations to trace harmful, backdoored, or outdated LLM outputs back to the training data behind them. First posted 26 September 2025. getnews.me/tracing-undesirable-llm-... #representationgradienttracing #llmsafety