TrustJudge Reduces Evaluation Inconsistencies in LLM-as-a-Judge Systems
TrustJudge lowers score‑comparison inconsistency by 8.43% and pairwise transitivity errors by 10.82% using distribution‑sensitive scoring and likelihood‑aware aggregation. getnews.me/trustjudge-reduces-evalu... #trustjudge #llmevaluation