This work couldn’t be more urgent. We need better measurement practices in AI evaluation, and we need them ASAP. Here, we aim to clarify and inform, and to show what better looks like for accuracy metrics and confidence estimates, with bonuses such as a deeper understanding of evaluation. Excellent work, team!