Matthias Samwald (@matthiassamwald) Bsky

Posts by Matthias Samwald

🚨 NeurIPS 2024 Spotlight
Did you know we lack standards for AI benchmarks, despite their role in tracking progress, comparing models, and shaping policy? 🤯 Enter BetterBench, our framework with 46 criteria to assess benchmark quality: betterbench.stanford.edu 1/x

1 year ago

I'd rather follow you here than on X!

1 year ago