#tracemonitor hashtag - Bluesky - nopzon.com

#tracemonitor

@getnews-me.bsky.social

5 months ago

Detecting Implicit Reward Hacking by Measuring Model Reasoning Effort

TRACE measures a model's reasoning effort by truncating its chains of thought (CoTs). It outperformed the 72‑billion‑parameter CoT monitor by 65% on math and the 32‑billion‑parameter monitor by 30% on coding. getnews.me/detecting-implicit-rewar... #tracemonitor #rewardhacking
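The post only says that TRACE truncates CoTs to measure effort, so the following is a rough sketch of one way that idea could work, not the authors' implementation. It assumes the CoT is cut at several fractions, a check is run on each prefix, and the area under the resulting pass curve serves as the score. The names `truncate_cot`, `effort_score`, and `passes_with_prefix` are hypothetical. In practice `passes_with_prefix` would stand in for forcing the model to answer from the truncated CoT and running the task's verifier.

```python
from typing import Callable, Sequence


def truncate_cot(cot_tokens: Sequence[str], fraction: float) -> list[str]:
    """Keep the first `fraction` of the chain-of-thought tokens."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    return list(cot_tokens[: int(len(cot_tokens) * fraction)])


def effort_score(
    cot_tokens: Sequence[str],
    passes_with_prefix: Callable[[list[str]], bool],  # hypothetical verifier hook
    fractions: Sequence[float] = (0.0, 0.25, 0.5, 0.75, 1.0),
) -> float:
    """Area under the pass-vs-truncation curve (trapezoidal rule).

    Assumption: a large area means the reward is already obtained from
    short prefixes, i.e. with little reasoning, which may hint at a shortcut.
    """
    rewards = [
        1.0 if passes_with_prefix(truncate_cot(cot_tokens, f)) else 0.0
        for f in fractions
    ]
    area = 0.0
    for i in range(1, len(fractions)):
        width = fractions[i] - fractions[i - 1]
        area += width * (rewards[i] + rewards[i - 1]) / 2
    return area


if __name__ == "__main__":
    cot = ["step1", "step2", "step3", "step4"]
    # Toy verifier: passes only when the full reasoning is present.
    print(effort_score(cot, lambda prefix: len(prefix) == len(cot)))
    # Toy verifier: passes regardless of how much reasoning is kept.
    print(effort_score(cot, lambda prefix: True))
```

Under these assumptions, a response whose reward survives heavy truncation scores near 1, while one that needs its full reasoning scores near 0. Where to set a flagging threshold is left open here.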
