"The overall framework of Metacognitive Behavioral Tuning (MBT). We employ two complementary strategies for behavior injection: MBT-S synthesizes rigorous traces from scratch, while MBT-R rewrites the student’s initial traces to stabilize intrinsic exploration. The model then internalizes these behaviors via Supervised Fine-Tuning (SFT), followed by Group Relative Policy Optimization (GRPO) to enhance reasoning robustness." "MBT-R injects metacognitive behaviors by restructuring the initial reasoning traces generated by the student model. ... This rewriting process addresses both correct and incorrect initial traces. When the initial trace p i leads to an incorrect answer, the teacher does not simply discard the text. Instead, it simulates a metacognitive self-correction process, where the reasoning path explicitly identifies the flaw via a monitoring step, rejects the erroneous logic, and naturally redirects the trajectory toward the correct solution...."
"Metacognitively Grounded Reasoning Trace Rewriting Prompt (MBT-R) "....If the draft solution contains errors or leads to an incorrect conclusion, you must naturally steer the reasoning process toward the correct answer by simulating a realization of error and self-correction, rather than simply stating the right answer." "Formatting Constraints: ... Do not mention the existence of the ... 'correct answer.' The output must read as a single, authentic, independent thought process."
"Accuracy-Efficiency Score (AES) To capture the tradeoff between improving accuracy and reducing computational cost in a single scalar," "Overthinking Score ( ξ OT) measures post-solution stagnation (e.g., redundant verification or baseless doubt) on correct samples" "Underthinking Score ( ξ UT) measures pre-solution stagnation (e.g., structural loops or lack of progress) on incorrect samples" "Metacognition Score [ranges] from 0 to 5 based on how actively the model implements key metacognitive behaviors, regardless of the answer’s correctness."
"Figure A1. Comparison of Accuracy-Efficiency Score (AES) across behavior injection strategies on the MuSiQue dataset. While rewritingbased baselines (Direct-R, Distill-R) improve over Prompting, MBT-S and MBT-R achieve the highest AES. The result demonstrates that explicitly synthesizing or rewriting traces tailored to the student’s learning needs (MBT) is more effective than simply rewriting external teacher traces."
As recommended in #StrategicReflectivism (doi.org/10.48550/arX...), #AI models can increase efficiency by tactically reflecting on initial answers.
There are a few ways to do this with #LLMs.
Kim et al. recently tested "metacognitive behavioral tuning": doi.org/10.48550/arX...
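For anyone who wants to try the rewriting idea (MBT-R) from the excerpts above, here is a rough sketch of how the data-construction step might look. It is not the authors' code. The instruction text is a paraphrase of the quoted prompt, and `Example`, `build_rewrite_prompt`, `make_sft_pairs`, the `teacher` callable, and the output dict keys are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Paraphrase of the quoted MBT-R instructions (assumption: NOT the paper's verbatim prompt).
REWRITE_INSTRUCTIONS = (
    "Rewrite the draft reasoning as a single, authentic, independent thought process. "
    "If the draft contains errors or reaches an incorrect conclusion, steer the reasoning "
    "toward the correct answer by simulating a realization of error and self-correction, "
    "rather than simply stating the right answer. "
    "Do not mention that a reference answer was provided."
)


@dataclass
class Example:
    """One training item (hypothetical schema)."""
    question: str
    draft_trace: str   # the student model's initial reasoning trace
    gold_answer: str   # reference answer shown only to the teacher


def build_rewrite_prompt(ex: Example) -> str:
    """Assemble the teacher's input: instructions, question, draft, reference answer."""
    return (
        f"{REWRITE_INSTRUCTIONS}\n\n"
        f"Question:\n{ex.question}\n\n"
        f"Draft solution:\n{ex.draft_trace}\n\n"
        f"Reference answer (do not reveal):\n{ex.gold_answer}\n"
    )


def make_sft_pairs(
    examples: List[Example],
    teacher: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Rewrite every draft (correct or incorrect) and pair it with its question.

    `teacher` is any function mapping a prompt string to generated text,
    e.g. a thin wrapper around whichever LLM client you use.
    """
    pairs = []
    for ex in examples:
        rewritten = teacher(build_rewrite_prompt(ex))
        pairs.append({"prompt": ex.question, "completion": rewritten})
    return pairs
```

The resulting pairs would then feed the SFT stage mentioned in the framework caption; the GRPO stage and the scoring metrics are out of scope for this sketch.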