"The overall framework of Metacognitive Behavioral Tuning (MBT). We employ two complementary strategies for behavior injection: MBT-S synthesizes rigorous traces from scratch, while MBT-R rewrites the student’s initial traces to stabilize intrinsic exploration. The model then internalizes these behaviors via Supervised Fine-Tuning (SFT), followed by Group Relative Policy Optimization (GRPO) to enhance reasoning robustness."

"MBT-R injects metacognitive behaviors by restructuring the initial reasoning traces generated by the student model. ... This rewriting process addresses both correct and incorrect initial traces. When the initial trace p_i leads to an incorrect answer, the teacher does not simply discard the text. Instead, it simulates a metacognitive self-correction process, where the reasoning path explicitly identifies the flaw via a monitoring step, rejects the erroneous logic, and naturally redirects the trajectory toward the correct solution...."
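The rewriting loop the quote describes can be sketched in a few lines. This is a minimal, hedged sketch, not the paper's implementation: `teacher_rewrite` stands in for a call to a teacher LLM with the MBT-R prompt, and is stubbed here so the control flow is runnable. All function names are illustrative assumptions.

```python
def teacher_rewrite(problem, draft, gold, draft_is_correct):
    """Stub for the teacher model: restructure the student's draft trace,
    inserting a monitoring/self-correction step when the draft was wrong."""
    if draft_is_correct:
        return f"[monitor] draft holds; polished trace for: {draft}"
    return f"[monitor] flaw found in draft; redirecting toward {gold}"

def build_mbt_r_dataset(examples, student_answer):
    """For each (problem, gold) pair, rewrite the student's initial trace
    rather than discarding it; both correct and incorrect drafts are kept."""
    dataset = []
    for problem, gold in examples:
        draft, answer = student_answer(problem)  # student's initial trace p_i
        trace = teacher_rewrite(problem, draft, gold, answer == gold)
        dataset.append((problem, trace, gold))   # supervision target for SFT
    return dataset
```

The key design point from the excerpt is preserved: incorrect drafts are not filtered out but converted into self-correction demonstrations, which is what the SFT stage then internalizes.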


"Metacognitively Grounded Reasoning Trace Rewriting Prompt (MBT-R)

"....If the draft solution contains errors or leads to an incorrect conclusion, you must naturally steer the reasoning process toward the correct answer by simulating a realization of error and self-correction, rather than simply stating the right answer."

"Formatting Constraints: ... Do not mention the existence of the ... 'correct answer.' The output must read as a single, authentic, independent thought process."


"Accuracy-Efficiency Score (AES) To capture the tradeoff between improving accuracy and reducing computational cost in a single scalar,"

"Overthinking Score (ξ_OT) measures post-solution stagnation (e.g., redundant verification or baseless doubt) on correct samples"

"Underthinking Score (ξ_UT) measures pre-solution stagnation (e.g., structural loops or lack of progress) on incorrect samples"

"Metacognition Score [ranges] from 0 to 5 based on how actively the model implements key metacognitive behaviors, regardless of the answer’s correctness."
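The excerpt defines AES only as "a single scalar" trading off accuracy gains against computational cost; the exact formula is not quoted. One plausible form, offered purely as an assumption for illustration, rewards relative accuracy improvement and penalizes relative token growth:

```python
def aes(acc_base, acc_new, tokens_base, tokens_new, lam=1.0):
    """Accuracy-Efficiency Score sketch (higher is better).
    This is a hedged guess at the shape of such a metric, not the
    paper's definition: lam weights the cost penalty."""
    d_acc = (acc_new - acc_base) / acc_base            # relative accuracy change
    d_cost = (tokens_new - tokens_base) / tokens_base  # relative cost change
    return d_acc - lam * d_cost
```

Under this form, a method that raises accuracy from 0.5 to 0.6 while cutting tokens from 1000 to 800 scores positively on both terms, whereas accuracy-neutral token inflation scores negatively.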


"Figure A1. Comparison of Accuracy-Efficiency Score (AES) across behavior injection strategies on the MuSiQue dataset. While rewriting-based baselines (Direct-R, Distill-R) improve over Prompting, MBT-S and MBT-R achieve the highest AES. The result demonstrates that explicitly synthesizing or rewriting traces tailored to the student’s learning needs (MBT) is more effective than simply rewriting external teacher traces."


As recommended in #StrategicReflectivism (doi.org/10.48550/arX...), #AI models can increase efficiency by tactically reflecting on initial answers.

There are a few ways to do this with #LLMs.

Kim et al. recently tested "metacognitive behavioral tuning": doi.org/10.48550/arX...

"Abstract—Large Language Models (LLMs) face a fundamental challenge in deciding when to rely on rapid, intuitive responses versus engaging in slower, more deliberate reasoning. Inspired by Daniel Kahneman’s dual-process theory and his insights on human cognitive biases, we propose a novel Cognitive Decision Routing (CDR) framework that dynamically determines the appropriate reasoning strategy based on query characteristics. Our approach addresses the current limitations where models either apply uniform reasoning depth or rely on computationally expensive methods for all queries. We introduce a meta-cognitive layer that analyzes query complexity through multiple dimensions: correlation strength between given information and required conclusions, domain boundary crossings, stakeholder multiplicity, and uncertainty levels. Through extensive experiments on diverse reasoning tasks, we demonstrate that CDR achieves superior performance while reducing computational costs by 34% compared to uniform deep reasoning approaches. Our framework shows particular strength in professional judgment tasks, achieving 23% improvement in consistency and 18% better accuracy on expert-level evaluations. This work bridges cognitive science principles with practical AI system design, offering a principled approach to adaptive reasoning in LLMs."


"we propose four dimensions for assessing whether queries require deeper reasoning:

Correlation Strength: Measures the statistical relationship between given information and required conclusions. Low correlation suggests intuitive responses may be unreliable.

Domain Crossing: Identifies when reasoning spans multiple knowledge domains, increasing the likelihood of inappropriate generalization.

Stakeholder Multiplicity: Counts the number of parties affected by the decision, reflecting loss aversion and conflict complexity.

Uncertainty Level: Measures the degree of expert disagreement or inherent ambiguity in the problem domain."


"Our CDR framework consists of three main components (Figure 1):

Query Analyzer: Processes input queries to extract the four dimensional features. This component uses specialized classifiers trained on annotated datasets for each dimension.

Routing Decision Module: Combines dimensional features to determine the appropriate reasoning strategy. We experiment with both rule-based and learned approaches.

Adaptive Reasoning Engine: Executes either fast intuitive responses or structured slow reasoning based on the routing decision."
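The three components and four dimensions above can be tied together in a small sketch. This is a hedged illustration under stated assumptions, not the paper's system: the paper trains specialized classifiers per dimension, while here the features are given directly, the routing rule and its weights are invented for illustration, and the two reasoning engines are stubs.

```python
from dataclasses import dataclass

@dataclass
class QueryFeatures:
    """Output of the Query Analyzer (here assumed precomputed)."""
    correlation_strength: float  # 0..1; low -> intuition unreliable
    domain_crossings: int        # knowledge domains spanned
    stakeholders: int            # parties affected by the decision
    uncertainty: float           # 0..1; expert disagreement / ambiguity

def route(f: QueryFeatures, threshold: float = 0.5) -> str:
    """Rule-based Routing Decision Module sketch: return 'slow' when the
    features suggest an intuitive answer is risky, else 'fast'.
    The weights are illustrative assumptions, not fitted values."""
    risk = (
        (1.0 - f.correlation_strength)
        + 0.25 * max(f.domain_crossings - 1, 0)
        + 0.25 * max(f.stakeholders - 1, 0)
        + f.uncertainty
    ) / 2.0
    return "slow" if risk >= threshold else "fast"

def fast_response(query: str) -> str:
    return f"fast: {query}"            # stub for standard inference

def deliberate_reasoning(query: str) -> str:
    return f"slow, step-by-step: {query}"  # stub for chain-of-thought

def answer(query: str, f: QueryFeatures) -> str:
    """Adaptive Reasoning Engine: dispatch on the routing decision."""
    if route(f) == "slow":
        return deliberate_reasoning(query)
    return fast_response(query)
```

The paper also reports a learned routing variant; a rule-based version like this is just the simpler of the two approaches the quote mentions.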


"...Cognitive Decision Routing (CDR) significantly outperforms all baselines: accuracy improvement of 2.5 percentage points over Uniform Slow (t(499) = 3.42, p < 0.001), consistency improvement of 0.10 (t(499) = 4.18, p < 0.001), and token reduction of 34% (t(499) = -8.73, p < 0.001)."


More reason for #StrategicReflectivism:

Adding a "routing ...module" to an #LLM pipeline (to choose either fast/standard inference or slow, chain-of-thought inference) improved reasoning about #insurance, #medicine, #business, #policy, etc.

And at lower cost!

doi.org/10.48550/arX...

#cogSci #AI


I have finally started writing the #StrategicReflectivism paper, thanks in large part to results like the one below.

A sketch of the view: crucial to intelligence (human or otherwise) is pragmatic switching between intuitive and reflective reasoning according to the goals of the intelligent system.
