
Posts by Thomas Frick

SPARC: Separating Perception And Reasoning Circuits for Test-time Scaling of VLMs

Despite recent successes, test-time scaling, i.e., dynamically expanding the token budget during inference as needed, remains brittle for vision-language models (VLMs): unstructured chains-of-though...

Check out our work here:
arxiv.org/abs/2602.06566

@niccoloav.bsky.social @matrig.net

2 months ago

This separation unlocks powerful capabilities:
✨ Scale "looking" independently of "thinking"
✨ Keep contexts lean — only process relevant crops
✨ Train the "eyes" without retraining the "brain" on a single GPU

2 months ago

Our method SPARC explicitly decouples the "Where" (perception) from the "Why" (reasoning) — mimicking how the brain separates early visual processing from executive function.

🔍 First: aggressive visual search to find the right pixels
🧠 Then: focused reasoning on only the relevant crops
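The two-stage split above can be sketched in a few lines of Python. This is a toy illustration only: the function names (`visual_search`, `reason_over_crops`) and the tag-matching heuristic are stand-ins I'm assuming for illustration, not the actual SPARC implementation, which operates on real pixels with a VLM.

```python
def visual_search(image, question):
    """Stage 1, the "Where": scan the full image cheaply and return only
    the regions likely to matter for the question."""
    # Toy heuristic: keep regions whose tag appears among the question words.
    words = set(question.lower().split())
    return [r for r in image["regions"] if r["tag"] in words]

def reason_over_crops(crops, question):
    """Stage 2, the "Why": run the expensive reasoning step on the small
    set of relevant crops instead of the whole image."""
    if not crops:
        return "not found"
    return crops[0]["content"]

# Toy "image" as a list of pre-tagged regions; a real system would crop pixels.
image = {"regions": [
    {"tag": "sign", "content": "SPEED LIMIT 30"},
    {"tag": "car", "content": "a red sedan"},
]}

crops = visual_search(image, "what does the sign say")
print(reason_over_crops(crops, "what does the sign say"))  # -> SPEED LIMIT 30
```

The point of the split: stage 1 can be scaled aggressively (more search, more crops) without touching stage 2, and the reasoning context stays lean because it only ever sees the selected crops.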

2 months ago

Even the most brilliant detective can't solve a case without finding the clues first.

Yet most "thinking" VLMs make this exact mistake: they entangle visual search and complex logic into one giant, expensive chain of thought.

Stop burning through tokens in the dark. Ignite a SPARC. ⚡

2 months ago

Thanks! Love to be on the list!

1 year ago

👋🏻

1 year ago