AI agents keep getting better at math and reasoning, or do they?
I ran a straightforward and revealing test: how well do today’s mainstream AI agents solve Calcudoku puzzles?
I benchmarked 10 agents.
Results surprised me 👇
www.calcudoku.org/papers/ai_ag...
#AI #LLMs #AIEvaluation #Calcudoku
French version of Calcudoku launched! Free daily puzzles.
😀🇫🇷
calcudoku.org/fr
Thoughts?
#sudoku #puzzleslogiques #jeuxdelogique #calcudoku #Noël #Shein #PSG
A "How to Solve" video for the "best wishes for 2025" puzzle I posted a month ago 😎
www.youtube.com/watch?v=mBul...
#puzzle #puzzletime #coffeetime #saturday #indiedev #calcudoku #math #fbi #logicpuzzle #sudoku #followme
Finally a #dark mode at www.calcudoku.org 😀
🌜🌜🌜
#calcudoku #kenken #puzzle #indiedev
.. some tag surfing: #fbi #cybermonday #playstation #doj
2nd #bluesky post 😀
A surely doable 6x6 Calcudoku 😀
Each row and column has the numbers 1-6 exactly once.
Single cells are givens,
double cells also show an operation: when applied to the 2 numbers, it should produce the shown result.
#calcudoku #kenken #sudoku #logic #math
#logicpuzzle #puzzle
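
For anyone who wants to check a filled-in grid against the rules above, here is a rough Python sketch. It is not the site's own checker: the names `cage_ok` and `is_solution`, the cage format, and the sample grid are all made up for illustration. It also assumes that for subtraction and division the larger number comes first, which the rules above do not spell out.

```python
from typing import List, Tuple

Cell = Tuple[int, int]               # (row, column), zero-based
Cage = Tuple[List[Cell], str, int]   # cells, operation ("" for a given), shown result


def cage_ok(values: List[int], op: str, target: int) -> bool:
    """Check one cage: a single given cell, or two cells plus an operation."""
    if len(values) == 1:
        return values[0] == target    # a given: the cell must equal the shown number
    if len(values) != 2:
        raise ValueError("this sketch only handles cages of 1 or 2 cells")
    hi, lo = max(values), min(values)
    if op == "+":
        return hi + lo == target
    if op == "-":
        return hi - lo == target      # assumption: larger minus smaller
    if op == "x":
        return hi * lo == target
    if op == "/":
        return hi == lo * target      # assumption: larger divided by smaller, exactly
    raise ValueError(f"unknown operation: {op!r}")


def is_solution(grid: List[List[int]], cages: List[Cage]) -> bool:
    """True if every row and column has 1..n exactly once and all listed cages hold."""
    n = len(grid)
    wanted = set(range(1, n + 1))
    if not all(len(row) == n and set(row) == wanted for row in grid):
        return False
    if not all({grid[r][c] for r in range(n)} == wanted for c in range(n)):
        return False
    return all(
        cage_ok([grid[r][c] for r, c in cells], op, target)
        for cells, op, target in cages
    )


# Made-up 6x6 example: a cyclic grid and a handful of hypothetical cages.
grid = [[(r + c) % 6 + 1 for c in range(6)] for r in range(6)]
cages = [
    ([(0, 0)], "", 1),            # given
    ([(0, 1), (0, 2)], "+", 5),   # 2 + 3
    ([(0, 3), (0, 4)], "-", 1),   # 5 - 4
    ([(1, 0), (1, 1)], "x", 6),   # 2 x 3
    ([(0, 5), (1, 5)], "/", 6),   # 6 / 1
]
print(is_solution(grid, cages))   # True
```

Note that the sketch only checks the cages it is given; it does not verify that the cages cover the whole grid.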