You wouldn’t believe how long it took me to convince Claude dino.txt is a thing…
Posts by Klara Janouskova
And the zoo 😍
🐾 He has been the goodest boy for a year already! Time for his first 🐑
Yes
Claude has started to give me not-so-subtle tips on how I should now show the report we just created to Jiri (my PhD supervisor's first name) and so on. I was really confused at first, then I realized I had it work on my personal website and CV last week, so he remembered it 🙈😅
I am afraid I will need to restart my hunt for the perfect scone recipe. Or just going to the UK might also work 😋
Two #CVPR2026 competitions are live: S23DR 2026 and BuildingWorld 2026!
Task: reconstruct house roof wireframes from point clouds and segmentations:
Total prize fund: $22k
Deadline: end of May 2026
1) huggingface.co/spaces/usm3d...
2) huggingface.co/spaces/Build...
@cvprconference.bsky.social
Multimodal Large Language Models as Image Classifiers
Nikita Kisel, Illia Volkov @klara-cz.bsky.social Jiri Matas
tl;dr: if you evaluate a good model (ChatGPT) on a dirty test set (ImageNet), it looks bad. Yes, the ImageNet test set is bad nowadays. +insights from labeling.
arxiv.org/abs/2603.065...
I am glad somebody has appreciated it! 🐈
I am not gonna lie, I tried to have my dog there at first, but despite over 100 of ImageNet's classes being dog breeds, they still somehow managed not to squeeze the Australian Shepherd in.
To study this, we introduce ReGT, a new multilabel reannotation of 625 ImageNet classes that corrects many of these issues. When evaluated on the cleaned labels, multimodal LLMs improve by up to +10.8% accuracy, substantially narrowing the gap with supervised vision models. 📈
Work with Nikita Kisel, Illia Volkov and Jiri Matas, to be presented at #CVPR26 Findings!
🤗 Finally, we show that these models aren’t just affected by annotation quality; they can help fix it. In a controlled verification study, annotators integrated model predictions in roughly half of the difficult cases, suggesting MLLMs can be useful tools for large-scale dataset curation.
We show that small changes in the evaluation protocol, like the choice of distractors, output mapping, and even image order, significantly impact accuracy.
⚠️ But there’s a deeper issue: the data. ImageNet contains a lot of label noise, so even a perfect eval. protocol may not give a meaningful result.
Let me introduce our new paper: Multimodal Large Language Models as Image Classifiers
❓ Multimodal LLMs are increasingly used for visual tasks, but evaluating their image classification ability has produced conflicting conclusions.
Link: arxiv.org/html/2603.06...
He totally does, he is getting more snuggly every day ☺️
Morning walks 🐾
It also really does feel like reviewer psychology, since they have not explicitly pointed it out as the issue. Not being able to run the experiment again with the same reviewers but different framing is tough :D
When you re-read the introduction of your freshly rejected paper that was somewhat rushed before the deadline and you are like: Ok, this is why. 🥲
Team 2/2 rejected, with one suggested for the findings workshop.
I am a bit sad because I feel they were rejected for the wrong reasons + I am tired of getting BR rating with no suggestions for rebuttal, but I am much more into ECCV than CVPR this year anyway. 😁
Good luck with resubmission! 🍀
I feel like for the first time in my (short) reviewing career, I may have helped a (IMO of course) nice paper get accepted despite other reviewer(s).
1/n Attention, Please! 🚀
Our work “Revisiting Attentive Probing Through the Lens of Efficiency” has been accepted at #ICLR2026.
We introduce Efficient Probing (EP) — a lightweight, multi-query attentive probing method for frozen encoders.
Paper + code at the end 👇
I was starting to wonder what to do with my time now
Oh ok, that is a different level of wrong than I thought 🥲
I think most benchmarks are pretty noisy; it is just that for some (say ImageNet :)), enough people actually looked at the images and noticed.
To be fair, data annotation is HARD. I do agree people should at least try to do a better job and be responsive, of course :)
bsky.app/profile/klar...
What a beautiful day to be done with all deadlines! ☃️
This was my WFH lunch break today, if it is not clear why I do not live in Prague 😁
25% left, a few more nice bedtime readings for me. :)
JAZZ HANDS!
I am currently at
R: Be very ready.
G: I am very ready. Be calm.
R: Am calm. You be calm.
G: NO YOU BE-
Having stopped about midway through Project Hail Mary and forbidding myself to resume until I finished my CVPR reviews was a pretty good motivation.
Also, if you have not read it yet, but you think you might enjoy it, go for it, you are in for a treat (and the movie is coming)! 🤓
I should have added it looks like this (a few lucky days a year 🤣), like today ☃️☺️