This is joint work with @aecker.bsky.social
Posts by Timo Lüddecke
More information (with additional results on DINOv3, SigLIP2 and Perception Encoder):
📄 Paper (in TMLR): openreview.net/forum?id=neM...
📊 Website: eckerlab.org/projects/deap/
💻 Code: github.com/timojl/deap
…or drop by our poster at the ELLIS UnConference on December 2nd in Copenhagen. #EuRIPS
[Figure: relationship between performance and backbone properties]
Based on our performance data for all backbones, we analyze to what degree performance can be attributed to general properties of the backbone (input image resolution, feature dimension, number of parameters). For semantic segmentation and depth estimation, we find strong relationships with all three properties.
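The thread does not say which statistic the property analysis uses. One simple way to quantify how strongly a backbone property (e.g. parameter count) tracks task performance is Spearman rank correlation, sketched below in plain Python. The function names and the toy numbers are hypothetical and are not results from the paper.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant inputs")
    return cov / math.sqrt(vx * vy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation = Pearson correlation of the (average) ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 entries")
    return pearson(average_ranks(x), average_ranks(y))


# Toy values for illustration only (not from the paper):
# parameter counts in millions and a made-up segmentation score per backbone.
params = [22, 86, 304, 632]
scores = [0.40, 0.46, 0.52, 0.51]
print(round(spearman(params, scores), 3))  # 0.8
```

Rank correlation only assumes a monotonic relationship, so it gives the same answer whether parameter count is used raw or on a log scale.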
[Figure: relative performance vs. runtime]
A closer look at the three instance-awareness tasks (instance discrimination, instance boundary detection, object detection) reveals that self-supervised pretraining outperforms vision-language (CLIP-style) pretraining.
[Figure: performance vs. runtime]
We compare supervised, self-supervised and vision-language backbones with respect to instance awareness, local semantics and spatial understanding. Here we show the trade-off between forward-pass runtime and performance on local semantics and spatial understanding.
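A common way to read a runtime-vs-performance plot is to pick out the Pareto front: backbones for which no other backbone is both at least as fast and at least as accurate. The sketch below assumes each backbone is summarized by a (runtime in ms, score) pair; the backbone names and numbers are hypothetical placeholders, not results from the paper.

```python
from typing import Dict, List, Tuple


def pareto_front(results: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return the backbones that are not dominated on (runtime, score).

    results maps a backbone name to (runtime_ms, score), where lower runtime
    and higher score are better. A backbone is dominated if some other backbone
    is at least as fast and at least as accurate, and strictly better in one.
    """
    front = []
    for name, (rt, sc) in results.items():
        dominated = any(
            rt2 <= rt and sc2 >= sc and (rt2 < rt or sc2 > sc)
            for other, (rt2, sc2) in results.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    # Sort by runtime so the front reads left to right as in a runtime plot.
    return sorted(front, key=lambda n: results[n][0])


# Placeholder numbers for illustration only; these are not results from the paper.
toy = {
    "backbone_a": (10.0, 0.50),
    "backbone_b": (20.0, 0.70),
    "backbone_c": (25.0, 0.60),  # slower and worse than backbone_b
    "backbone_d": (30.0, 0.80),
}
print(pareto_front(toy))  # ['backbone_a', 'backbone_b', 'backbone_d']
```

The quadratic loop is fine for a few dozen backbones; for much larger sets one would sort by runtime and sweep once.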
📣 Paper alert: We present dense attentive probing (DeAP), a method to measure the representation quality of various vision backbones for dense prediction tasks. It uses small, parameter-efficient readouts with learnable masks to generate dense predictions from backbone features of any size.
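To make the idea of an attentive readout over features "of any size" concrete, here is a minimal plain-Python sketch, assuming a single learnable query, a linear output head, and a soft spatial mask implemented as a distance penalty on the attention logits. This is not the DeAP implementation: the parameter names (`query`, `w_out`, `mask_sharpness`) and the form of the mask are hypothetical choices for illustration only.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attentive_dense_readout(
    features: Sequence[Vector],  # grid_h * grid_w backbone tokens, row-major, each of length D
    grid_h: int,
    grid_w: int,
    out_h: int,
    out_w: int,
    query: Vector,          # hypothetical learnable query (length D), shared by all output pixels
    w_out: Vector,          # hypothetical learnable linear head (length D) -> one output channel
    mask_sharpness: float,  # hypothetical learnable scalar controlling the spatial mask width
) -> List[List[float]]:
    if len(features) != grid_h * grid_w:
        raise ValueError("expected grid_h * grid_w feature vectors")
    d = len(query)
    # Token centres in normalized [0, 1] coordinates, so the readout does not
    # depend on the backbone's patch grid size.
    token_pos = [((i + 0.5) / grid_h, (j + 0.5) / grid_w)
                 for i in range(grid_h) for j in range(grid_w)]
    dense = []
    for y in range(out_h):
        py = (y + 0.5) / out_h
        row = []
        for x in range(out_w):
            px = (x + 0.5) / out_w
            # Attention logits = content term + soft spatial mask that penalizes
            # tokens far from this output pixel.
            logits = [dot(query, f) / math.sqrt(d)
                      - mask_sharpness * ((py - ty) ** 2 + (px - tx) ** 2)
                      for f, (ty, tx) in zip(features, token_pos)]
            weights = softmax(logits)
            # Attention-weighted average of the token features.
            pooled = [sum(w * f[k] for w, f in zip(weights, features)) for k in range(d)]
            row.append(dot(w_out, pooled))
        dense.append(row)
    return dense
```

Because attention pools over however many tokens the backbone emits and token positions are normalized, the same readout parameters apply unchanged to, say, a 14×14 or a 32×32 token grid; only `features`, `grid_h` and `grid_w` change.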