We also develop a benchmark to evaluate the spatial understanding of VLMs. The core idea is to use synthetic images, which avoids any possibility of test-time leakage: arxiv.org/abs/2408.02231
@csprofkgd.bsky.social could you add me too? Thank you!