Posts by Agneet Chatterjee

REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models
Text-to-Image (T2I) and multimodal large language models (MLLMs) have been adopted in solutions for several computer vision and multimodal learning tasks. However, it has been found that such vision-l...

We also develop a benchmark to evaluate the spatial understanding of VLMs. The core idea is to use synthetic images, which avoids any possibility of test-time leakage: arxiv.org/abs/2408.02231

1 year ago

@csprofkgd.bsky.social could you add me too? Thank you!

1 year ago