Overview of PixMo and its relation to Molmo's abilities. PixMo's captions data enables Molmo's fine-grained understanding; PixMo's AskModelAnything enables Molmo's user interaction; PixMo's pointing data enables Molmo's pointing and counting; PixMo's synthetic data enables Molmo's visual skills.
Remember Molmo? The full recipe is finally out!
Training code, data, and everything you need to reproduce our models. Oh, and we have updated our tech report too!
Links in thread.