Advertisement · 728 × 90

Posts by Joel Burget

Do you all have plans for how multimodal would work? Treating an image as a sequence of bytes (the rows in a bitmap or something) seems pretty bad since it throws away so much structure. Reverting to tokens is ugly. You probably want to learn the encoding but it's not clear how.

1 year ago 0 0 0 0