Pretrained ViTs usually come in rigid sizes (S, B, L, H). But your hardware constraints don't.
We built a way to make DINO or CLIP fully elastic in <5 mins without any retraining ➡️
Get the exact model size you need, not just what was released
Find Walter at #NeurIPS Poster 4709 | Thu 4:30-7:30 PM
Posts by Yuki Asano
On the occasion of the 1000th citation of our Sinkhorn-Knopp self-supervised representation learning paper, I've written a whole post about the history and the key bits of this method, which powers state-of-the-art SSL vision models.
Read it here :): docs.google.com/document/d/1...
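For reference, the core of the method fits in a few lines: a minimal NumPy sketch of the Sinkhorn-Knopp normalisation, which turns a (samples × clusters) score matrix into balanced soft pseudo-labels. Function name, defaults, and iteration count are my own choices here, not the paper's exact code:

```python
import numpy as np

def sinkhorn_knopp(scores, n_iters=3):
    """Alternately normalise columns and rows of exp(scores) so that
    every cluster receives equal mass and every sample gets a
    distribution over clusters (balanced soft pseudo-labels)."""
    Q = np.exp(scores)  # positive assignment matrix
    Q /= Q.sum()
    N, K = Q.shape
    for _ in range(n_iters):
        Q /= Q.sum(axis=0, keepdims=True)  # each cluster sums to 1...
        Q /= K                             # ...then to 1/K (equal mass)
        Q /= Q.sum(axis=1, keepdims=True)  # each sample sums to 1...
        Q /= N                             # ...then to 1/N
    return Q * N  # rows now sum to 1: one soft label per sample
```

With enough iterations the column sums converge to N/K, i.e. clusters are used equally — the property that prevents the degenerate "all samples in one cluster" solution in self-labelling.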
Today, we release Franca, a new vision Foundation Model that matches and often outperforms DINOv2.
The data, the training code and the model weights are open-source.
This is the result of a close and fun collaboration between
@valeoai.bsky.social (in France) and @funailab.bsky.social (in Franconia)
Agreed, very interesting! Future engines that run on information? 🤯
Our lab is now also on bsky! 🥳
PaliGemma 2 is our updated and improved PaliGemma release using the Gemma 2 models and providing new pre-trained checkpoints for the full cross product of {224px,448px,896px} resolutions and {3B,10B,28B} model sizes.
1/7
Pls RT
Permanent Assistant Professor (Lecturer) position in Computer Vision @bristoluni.bsky.social [DL 6 Jan 2025]
This is a research+teaching permanent post within MaVi group uob-mavi.github.io in Computer Science. Suitable for strong postdocs or exceptional PhD graduates.
t.co/k7sRRyfx9o
1/2
Today we had a joint workshop between our FunAI Lab, UTN and AIST Japan. 13 talks, 1 cake and lots of Bavarian food really get research discussions going!
Towards more collaborations in AI between 🇩🇪 & 🇯🇵.
@hirokatukataoka.bsky.social
Thanks for tagging. In addition, have a look at the NV-Embed paper (arxiv.org/abs/2405.17428): they do contrastive finetuning after turning on a bidirectional attention mask.
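For context, the contrastive-finetuning step in such embedding recipes is typically an in-batch InfoNCE objective over (query, positive passage) pairs. A generic NumPy sketch — not NV-Embed's actual implementation, and the temperature value is illustrative:

```python
import numpy as np

def info_nce(queries, passages, temperature=0.05):
    """In-batch contrastive loss: the positive for query i is
    passage i; every other passage in the batch is a negative."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    p = passages / np.linalg.norm(passages, axis=1, keepdims=True)
    logits = q @ p.T / temperature                       # (B, B) cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                  # cross-entropy on the diagonal
```

The embeddings fed in here would come from mean-pooling the model's hidden states — which is where the bidirectional mask matters, since every token can then attend to the whole sequence.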
Also @phdcomics.bsky.social is on 🦋. Slowly nesting here.
Yay, @xkcd.com is on 🦋
Nice! We love small (M)LLMs :) Will the training code also be released?
and also perhaps interesting for you: probing text-representations of LLMs for CLIP-like zero-shot classification: arxiv.org/abs/2410.07173
Sam next to his poster; I'm still very impressed he did all this for his MSc thesis! #BMVC2024
Exactly — hence the new post-(pre)training term, perhaps? Post-training seems to be a good generic term for RLHF/preference tuning etc. in NLP (allenai.org/papers/tulu-...), so by saying post-pretraining we could emphasize the fact that it's unsupervised.
"Post-pretraining", "unsupervised domain adaptation" fits, but I think is used for different tasks
This work was led by Jochem Loedeman in his MSc, and supervised by Maarten Stol, Tengda Han and myself.
📄: arxiv.org/abs/2210.06466 ✨
Visit BMVC poster 532 at 10am today!
This means we can simply send an adapted RGB image to the server to get a personalised output.
We also show that the gains don't just come from adding a new learnable model, but instead from the interplay between the pretrained one and the PGN.
This CNN (e.g. running on a phone) outputs a softmax over a set of learned tokens. These are then combined and used for the adaptation. This allows for efficient learning, but also for moving the signal back into pixel space via a pseudo-inverse.
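A rough NumPy sketch of that pipeline — the CNN is stubbed out with random logits, and all shapes and names are purely illustrative, not the paper's actual dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative shapes only; the lightweight CNN itself is stubbed out.
n_tokens, token_dim, n_prompts = 32, 768, 4
pixel_dim = 16 * 16 * 3  # one RGB patch

token_library = rng.normal(size=(n_tokens, token_dim))  # learned token bank
patch_embed = rng.normal(size=(pixel_dim, token_dim))   # frozen patch embedding

def pgn_prompts(cnn_logits):
    """Combine library tokens into prompt vectors via a softmax over
    the CNN's per-prompt token scores (shape: n_prompts x n_tokens)."""
    z = cnn_logits - cnn_logits.max(axis=1, keepdims=True)
    weights = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return weights @ token_library  # (n_prompts, token_dim)

# The pseudo-inverse of the patch embedding maps the prompt signal
# back into pixel space, so it can be applied to the input image itself.
prompts = pgn_prompts(rng.normal(size=(n_prompts, n_tokens)))
pixel_signal = prompts @ np.linalg.pinv(patch_embed)  # (n_prompts, pixel_dim)
```

Passing `pixel_signal` back through `patch_embed` recovers the prompts, which is what makes input-space serving possible: the server only ever sees an adapted RGB image.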
Also known as reprogramming, works from @phillipisola.bsky.social showed that even adjusting single pixels allows adapting a model. We take this one step further and make the input-only adaptation signal dependent on the image itself: we introduce a lightweight CNN, the Prompt Generation Network.
LoRA is great but one disadvantage is that if you have 1000s of these adapters and want to serve them in an efficient way, it's very difficult: GPUs are inefficient when you e.g. use one adapter for only one sample in a large batch. The solution is to adapt the model strictly in input-space.
LoRA et al. enable personalised model generation and serving, which is crucial as finetuned models still outperform general ones in many tasks. However, serving a base model with many LoRAs is very inefficient! Now, there's a better way: enter Prompt Generation Networks, presented today at #BMVC
Hello world!
Is there any tool to sync twitter and bluesky posting?
My growing list of #computervision researchers on Bsky.
Missed you? Let me know.
go.bsky.app/M7HGC3Y
The thingie that brings over your twitter followers worked jolly well for me. Very cool! I am following another 500 people now thanks to that…
chromewebstore.google.com/detail/sky-f...