Preprint now on ArXiv
The N-Body Problem: Parallel Execution from Single-Person Egocentric Video
Input: single-person egocentric video
Output: an imagination of how these tasks can be performed faster, and correctly, by N > 1 people, e.g. N = 2
arxiv.org/abs/2512.11393
zhifanzhu.github.io/ego-nbody/
1/4
Posts by Gabriele Goletto
Yes please! The animations look really clear to me, so they would make a great learning resource with a voiceover.
Now on ArXiv: our
@cvprconference.bsky.social
#CVPR2025 paper
Learning from Streaming Video with Orthogonal Gradients
Instead of shuffling clips, can we learn from videos fed sequentially, seeing each clip only once, in order?
How do we deal with the correlation of gradients over training?
1/3
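The core idea named in the title can be illustrated with a small, hypothetical sketch: when consecutive clips are correlated, their gradients point in similar directions, so one way to decorrelate updates is to project the current gradient onto the subspace orthogonal to the previous one. This is only an illustration of gradient orthogonalization in general, not the paper's actual algorithm; the function name and setup are assumptions.

```python
import numpy as np

def orthogonalize(grad, prev_grad, eps=1e-12):
    """Remove from `grad` its component along `prev_grad`.

    Illustrative only: a minimal orthogonal-projection step to
    decorrelate successive gradients in streaming training.
    """
    denom = np.dot(prev_grad, prev_grad)
    if denom < eps:  # previous gradient is (near) zero: nothing to remove
        return grad
    # Subtract the projection of grad onto prev_grad
    return grad - (np.dot(grad, prev_grad) / denom) * prev_grad

# Toy check: the result has no component along prev_grad
g = np.array([1.0, 1.0])
p = np.array([1.0, 0.0])
g_orth = orthogonalize(g, p)  # component along p is removed
```

By construction, `np.dot(g_orth, p)` is zero, so the update no longer repeats the direction already taken on the previous (correlated) clip.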
But I like the (almost) bot-free conversations and there are some really good active accounts!
Check out Kosta's starter packs (go.bsky.app/M7HGC3Y); that's the fastest route. That said, the CV community here has unfortunately become less active compared to a few months ago.
Image segmentation doesn't have to be rocket science.
Why build a rocket engine full of bolted-on subsystems when one elegant unit does the job?
That's what we did for segmentation.
Meet the Encoder-only Mask Transformer (EoMT): tue-mps.github.io/eomt (CVPR 2025)
(1/6)
Excited to release the first worldwide aerial image localization method (and demo!)
Take an aerial or satellite image from anywhere in the world, and AstroLoc can (probably) find its location, and provide a precise footprint!
Links to the paper, demo, and full-length (5 min) video below
HD-EPIC: A Highly-Detailed Egocentric Video Dataset
hd-epic.github.io
arxiv.org/abs/2502.04144
Newly collected videos
263 annotations/min: recipe, nutrition, actions, sounds, 3D object movement & fixture associations, masks.
26K VQA benchmark to challenge current VLMs
1/N
Now on ArXiv
ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions
arxiv.org/abs/2412.01987
soczech.github.io/showhowto/
Given one real image & a variable-length sequence of text instructions, ShowHowTo generates a multi-step sequence of images *conditioned on the scene in the REAL image*.
Hi Kosta, I would love to be on this list as well! I am working on egocentric video understanding.