
Posts by Guillaume Astruc

UniGeoCLIP: Unified Geospatial Contrastive Learning
UniGeoCLIP aligns text, aerial imagery, street-level views, DSMs, and geographic coordinates in a single embedding space via all-to-all contrastive learning.

A huge thank you to the amazing team at Google Geo for welcoming me and making this internship such a nice experience!
Dive into the paper and code here:
📄 Paper: arxiv.org/pdf/2604.11668
💻 Code: github.com/gastruc/unig...
🌐 Project: gastruc.github.io/unigeoclip

5 days ago

UniGeoCLIP is a powerful pretext task for backbones. Aligning these modalities builds features that excel at:
✅ Classification/segmentation (aerial/DSM)
✅ Zero-shot cross-modal retrieval (generalizes to unseen cities like Amsterdam! 🇳🇱)

5 days ago

GPS embeddings don't just encode location: they learn deep semantics. 🧠

We prove this by predicting health, social & environmental indicators via the GPS encoder alone. PCA reveals dense urban structures (parks, zones) purely from coordinate space.
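The PCA visualization above can be sketched end to end: encode a dense grid of coordinates with the GPS encoder and project the embeddings onto their top principal components. A minimal sketch, using a random-Fourier-feature stand-in for the trained encoder (the grid extent and dimensions below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def gps_encoder(coords, dim=64):
    """Stand-in for a trained GPS encoder: random Fourier features of
    (lat, lon). The real encoder is the network learned by contrastive
    alignment; this just illustrates the pipeline."""
    W = rng.normal(scale=4.0, size=(2, dim // 2))  # fixed random projection
    proj = coords @ W
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=1)

# Dense grid of coordinates over a city-sized area (hypothetical extent).
lats = np.linspace(48.80, 48.91, 100)
lons = np.linspace(2.25, 2.42, 100)
grid = np.stack(np.meshgrid(lats, lons, indexing="ij"), axis=-1).reshape(-1, 2)

emb = gps_encoder(grid)                 # (10000, 64) location embeddings

# PCA via SVD: project onto the top-3 components and reshape back to the
# grid; mapped to RGB, this gives the kind of "semantic map" shown above.
emb = emb - emb.mean(axis=0)
_, _, Vt = np.linalg.svd(emb, full_matrices=False)
pca3 = (emb @ Vt[:3].T).reshape(100, 100, 3)
print(pca3.shape)  # (100, 100, 3)
```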

5 days ago

How do you encode location? We moved beyond standard RFF/SIREN baselines.
Our Multi-scale Lat-Long Encoder is designed to capture everything from local street blocks to regional geography using cross-scale self-attention. It's the key to our SOTA downstream performance.
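A multi-scale coordinate encoder of this kind can be sketched as follows. The layer sizes, frequency scales, and fusion choices below are illustrative assumptions, not the paper's exact architecture: each scale maps (lat, lon) to sinusoidal features at a different frequency band, and self-attention lets the per-scale tokens exchange information.

```python
import torch
import torch.nn as nn

class MultiScaleLatLonEncoder(nn.Module):
    """Sketch of a multi-scale coordinate encoder (illustrative, not the
    paper's exact design). Low frequencies capture regional geography,
    high frequencies capture street-level detail; cross-scale
    self-attention mixes the per-scale tokens."""

    def __init__(self, dim=128, scales=(1.0, 10.0, 100.0, 1000.0)):
        super().__init__()
        self.scales = scales
        self.proj = nn.ModuleList([nn.Linear(4, dim) for _ in scales])
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.out = nn.Linear(dim, dim)

    def forward(self, coords):               # coords: (B, 2) in degrees
        tokens = []
        for s, proj in zip(self.scales, self.proj):
            feats = torch.cat([torch.sin(s * coords), torch.cos(s * coords)], dim=-1)
            tokens.append(proj(feats))       # one token per scale
        x = torch.stack(tokens, dim=1)       # (B, n_scales, dim)
        x, _ = self.attn(x, x, x)            # cross-scale self-attention
        return self.out(x.mean(dim=1))       # (B, dim) location embedding

enc = MultiScaleLatLonEncoder()
z = enc(torch.tensor([[48.8566, 2.3522], [52.3676, 4.9041]]))  # Paris, Amsterdam
print(z.shape)  # torch.Size([2, 128])
```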

5 days ago

Most models treat satellite, street-view, and GPS as separate silos.
UniGeoCLIP uses an all-to-all contrastive alignment to jointly link: Aerial 🛰️, Street-level 📸, Elevation (DEM) 🏔️, Text 📝, GPS 📍
More modalities = better representations.
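A CLIP-style objective extended to every pair of modalities can be sketched like this; the modality names, batch size, and temperature below are illustrative:

```python
import itertools
import torch
import torch.nn.functional as F

def all_to_all_contrastive(embeddings, temperature=0.07):
    """All-to-all InfoNCE sketch: instead of a single image-text pair
    (as in CLIP), sum a symmetric contrastive loss over every pair of
    modalities. `embeddings` maps modality name -> (B, D) tensor, where
    row i of each modality comes from the same location."""
    losses = []
    for a, b in itertools.combinations(embeddings, 2):
        za = F.normalize(embeddings[a], dim=-1)
        zb = F.normalize(embeddings[b], dim=-1)
        logits = za @ zb.T / temperature      # (B, B) cross-modal similarities
        targets = torch.arange(za.size(0))    # matching rows are positives
        losses.append(F.cross_entropy(logits, targets))
        losses.append(F.cross_entropy(logits.T, targets))
    return torch.stack(losses).mean()

B, D = 8, 32
mods = {m: torch.randn(B, D) for m in ["aerial", "street", "dem", "text", "gps"]}
loss = all_to_all_contrastive(mods)
print(float(loss))
```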

5 days ago

Excited to share my work as a Student Researcher at Google Zurich: UniGeoCLIP! 🌍🚀

W/ Eduard Trulls, Jan Hosang, @loicland.bsky.social
& @pesarlin.bsky.social, we built a framework aligning 5 geospatial modalities in one space.

Presented at EarthVision @ #CVPR2026. 🧵👇

5 days ago
PoM: A Linear-Time Replacement for Attention with the Polynomial Mixer
This paper introduces the Polynomial Mixer (PoM), a novel token mixing mechanism with linear complexity that serves as a drop-in replacement for self-attention. PoM aggregates input tokens into a comp...

🚨 arxiv.org/abs/2604.06129

PoM: A Linear-Time Replacement for Attention with the Polynomial Mixer

This paper is the result of a lab-wide hackathon on an idea I'd had for some time. Probably the paper with the most authors I've ever had.

It was accepted to CVPR Findings 2026.

Thread 🧵👇
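For intuition on how token mixing can be linear in sequence length, here is a sketch of the general linear-attention family PoM belongs to (PoM's specific polynomial formulation is in the paper, so treat this as the generic idea, not PoM itself):

```python
import torch

def linear_time_mixing(Q, K, V):
    """Generic linear-complexity token mixing (the linear-attention
    family; not PoM's exact form). A feature map phi replaces the softmax
    kernel with a factorizable one, so all keys and values collapse into
    a fixed-size state phi(K)^T V and cost is O(N) in sequence length."""
    phi = lambda x: torch.nn.functional.elu(x) + 1   # positive feature map
    state = phi(K).transpose(-2, -1) @ V             # (D, D) summary of all tokens
    norm = phi(Q) @ phi(K).sum(dim=-2, keepdim=True).transpose(-2, -1)  # (N, 1)
    return (phi(Q) @ state) / norm                   # (N, D); no N x N matrix built

N, D = 1024, 64
Q, K, V = (torch.randn(N, D) for _ in range(3))
out = linear_time_mixing(Q, K, V)
print(out.shape)  # torch.Size([1024, 64])
```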

1 week ago

We introduce MIRO: a new paradigm for T2I model alignment integrating reward conditioning into pretraining, eliminating the need for separate fine-tuning/RL stages. This single-stage approach offers unprecedented efficiency and control.

- 19x faster convergence ⚡
- 370x fewer FLOPs than FLUX-dev 📉
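The reward-conditioning idea can be sketched in a few lines. The architecture below is a toy stand-in (the layer sizes, scalar-reward embedding, and sampling call are all illustrative assumptions, not MIRO's actual design): the model learns p(image | text, reward) during pretraining, so at sampling time you simply condition on a high reward instead of running a separate RL or fine-tuning stage.

```python
import torch
import torch.nn as nn

class RewardConditionedDenoiser(nn.Module):
    """Toy sketch of reward-conditioned pretraining (illustrative only):
    the per-sample reward score is embedded and added to the text
    conditioning before denoising."""

    def __init__(self, dim=64):
        super().__init__()
        self.reward_embed = nn.Linear(1, dim)   # scalar reward -> embedding
        self.net = nn.Sequential(nn.Linear(dim * 2, dim), nn.SiLU(), nn.Linear(dim, dim))

    def forward(self, noisy_latent, text_cond, reward):
        cond = text_cond + self.reward_embed(reward[:, None])
        return self.net(torch.cat([noisy_latent, cond], dim=-1))

model = RewardConditionedDenoiser()
x = torch.randn(4, 64)                # noisy latents
c = torch.randn(4, 64)                # text conditioning
r = torch.rand(4)                     # precomputed reward scores in [0, 1]
eps = model(x, c, r)                  # training: denoise, conditioned on reward
best = model(x, c, torch.ones(4))     # sampling: ask for maximal reward
print(eps.shape, best.shape)  # torch.Size([4, 64]) torch.Size([4, 64])
```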

5 months ago

Super interesting to see pure SSL outperform text alignment on a highly competitive task that seems tailor-made for text-aligned models 🤯

8 months ago

πŸ›°οΈ At #CVPR2025 presenting "AnySat: An Earth Observation Model for Any Resolutions, Scales, and Modalities" - Saturday afternoon, Poster 355!
If you're here and want to discuss geolocation or geospatial foundation models, let's connect!

10 months ago
FLAIR-HUB: Large-scale Multimodal Dataset for Land Cover and Crop Mapping
The growing availability of high-quality Earth Observation (EO) data enables accurate global land cover and crop type monitoring. However, the volume and heterogeneity of these datasets pose major pro...

📢 FLAIR-HUB dataset
A new large-scale, multimodal dataset for land cover and crop type mapping
🤗 Dataset: huggingface.co/datasets/IGN...
📄 Preprint: arxiv.org/abs/2506.07080
🤗 Pretrained models: huggingface.co/collections/...
💻 Code: github.com/IGNF/FLAIR-HUB
🌐 Project: arxiv.org/abs/2506.07080

10 months ago

I will be presenting our work on the detection of archaeological looting with satellite image time series at CVPR 2025 EarthVision workshop tomorrow!

Honored and grateful that this paper received the best student paper award!

10 months ago
When majority rules, minority loses: bias amplification of gradient descent
Despite growing empirical evidence of bias amplification in machine learning, its theoretical foundations remain poorly understood. We develop a formal framework for majority-minority learning tasks, ...

📢 New preprint!
"When majority rules, minority loses: bias amplification of gradient descent"

We often blame biased data, but training itself also amplifies biases. Our paper explores how ML algorithms favor stereotypes at the expense of minority groups.

➡️ arxiv.org/abs/2505.13122

(1/3)
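The claim can be illustrated with a toy experiment (entirely illustrative, not the paper's formal framework): two groups whose labeling rules disagree, fit with one shared logistic regression by plain gradient descent. The model settles near the majority's optimum, so the minority pays the accuracy cost.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy majority/minority task: both groups follow a threshold rule on the
# same feature, but the minority's threshold is shifted, so one shared
# linear model cannot satisfy both groups at once.
def make_group(n, shift):
    x = rng.normal(size=(n, 1))
    y = (x[:, 0] > shift).astype(float)
    return x, y

x_maj, y_maj = make_group(900, 0.0)    # 90% majority, threshold at 0
x_min, y_min = make_group(100, 1.5)    # 10% minority, threshold at 1.5
X = np.vstack([x_maj, x_min])
y = np.concatenate([y_maj, y_min])

# Logistic regression trained by full-batch gradient descent.
w, b = 0.0, 0.0
for _ in range(2000):
    p = 1 / (1 + np.exp(-(X[:, 0] * w + b)))
    g = p - y
    w -= 0.1 * np.mean(g * X[:, 0])
    b -= 0.1 * np.mean(g)

def acc(x, t):
    return np.mean(((1 / (1 + np.exp(-(x[:, 0] * w + b)))) > 0.5) == t)

print(f"majority acc: {acc(x_maj, y_maj):.2f}")  # high
print(f"minority acc: {acc(x_min, y_min):.2f}")  # noticeably lower
```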

10 months ago

We've added new experiments demonstrating robust generalization capabilities! Notably, AnySat shows strong performance on HLS Burn Scars, a sensor never seen during pretraining! 🔥🛰️
Check it out:
📄 Paper: arxiv.org/abs/2412.14123
🌐 Project: gastruc.github.io/anysat

11 months ago

Looking forward to #CVPR2025! We will present the following papers:

11 months ago
The Change You Want To Detect: Semantic Change Detection In Earth Observation With Hybrid Data Generation
Bi-temporal change detection at scale based on Very High Resolution (VHR) images is crucial for Earth monitoring. This remains poorly addressed so far: methods either require large volumes of annotate...

Introducing HySCDG #CVPR2025, a generative pipeline for creating a large hybrid semantic change detection dataset for Earth Observation using Stable Diffusion and ControlNet! 🗺️🛩️

📄 Paper: arxiv.org/abs/2503.15683

11 months ago

💻 We've released the code for our #CVPR2025 paper MAtCha!

🍵 MAtCha reconstructs sharp, accurate, and scalable meshes of both foreground AND background from just a few unposed images (e.g., 3 to 10 images)...

...While also working with dense-view datasets (hundreds of images)!

1 year ago

🔥🔥🔥 CV folks, I have some news! We're organizing a 1-day meeting in central Paris on June 6th before CVPR, called CVPR@Paris (similar to NeurIPS@Paris) 🥐🍾🥖🍷

Registration is open (it's free) with priority given to authors of accepted papers: cvprinparis.github.io/CVPR2025InPa...

Big 🧵👇 with details!

1 year ago

Starter pack including some of the lab members: go.bsky.app/QK8j87w

1 year ago

🧩 Excited to share our paper "RUBIK: A Structured Benchmark for Image Matching across Geometric Challenges" (arxiv.org/abs/2502.19955) accepted to #CVPR2025! We created a benchmark that systematically evaluates image matching methods across well-defined geometric difficulty levels. 🔍

1 year ago

Weights for CAD are finally available. It's one of the smallest diffusion models around, achieving performance close to SD and PixArt, and featuring a Perceiver-like architecture.
We leverage our coherence-aware training to improve textual understanding.

1 year ago

🔗 Check it out:
📜 Paper: arxiv.org/abs/2412.14123
🌐 Project: gastruc.github.io/anysat
🤗 HuggingFace: huggingface.co/g-astruc/Any...
🐙 GitHub: github.com/gastruc/AnySat

1 year ago

🚀 Even better: AnySat supports linear probing for semantic segmentation!
That means you can fine-tune just a few thousand parameters and achieve SOTA results on challenging tasks, all with minimal effort.
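Linear probing for segmentation is simple to sketch: freeze the backbone and train only a per-patch linear classifier (a 1x1 convolution). The backbone below is a stand-in, not AnySat, and the sizes are illustrative:

```python
import torch
import torch.nn as nn

# Frozen "backbone": a patchifying conv stands in for a pretrained encoder.
backbone = nn.Conv2d(3, 256, kernel_size=16, stride=16)
for p in backbone.parameters():
    p.requires_grad = False

# The probe is the only trainable part: a linear map per patch feature.
probe = nn.Conv2d(256, 10, kernel_size=1)

x = torch.randn(2, 3, 224, 224)
with torch.no_grad():
    feats = backbone(x)                     # (2, 256, 14, 14) patch features
logits = probe(feats)                       # (2, 10, 14, 14) coarse class map
seg = nn.functional.interpolate(logits, size=(224, 224), mode="bilinear")
print(sum(p.numel() for p in probe.parameters()))  # 2570 trainable parameters
print(seg.shape)  # torch.Size([2, 10, 224, 224])
```

Only the probe's few thousand parameters receive gradients, which is what makes this kind of adaptation so cheap.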

1 year ago

AnySat achieves SOTA performance on 6 tasks across 10 datasets:
🌱 Land cover mapping
🌾 Crop type segmentation
🌳 Tree species classification
🌊 Flood detection
🌍 Change detection

1 year ago

We trained AnySat on 5 multimodal datasets simultaneously:
📡 11 distinct sensors
📏 Resolutions: 0.2m–500m
🔁 Revisit: single date to weekly
🏞️ Scales: 0.3–150 hectares

The pretrained model can adapt to truly diverse data, and probably yours too!

1 year ago

πŸ”Thanks to our modified JEPA training scheme and scale-adaptive spatial encoders, AnySat trains on datasets with diverse scales, resolutions, and modalities!
🧠 75% of its parameters are shared across all inputs, enabling unmatched flexibility.

1 year ago

🤔 What if embedding multimodal EO data was as easy as using a ResNet on images?
Introducing AnySat: one model for any resolution (0.2m–250m), scale (0.3–2600 hectares), and modalities (choose from 11 sensors & time series)!
Try it with just a few lines of code:
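The original post's code snippet was shared as an image; here is a hypothetical sketch of what such a call could look like. The hub entry point, argument names, modality keys, and tensor layouts below are all assumptions; check the GitHub README for the actual API.

```python
import torch

# Hypothetical usage sketch: the idea is to pass a dict containing
# whichever modalities you have. Keys and shapes below are assumptions.
data = {
    "aerial": torch.randn(1, 4, 512, 512),       # VHR aerial image
    "s2": torch.randn(1, 12, 10, 32, 32),        # Sentinel-2 series (B, T, C, H, W)
    "s2_dates": torch.randint(0, 365, (1, 12)),  # acquisition day-of-year
}

# Assumed hub id and call signature -- see github.com/gastruc/AnySat:
# model = torch.hub.load("gastruc/AnySat", "anysat", pretrained=True)
# features = model(data, patch_size=10)

print({k: tuple(v.shape) for k, v in data.items()})
```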

1 year ago

Guillaume Astruc, Nicolas Gonthier, Clement Mallet, Loic Landrieu
AnySat: An Earth Observation Model for Any Resolutions, Scales, and Modalities
https://arxiv.org/abs/2412.14123

1 year ago

⚠️ Reconstructing sharp 3D meshes from a few unposed images is a hard and ambiguous problem.

☑️ With MAtCha, we leverage a pretrained depth model to recover sharp meshes from sparse views, including both foreground and background, within minutes! 🧵

🌐 Webpage: anttwo.github.io/matcha/

1 year ago