
Posts by Guillaume Astruc

UniGeoCLIP: Unified Geospatial Contrastive Learning
UniGeoCLIP aligns text, aerial imagery, street-level views, DSMs, and geographic coordinates in a single embedding space via all-to-all contrastive learning.

A huge thank you to the amazing team at Google Geo for welcoming me and making this internship such a nice experience!
Dive into the paper and code here:
📄 Paper: arxiv.org/pdf/2604.11668
💻 Code: github.com/gastruc/unig...
🌐 Project: gastruc.github.io/unigeoclip

5 days ago

UniGeoCLIP is a powerful pretext task for backbones. Aligning these modalities builds features that excel at:
✅ Classification/segmentation (aerial/DSM)
✅ Zero-shot cross-modal retrieval (generalizes to unseen cities like Amsterdam! 🇳🇱)

5 days ago

GPS embeddings don't just encode location: they learn deep semantics. 🧠

We prove this by predicting health, social & environmental indicators via the GPS encoder alone. PCA reveals dense urban structures (parks, zones) purely from coordinate space.
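The PCA visualization above can be sketched end to end: encode a dense grid of coordinates with the GPS encoder and project the embeddings onto their top principal components. A minimal sketch, using a random-Fourier-feature stand-in for the trained encoder (the grid extent and dimensions below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def gps_encoder(coords, dim=64):
    """Stand-in for a trained GPS encoder: random Fourier features of
    (lat, lon). The real encoder is the network learned by contrastive
    alignment; this just illustrates the pipeline."""
    W = rng.normal(scale=4.0, size=(2, dim // 2))  # fixed random projection
    proj = coords @ W
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=1)

# Dense grid of coordinates over a city-sized area (hypothetical extent).
lats = np.linspace(48.80, 48.91, 100)
lons = np.linspace(2.25, 2.42, 100)
grid = np.stack(np.meshgrid(lats, lons, indexing="ij"), axis=-1).reshape(-1, 2)

emb = gps_encoder(grid)                 # (10000, 64) location embeddings

# PCA via SVD: project onto the top-3 components and reshape back to the
# grid; mapped to RGB, this gives the kind of "semantic map" shown above.
emb = emb - emb.mean(axis=0)
_, _, Vt = np.linalg.svd(emb, full_matrices=False)
pca3 = (emb @ Vt[:3].T).reshape(100, 100, 3)
print(pca3.shape)  # (100, 100, 3)
```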

5 days ago

How do you encode location? We moved beyond standard RFF/SIREN baselines.
Our Multi-scale Lat-Long Encoder is designed to capture everything from local street blocks to regional geography using cross-scale self-attention. It's the key to our SOTA downstream performance.
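A multi-scale coordinate encoder of this kind can be sketched as follows. The layer sizes, frequency scales, and fusion choices below are illustrative assumptions, not the paper's exact architecture: each scale maps (lat, lon) to sinusoidal features at a different frequency band, and self-attention lets the per-scale tokens exchange information.

```python
import torch
import torch.nn as nn

class MultiScaleLatLonEncoder(nn.Module):
    """Sketch of a multi-scale coordinate encoder (illustrative, not the
    paper's exact design). Low frequencies capture regional geography,
    high frequencies capture street-level detail; cross-scale
    self-attention mixes the per-scale tokens."""

    def __init__(self, dim=128, scales=(1.0, 10.0, 100.0, 1000.0)):
        super().__init__()
        self.scales = scales
        self.proj = nn.ModuleList([nn.Linear(4, dim) for _ in scales])
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.out = nn.Linear(dim, dim)

    def forward(self, coords):               # coords: (B, 2) in degrees
        tokens = []
        for s, proj in zip(self.scales, self.proj):
            feats = torch.cat([torch.sin(s * coords), torch.cos(s * coords)], dim=-1)
            tokens.append(proj(feats))       # one token per scale
        x = torch.stack(tokens, dim=1)       # (B, n_scales, dim)
        x, _ = self.attn(x, x, x)            # cross-scale self-attention
        return self.out(x.mean(dim=1))       # (B, dim) location embedding

enc = MultiScaleLatLonEncoder()
z = enc(torch.tensor([[48.8566, 2.3522], [52.3676, 4.9041]]))  # Paris, Amsterdam
print(z.shape)  # torch.Size([2, 128])
```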

5 days ago

Most models treat satellite, street-view, and GPS as separate silos.
UniGeoCLIP uses an all-to-all contrastive alignment to jointly link: Aerial 🛰️, Street-level 📸, Elevation (DEM) 🏔️, Text 📝, GPS 📍
More modalities = better representations.
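A CLIP-style objective extended to every pair of modalities can be sketched like this; the modality names, batch size, and temperature below are illustrative:

```python
import itertools
import torch
import torch.nn.functional as F

def all_to_all_contrastive(embeddings, temperature=0.07):
    """All-to-all InfoNCE sketch: instead of a single image-text pair
    (as in CLIP), sum a symmetric contrastive loss over every pair of
    modalities. `embeddings` maps modality name -> (B, D) tensor, where
    row i of each modality comes from the same location."""
    losses = []
    for a, b in itertools.combinations(embeddings, 2):
        za = F.normalize(embeddings[a], dim=-1)
        zb = F.normalize(embeddings[b], dim=-1)
        logits = za @ zb.T / temperature      # (B, B) cross-modal similarities
        targets = torch.arange(za.size(0))    # matching rows are positives
        losses.append(F.cross_entropy(logits, targets))
        losses.append(F.cross_entropy(logits.T, targets))
    return torch.stack(losses).mean()

B, D = 8, 32
mods = {m: torch.randn(B, D) for m in ["aerial", "street", "dem", "text", "gps"]}
loss = all_to_all_contrastive(mods)
print(float(loss))
```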

5 days ago

Excited to share my work as a Student Researcher at Google Zurich: UniGeoCLIP! 🌍🚀

W/ Eduard Trulls, Jan Hosang, @loicland.bsky.social
& @pesarlin.bsky.social, we built a framework aligning 5 geospatial modalities in one space.

Presented at EarthVision @ #CVPR2026. 🧵👇

5 days ago
PoM: A Linear-Time Replacement for Attention with the Polynomial Mixer
This paper introduces the Polynomial Mixer (PoM), a novel token mixing mechanism with linear complexity that serves as a drop-in replacement for self-attention. PoM aggregates input tokens into a comp...

🚨 arxiv.org/abs/2604.06129

PoM: A Linear-Time Replacement for Attention with the Polynomial Mixer

This paper is the result of a lab-wide hackathon on an idea I'd had for some time. Probably the paper with the most authors I've ever had.

It was accepted to CVPR Findings 2026.

Thread 🧵👇
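For intuition on how token mixing can be linear in sequence length, here is a sketch of the general linear-attention family PoM belongs to (PoM's specific polynomial formulation is in the paper, so treat this as the generic idea, not PoM itself):

```python
import torch

def linear_time_mixing(Q, K, V):
    """Generic linear-complexity token mixing (the linear-attention
    family; not PoM's exact form). A feature map phi replaces the softmax
    kernel with a factorizable one, so all keys and values collapse into
    a fixed-size state phi(K)^T V and cost is O(N) in sequence length."""
    phi = lambda x: torch.nn.functional.elu(x) + 1   # positive feature map
    state = phi(K).transpose(-2, -1) @ V             # (D, D) summary of all tokens
    norm = phi(Q) @ phi(K).sum(dim=-2, keepdim=True).transpose(-2, -1)  # (N, 1)
    return (phi(Q) @ state) / norm                   # (N, D); no N x N matrix built

N, D = 1024, 64
Q, K, V = (torch.randn(N, D) for _ in range(3))
out = linear_time_mixing(Q, K, V)
print(out.shape)  # torch.Size([1024, 64])
```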

1 week ago

We introduce MIRO: a new paradigm for T2I model alignment integrating reward conditioning into pretraining, eliminating the need for separate fine-tuning/RL stages. This single-stage approach offers unprecedented efficiency and control.

- 19x faster convergence ⚡
- 370x fewer FLOPs than FLUX-dev 📉
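The reward-conditioning idea can be sketched in a few lines. The architecture below is a toy stand-in (the layer sizes, scalar-reward embedding, and sampling call are all illustrative assumptions, not MIRO's actual design): the model learns p(image | text, reward) during pretraining, so at sampling time you simply condition on a high reward instead of running a separate RL or fine-tuning stage.

```python
import torch
import torch.nn as nn

class RewardConditionedDenoiser(nn.Module):
    """Toy sketch of reward-conditioned pretraining (illustrative only):
    the per-sample reward score is embedded and added to the text
    conditioning before denoising."""

    def __init__(self, dim=64):
        super().__init__()
        self.reward_embed = nn.Linear(1, dim)   # scalar reward -> embedding
        self.net = nn.Sequential(nn.Linear(dim * 2, dim), nn.SiLU(), nn.Linear(dim, dim))

    def forward(self, noisy_latent, text_cond, reward):
        cond = text_cond + self.reward_embed(reward[:, None])
        return self.net(torch.cat([noisy_latent, cond], dim=-1))

model = RewardConditionedDenoiser()
x = torch.randn(4, 64)                # noisy latents
c = torch.randn(4, 64)                # text conditioning
r = torch.rand(4)                     # precomputed reward scores in [0, 1]
eps = model(x, c, r)                  # training: denoise, conditioned on reward
best = model(x, c, torch.ones(4))     # sampling: ask for maximal reward
print(eps.shape, best.shape)  # torch.Size([4, 64]) torch.Size([4, 64])
```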

5 months ago

Super interesting to see pure SSL outperform text alignment on a highly competitive task that seems tailor-made for text-aligned models 🤯

8 months ago

πŸ›°οΈ At #CVPR2025 presenting "AnySat: An Earth Observation Model for Any Resolutions, Scales, and Modalities" - Saturday afternoon, Poster 355!
If you're here and want to discuss geolocation or geospatial foundation models, let's connect!

10 months ago
FLAIR-HUB: Large-scale Multimodal Dataset for Land Cover and Crop Mapping
The growing availability of high-quality Earth Observation (EO) data enables accurate global land cover and crop type monitoring. However, the volume and heterogeneity of these datasets pose major pro...

📢 FLAIR-HUB dataset
A new large-scale, multimodal dataset for land cover and crop type mapping
🤗 Dataset: huggingface.co/datasets/IGN...
📄 Preprint: arxiv.org/abs/2506.07080
🤗 Pretrained models: huggingface.co/collections/...
💻 Code: github.com/IGNF/FLAIR-HUB
🌐 Project: arxiv.org/abs/2506.07080

10 months ago

I will be presenting our work on the detection of archaeological looting with satellite image time series at CVPR 2025 EarthVision workshop tomorrow!

Honored and grateful that this paper received the best student paper award!

10 months ago
When majority rules, minority loses: bias amplification of gradient descent
Despite growing empirical evidence of bias amplification in machine learning, its theoretical foundations remain poorly understood. We develop a formal framework for majority-minority learning tasks, ...

📢 New preprint!
"When majority rules, minority loses: bias amplification of gradient descent"

We often blame biased data, but training itself also amplifies biases. Our paper explores how ML algorithms favor stereotypes at the expense of minority groups.

➡️ arxiv.org/abs/2505.13122

(1/3)
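The claim can be illustrated with a toy experiment (entirely illustrative, not the paper's formal framework): two groups whose labeling rules disagree, fit with one shared logistic regression by plain gradient descent. The model settles near the majority's optimum, so the minority pays the accuracy cost.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy majority/minority task: both groups follow a threshold rule on the
# same feature, but the minority's threshold is shifted, so one shared
# linear model cannot satisfy both groups at once.
def make_group(n, shift):
    x = rng.normal(size=(n, 1))
    y = (x[:, 0] > shift).astype(float)
    return x, y

x_maj, y_maj = make_group(900, 0.0)    # 90% majority, threshold at 0
x_min, y_min = make_group(100, 1.5)    # 10% minority, threshold at 1.5
X = np.vstack([x_maj, x_min])
y = np.concatenate([y_maj, y_min])

# Logistic regression trained by full-batch gradient descent.
w, b = 0.0, 0.0
for _ in range(2000):
    p = 1 / (1 + np.exp(-(X[:, 0] * w + b)))
    g = p - y
    w -= 0.1 * np.mean(g * X[:, 0])
    b -= 0.1 * np.mean(g)

def acc(x, t):
    return np.mean(((1 / (1 + np.exp(-(x[:, 0] * w + b)))) > 0.5) == t)

print(f"majority acc: {acc(x_maj, y_maj):.2f}")  # high
print(f"minority acc: {acc(x_min, y_min):.2f}")  # noticeably lower
```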

10 months ago

We've added new experiments demonstrating robust generalization capabilities! Notably, AnySat shows strong performance on HLS Burn Scars, a sensor never seen during pretraining! 🔥🛰️
Check it out:
📄 Paper: arxiv.org/abs/2412.14123
🌐 Project: gastruc.github.io/anysat

11 months ago

Looking forward to #CVPR2025! We will present the following papers:

11 months ago
The Change You Want To Detect: Semantic Change Detection In Earth Observation With Hybrid Data Generation
Bi-temporal change detection at scale based on Very High Resolution (VHR) images is crucial for Earth monitoring. This remains poorly addressed so far: methods either require large volumes of annotate...

Introducing HySCDG #CVPR2025, a generative pipeline for creating a large hybrid semantic change detection dataset for Earth Observation using Stable Diffusion and ControlNet! 🗺️🛩️

📄 Paper: arxiv.org/abs/2503.15683

11 months ago

💻 We've released the code for our #CVPR2025 paper MAtCha!

🍵 MAtCha reconstructs sharp, accurate, and scalable meshes of both foreground AND background from just a few unposed images (e.g., 3 to 10 images)...

...While also working with dense-view datasets (hundreds of images)!

1 year ago

🔥🔥🔥 CV folks, I have some news! We're organizing a 1-day meeting in central Paris on June 6th before CVPR, called CVPR@Paris (similar to NeurIPS@Paris) 🥐🍾🥖🍷

Registration is open (it's free) with priority given to authors of accepted papers: cvprinparis.github.io/CVPR2025InPa...

Big 🧵👇 with details!

1 year ago

Starter pack including some of the lab members: go.bsky.app/QK8j87w

1 year ago

🧩 Excited to share our paper "RUBIK: A Structured Benchmark for Image Matching across Geometric Challenges" (arxiv.org/abs/2502.19955) accepted to #CVPR2025! We created a benchmark that systematically evaluates image matching methods across well-defined geometric difficulty levels. 🔍

1 year ago

Weights for CAD are finally available. It's one of the smallest diffusion models around, achieving performance close to SD and PixArt, and featuring a Perceiver-like architecture.
We leverage our coherence-aware training to improve textual understanding.

1 year ago

🔗 Check it out:
📜 Paper: arxiv.org/abs/2412.14123
🌐 Project: gastruc.github.io/anysat
🤗 HuggingFace: huggingface.co/g-astruc/Any...
🐙 GitHub: github.com/gastruc/AnySat

1 year ago

🚀 Even better: AnySat supports linear probing for semantic segmentation!
That means you can fine-tune just a few thousand parameters and achieve SOTA results on challenging tasks, all with minimal effort.
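Linear probing for segmentation is simple to sketch: freeze the backbone and train only a per-patch linear classifier (a 1x1 convolution). The backbone below is a stand-in, not AnySat, and the sizes are illustrative:

```python
import torch
import torch.nn as nn

# Frozen "backbone": a patchifying conv stands in for a pretrained encoder.
backbone = nn.Conv2d(3, 256, kernel_size=16, stride=16)
for p in backbone.parameters():
    p.requires_grad = False

# The probe is the only trainable part: a linear map per patch feature.
probe = nn.Conv2d(256, 10, kernel_size=1)

x = torch.randn(2, 3, 224, 224)
with torch.no_grad():
    feats = backbone(x)                     # (2, 256, 14, 14) patch features
logits = probe(feats)                       # (2, 10, 14, 14) coarse class map
seg = nn.functional.interpolate(logits, size=(224, 224), mode="bilinear")
print(sum(p.numel() for p in probe.parameters()))  # 2570 trainable parameters
print(seg.shape)  # torch.Size([2, 10, 224, 224])
```

Only the probe's few thousand parameters receive gradients, which is what makes this kind of adaptation so cheap.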

1 year ago

AnySat achieves SOTA performance on 6 tasks across 10 datasets:
🌱 Land cover mapping
🌾 Crop type segmentation
🌳 Tree species classification
🌊 Flood detection
🌍 Change detection

1 year ago

We trained AnySat on 5 multimodal datasets simultaneously:
📡 11 distinct sensors
📏 Resolutions: 0.2m–500m
🔁 Revisit: single date to weekly
🏞️ Scales: 0.3–150 hectares

The pretrained model can adapt to truly diverse data, and probably yours too!

1 year ago

πŸ”Thanks to our modified JEPA training scheme and scale-adaptive spatial encoders, AnySat trains on datasets with diverse scales, resolutions, and modalities!
🧠 75% of its parameters are shared across all inputs, enabling unmatched flexibility.

1 year ago

🤔 What if embedding multimodal EO data was as easy as using a ResNet on images?
Introducing AnySat: one model for any resolution (0.2m–250m), scale (0.3–2600 hectares), and modalities (choose from 11 sensors & time series)!
Try it with just a few lines of code:
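The original post's code snippet was shared as an image; here is a hypothetical sketch of what such a call could look like. The hub entry point, argument names, modality keys, and tensor layouts below are all assumptions; check the GitHub README for the actual API.

```python
import torch

# Hypothetical usage sketch: the idea is to pass a dict containing
# whichever modalities you have. Keys and shapes below are assumptions.
data = {
    "aerial": torch.randn(1, 4, 512, 512),       # VHR aerial image
    "s2": torch.randn(1, 12, 10, 32, 32),        # Sentinel-2 series (B, T, C, H, W)
    "s2_dates": torch.randint(0, 365, (1, 12)),  # acquisition day-of-year
}

# Assumed hub id and call signature -- see github.com/gastruc/AnySat:
# model = torch.hub.load("gastruc/AnySat", "anysat", pretrained=True)
# features = model(data, patch_size=10)

print({k: tuple(v.shape) for k, v in data.items()})
```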

1 year ago

Guillaume Astruc, Nicolas Gonthier, Clement Mallet, Loic Landrieu
AnySat: An Earth Observation Model for Any Resolutions, Scales, and Modalities
https://arxiv.org/abs/2412.14123

1 year ago

⚠️ Reconstructing sharp 3D meshes from a few unposed images is a hard and ambiguous problem.

☑️ With MAtCha, we leverage a pretrained depth model to recover sharp meshes from sparse views, including both foreground and background, within minutes! 🧵

🌐 Webpage: anttwo.github.io/matcha/

1 year ago