
Posts by Martin Gauch

A world map of the spatial distribution of extracted flood events in the Groundsource dataset. The map displays the total number of flood events extracted by the LLM-based pipeline, aggregated per grid cell. The data are visualized in a Robinson projection, with event counts on a logarithmic color scale. Red points mark the spatial centroids of reference flood events from the GDACS database.

Excited to announce Groundsource - an open-source dataset of historic flood events! This has easily been one of the coolest projects I've worked on recently!

Thread 🧵 for details and all relevant links. 1/n

1 month ago
Quick Start — NeuralHydrology 1.13.0 documentation

Also, we finally stopped using conda and now use uv to install environments. If you, too, have stared at the "Solving environment" message for hours, give the new installation method a try! More details in neuralhydrology.readthedocs.io/en/latest/us...

3 months ago
Release v1.13.0 · neuralhydrology/neuralhydrology: Setup changes. As of #279, NeuralHydrology switched from using conda environments to uv. This has several advantages (e.g., it's much faster to create environments, and we'll be able to get up-to-da...

New #NeuralHydrology release 🎉
Some news from v1.13.0:
* CAMELS-IND & CAMELS-DE support
* AORC hourly forcing support
* xLSTM support
* Support for embedding layers in MTS-LSTMs

...and various other improvements and fixes. The full release notes: github.com/neuralhydrol...

Thanks to all contributors!

3 months ago

It's an interdisciplinary session so this can be your 2nd abstract.

Submission deadline: Jan 15

Thanks to @danklotz.bsky.social for the fantastic BUGS logo!

4 months ago

Back by popular demand: At #EGU26 we'll organize another BUGS session: Blunders, Unexpected Glitches, and Surprises!

Submit abstracts on ideas that seemed great but didn't work, errors and bugs that led to new insights (or funny stories), or any other unexpected results.

www.egu26.eu/session/56997

4 months ago
Screenshot from Google Scholar

On that note, props to HESS for getting the BibTeX citation right on the first try ("w___")!
hess.copernicus.org/articles/29/...

Google Scholar drops two of the _ but at least escapes the remaining one correctly...

5 months ago

It's (finally) published: hess.copernicus.org/articles/29/...

Looking forward to all the different ways the title will be messed up by indexing tools!

5 months ago

Congratulations to Frederik Kratzert on winning this year's Arne Richter Award for outstanding research by an early career scientist. Fantastic presentation at #EGU25 this afternoon!

11 months ago

I can't compete with @kratzert.bsky.social's swag game, but I'll contribute a few old NeuralHydrology stickers that I found recently :)

1 year ago

Now on HESSD for open discussion: egusphere.copernicus.org/preprints/20...

They even let us keep the paper title (for now?!) 🙄

1 year ago

NeuralHydrology just got a little better, especially if you're building custom models :)

1 year ago
GitHub - neuralhydrology/neuralhydrology: Python library to train neural networks with a strong focus on hydrological applications.

Paper: eartharxiv.org/repository/v...

Code to reproduce is part of NeuralHydrology 1.12.0 which we just released: github.com/neuralhydrol...

Code to analyze results: github.com/gauchm/missi...

1 year ago
Median NSE and KGE across 531 basins at different amounts of missing input time steps. The dotted horizontal line provides the baseline of a model that cannot deal with missing data but is trained to ingest all three forcing groups at every time step. The dashed line represents the baseline of a model that uses the worst individual set of forcings (NLDAS). The shaded areas indicate the spread between minimum and maximum values across three seeds; the solid lines represent the median.

All of the approaches work pretty well! Masked mean tends to perform a little better, but it's often quite close.

More details, experiments, figures, etc. in the paper.

All of this is joint work with @kratzert.bsky.social, @danklotz.bsky.social, Grey Nearing, Debby Cohen, and Oren Gilon.

1 year ago
Illustration of the attention embedding strategy. Each forcing provider is projected to the same size through its own embedding network. The resulting embedding vectors become the keys and values. The static attributes, together with a binary flag for each provider, serve as the query. The attention-weighted average of embeddings is passed on to the LSTM.

3) Attention: A more general variant of the masked mean that uses an attention mechanism to dynamically weight the embeddings in the average based on additional information, e.g., the basins' static attributes.
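In spirit, the attention variant could be sketched like this in plain NumPy (an illustrative toy, not the paper's actual implementation; the attention logits are assumed to be precomputed from the static attributes and availability flags):

```python
import numpy as np

def attention_mean(embeddings, valid, scores):
    """Softmax-weighted average over the embeddings of available providers.
    embeddings: (n_providers, d) keys/values; valid: boolean availability mask;
    scores: (n_providers,) attention logits from the query (hypothetical)."""
    logits = np.where(np.asarray(valid), np.asarray(scores, dtype=float), -np.inf)
    weights = np.exp(logits - logits.max())  # mask drives missing providers to 0
    weights /= weights.sum()
    return weights @ embeddings

emb = np.array([[1.0, 2.0],
                [3.0, 4.0]])
attention_mean(emb, [True, True], [0.0, 0.0])   # equal logits -> [2., 3.]
attention_mean(emb, [True, False], [0.0, 0.0])  # 2nd masked   -> [1., 2.]
```

Masking with `-inf` before the softmax makes missing providers get exactly zero weight, so the output reduces to the masked mean when all logits are equal.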

1 year ago
Illustration of the masked mean strategy. Each forcing provider is projected to the same size through its own embedding network. The resulting embeddings of valid providers are averaged and passed on to the LSTM.

2) Masked mean: Embed each group of inputs separately (a group being the inputs from one data provider) and average the embeddings that are available at a given time step. This is what we currently do in Google's operational flood forecasting model.
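A minimal NumPy sketch of the masked-mean idea (illustrative only; the real model embeds raw forcings through per-provider networks first):

```python
import numpy as np

def masked_mean(embeddings, valid):
    """Average only the embeddings of providers available at this time step.
    embeddings: (n_providers, d) array; valid: boolean mask, length n_providers."""
    mask = np.asarray(valid, dtype=float)
    return (embeddings * mask[:, None]).sum(axis=0) / mask.sum()

emb = np.array([[1.0, 2.0],
                [3.0, 4.0],
                [5.0, 6.0]])
masked_mean(emb, [True, False, True])  # -> array([3., 4.])
```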

1 year ago
Illustration of the input replacing strategy. NaNs in the input data for a given time step are replaced by zeros, all forcings are concatenated, together with one binary flag for each forcing group which indicates whether that group was NaN or not. The resulting vector is passed through an embedding network to the LSTM.

In the paper we present and compare three ways to deal with those situations:
1) Input replacing: Just replace NaNs with some fixed value and concatenate the inputs with a flag to indicate missing data.
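As a rough illustration of input replacing for one time step (plain NumPy, not the NeuralHydrology code; zeros as the fill value and one flag per forcing group are assumptions):

```python
import numpy as np

def input_replacing(forcing_groups):
    """Concatenate per-provider forcings for one time step, replacing NaNs
    with 0 and appending one binary validity flag per group (illustrative)."""
    flags = [0.0 if np.isnan(g).any() else 1.0 for g in forcing_groups]
    cleaned = [np.nan_to_num(g, nan=0.0) for g in forcing_groups]
    return np.concatenate(cleaned + [np.array(flags)])

# two providers; the second one has an outage at this time step
x = input_replacing([np.array([1.0, 2.0]), np.array([np.nan, np.nan])])
# -> [1. 2. 0. 0. 1. 0.]  (inputs with NaNs zeroed, then flags 1=valid, 0=missing)
```

The resulting vector is what would then be fed through the embedding network into the LSTM.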

1 year ago
Different scenarios for missing input data: outages at individual time steps (top), data products starting at different points in time (middle), and local data products that are not available for all basins (bottom). All of these scenarios reduce the number of training samples for models that cannot cope with missing data (yellow, small box), while the models presented in this paper can be trained on all samples with valid targets (purple, large box).

Starting on bsky with a new preprint: "How to deal w___ missing input data"
doi.org/10.31223/X50...

Missing input data is a very common challenge in deep learning for hydrology: weather providers have outages, some data products start later than others, some only exist for certain regions, etc.

1 year ago