As someone once said, "Metadata is a love note to the future." We believe in building systems that stay true to this: metadata is better than more data!
Come try it out! Apache-2.0 licensed, no strings attached.
Docs: terrafloww.github.io/rasteret/
GitHub: github.com/terrafloww/r...
Posts by Terrafloww
We replicated the Major-TOM dataset from source COGs instead of storing images inside Parquet: 6x faster reads than HF datasets.
Our bet -
Your dataset is a table. Pixels stay where they already live. Everything else - splits, labels, patch geometries - lives as columns you can version, share, and reproduce.
2. When you need pixels, pick your output and our engine gets it for you:
- get_numpy() → [N, C, H, W] arrays
- get_xarray() → xarray Dataset
- to_torchgeo_dataset() → drop-in GeoDataset
No GDAL, no TIFF metadata re-parsing, no cold-start tax. Up to 20x faster #TorchGeo data loading.
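To make the [N, C, H, W] layout concrete, here is a minimal NumPy sketch. It does not call Rasteret; the random patches are stand-ins for decoded COG tiles, and the sizes and variable names are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical sizes: 4 samples, 3 bands, 8x8 pixel patches.
n_samples, n_bands, height, width = 4, 3, 8, 8
rng = np.random.default_rng(0)

# Stand-in for decoded patches: one (H, W) array per band, per sample.
samples = [
    [
        rng.integers(0, 10_000, size=(height, width), dtype=np.uint16)
        for _ in range(n_bands)
    ]
    for _ in range(n_samples)
]

# Stack bands into (C, H, W), then samples into (N, C, H, W).
batch = np.stack([np.stack(bands, axis=0) for bands in samples], axis=0)

print(batch.shape)
```

Indexing `batch[i, c]` returns the patch for sample `i` and band `c`, which is the layout most PyTorch vision models expect.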
The flow -
1. Build a 'Collection' from 12 built-in datasets (Sentinel, Google DeepMind Alpha Earth Embeddings, and more), or bring your own from any STAC API or from Parquet files with COG URLs.
Filter, join, and add splits, labels, and quality flags as columns with PyArrow, Polars, or DuckDB, without moving images.
Releasing Rasteret 0.3.x 🚀
EO image datasets should be in tables, not folders. Something for your weekend coding bug!
We prove this in our new blog post, which explains how we use Apache Arrow, Parquet and our custom IO engine to redefine how you interact with EO imagery!
blog.terrafloww.com/eo-datasets-...
This combo matters not only for better DevX today, but also for agentic commerce: an agent shouldn't have to click “Contact Sales”.
It needs contract + trust + billing.
Write-up: blog.terrafloww.com/streaming-te...
If you’ve built EO pipelines, we’d love feedback.
#EarthObservation #GeoAI #Geospatial
Part/3
What we’re launching:
1. Rasteret SDK: streams bytes from S3 → Arrow/DLPack → JAX/PyTorch tensors
2. Terrafloww Platform: handles data discovery + attribution/licensing + metering/pay-per-byte. It is built on open standards and does not copy or transform images.
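For readers unfamiliar with DLPack, a minimal sketch of the zero-copy hand-off it enables. This is NumPy-only and is not the Rasteret SDK; the array contents are made up. PyTorch exposes a consumer of the same protocol as `torch.from_dlpack`.

```python
import numpy as np

# Stand-in for a decoded batch in [N, C, H, W] layout (hypothetical data).
batch = np.arange(2 * 3 * 4 * 4, dtype=np.float32).reshape(2, 3, 4, 4)

# Any DLPack consumer can wrap the producer's buffer without copying it.
# NumPy acts as both producer and consumer here to stay self-contained.
view = np.from_dlpack(batch)

# Both arrays share one buffer, so a write through `batch`
# is visible through `view`.
batch[0, 0, 0, 0] = 42.0

print(np.shares_memory(batch, view))
```

The point of the protocol is that the tensor framework receives a pointer to memory that already exists, instead of a file it has to read and decode again.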
Part/2
What’s broken today:
- STAC discovery + filtering overhead
- downloading GeoTIFFs to disk
- Rasterio/GDAL loops that starve GPUs
- unclear attribution + licensing for reuse of data
Most GeoAI teams don’t lose time on model training.
They lose it before the GPU ever sees a tensor.
YouTube didn’t win by inventing better cameras.
It won by standardizing the player.
GeoAI needs the same shift: stop moving files, stream tensors, and monetize them. But this requires a lot of work 🧵
I've been working on 'Rasteret' since my last blog post; it's out now as an early release.
More details - blog.terrafloww.com/rasteret-a-l...
Open to feedback and contributions; there is much more exciting work to do!
github.com/terrafloww/r...
#geospatial #cloudnativegeo #opensource