📝 "Hardwood Reaches Beta: S3, Predicate Push-Down, CLI, and More"
Happy to announce Hardwood 1.0.0.Beta1, adding S3 support, predicate push-down for both local and remote files, Avro bindings, a CLI for inspecting Parquet files, a new website, and much more.
👉 www.morling.dev/blog/hardwoo...
Posts by Gunnar Morling
And just like that, it's time for March's Interesting Links in the Data and AI World!
👉🏻 rmoff.net/2026/03/25/i...
With links from great folk including @gunnarmorling.dev, @tomcooper.dev, @sqlliz.bsky.social, @tanelpoder.com, @eatonphil.bsky.social, @larahogan.bsky.social, @almog.xyz, and more!
🥳 Just merged support to #Hardwood for reading Parquet files directly from S3! Including projections and predicate push-down (row group level), to fetch only the actually required column chunks, thus limiting the amount of data downloaded from the object store. Should be ready to ship next week!
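For anyone wondering what row-group-level predicate push-down boils down to, here is a minimal sketch, not Hardwood's actual API. It assumes each row group exposes min/max statistics for a column; the `RowGroupStats` record and the `rowGroupsToRead` method are hypothetical names made up for illustration. A row group is only worth fetching if its value range can possibly satisfy the predicate.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of per-row-group statistics for one column (not Hardwood's API).
record RowGroupStats(int index, long min, long max) {}

class RowGroupPruning {

    // Returns the indexes of row groups that may contain rows matching
    // "column > threshold". Row groups whose max is not above the threshold
    // cannot match, so their column chunks never need to be downloaded.
    static List<Integer> rowGroupsToRead(List<RowGroupStats> stats, long threshold) {
        List<Integer> selected = new ArrayList<>();
        for (RowGroupStats s : stats) {
            if (s.max() > threshold) {
                selected.add(s.index());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<RowGroupStats> stats = List.of(
                new RowGroupStats(0, 1, 50),
                new RowGroupStats(1, 40, 120),
                new RowGroupStats(2, 200, 300));
        // Row group 0 is skipped because its max (50) is not above 100.
        System.out.println(rowGroupsToRead(stats, 100)); // prints [1, 2]
    }
}
```

Note that the selected row groups can still contain non-matching rows (row group 1 above spans 40 to 120), so the predicate has to be applied again on the rows that are actually read.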
Inspired by @gunnarmorling.dev, I've added a page to my blog about how I do, and don't, use AI on my blog. tl;dr: NEVER for writing content. gross. But, for plenty else! 👉🏻 rmoff.net/ai/
If you want to learn more about the Variant type in Parquet, Aihua Xu and @andrewlamb1111.bsky.social wrote a great blog post on the project blog.
parquet.apache.org/blog/2026/02...
🍊 Woot, woot, #Hardwood made it to the HN front page! Be gentle, folks :)
✍🏻 Interesting Links in the Data & AI World, February 2026
Links from @gunnarmorling.dev, @nbuesing.bsky.social, @ssp.sh, @sap1ens.com, @neelesh.bsky.social, @rstephens.me, @justincormack.bsky.social, @sogrady.org, @simonwillison.net, @cassidoo.co, and more!
🔗 rmoff.net/2026/02/27/i...
"Are we using AI for building Hardwood? Absolutely. Is Hardwood vibe-coded? Absolutely not."
Coding agents are incredibly powerful, but they're no magic wand. Sharing some experiences with using AI for building a relatively low-level code base like a Parquet parser in the #Hardwood announcement 👇.
Interesting usage of the lessons learned during the #1BRC contest!
📝 "Hardwood: A New Parser for Apache Parquet"
Today is the day--beyond excited to share the first release of #Hardwood, a new parser for the Apache #Parquet file format, optimized for minimal dependencies and great performance.
👉 www.morling.dev/blog/hardwoo...
Woot, woot, tackled my personal final boss, the Maven release plug-in, and we now have a fully automated release pipeline for #Hardwood. Gonna do some more testing, but we should be really close now to a first tagged release on Maven Central 🥳. Thx to @andresalmiray.com for all the help!
Most folks think of #Debezium as source connectors only, but it also provides a JDBC sink for end-to-end data pipelines. Great post by the team at Zepto on their contributions, including an upsert reduction buffer to cut load on the sink database. OSS FTW 🚀!
👉 blog.zeptonow.com/debezium-at-...
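A rough sketch of what an upsert reduction buffer could look like, assuming it means collapsing several upserts for the same key within one batch so that only the latest state reaches the sink database. This is not the Debezium or Zepto implementation; `UpsertEvent` and `UpsertReductionBuffer` are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical change event: primary key plus the latest row state.
record UpsertEvent(String key, String value) {}

class UpsertReductionBuffer {

    // Keeps one entry per key, in order of each key's first appearance.
    private final Map<String, UpsertEvent> latestByKey = new LinkedHashMap<>();

    // A later upsert for the same key replaces the earlier one in the buffer.
    void add(UpsertEvent event) {
        latestByKey.put(event.key(), event);
    }

    // Returns the reduced batch and empties the buffer.
    List<UpsertEvent> flush() {
        List<UpsertEvent> batch = new ArrayList<>(latestByKey.values());
        latestByKey.clear();
        return batch;
    }
}
```

Three incoming events for keys `a`, `b`, `a` would flush as two statements, one per key, which is where the reduced load on the sink would come from under this assumption.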
Oh, wow, thank you, Holly!
@gunnarmorling.dev has written a great post about conference talks (I wish I'd written it myself): www.morling.dev/blog/ten-tip...
Today is a great day to verify you don't get yourself locked out of any 2FA-secured accounts when losing your phone. Print out back-up codes, set up fallback keys, etc. It takes just a few moments and can save you massive headaches later on.
It never stops being funny how Claude Code is patting its own shoulder for the amazing job it's doing 🤣.
Discover how #DataFrames are transforming #DataOrientedProgramming in #Java!
Analyze the #1BRC challenge & see how Java frameworks can outperform Python in memory management without sacrificing readability.
Discover how DataFrames elevate your Java programming experience ⇨ bit.ly/4avEDzZ
Spent some time to make the #Hardwood parser for Parquet files a little faster. The engine parallelizes better now: a full parse of 720 MB of the NYC taxi ride data set--summing up the values from three columns--takes about 2.3 sec on my laptop.
👉 github.com/hardwood-hq/...
Gotcha. Gotta check my calendar, but if I'm free I'll join.
Very cool! Is there an agenda yet?
We’re doing an Agentic Coding conference in Hamburg on March 22nd! 🎉
Not many details yet, but super early bird tickets are on sale and selling fast.
Super excited about this one! #hamburg
luma.com/45lfeyeh
It starts in 10 min!
www.youtube.com/watch?v=lVOR...
🤓 Just realized I haven't shared my "Streaming Examples" repo here before. It contains a growing list of runnable examples around data streaming (e.g. with Apache Kafka), stream processing (Apache Flink), change data capture (Debezium), etc.
👉 github.com/gunnarmorlin...
Made good progress with Hardwood, a minimal dependency parser for Apache Parquet. Reworked the API, moved to new GitHub org, almost all files from the parquet-testing harness pass now 🎉. Projections and predicate push-down are up next.
👉 github.com/hardwood-hq/...
Oh, that's a useful feature. You can enforce causal consistency in general (which includes read-your-writes and other guarantees) with that mechanism. We've had that in MongoDB for some time, though, and customers have found it hard to use: emptysqua.re/blog/how-to-...
Happy Holidays, and to those who celebrate: Merry Christmas 🎄🎁☃️! Enjoy the days ahead ("between the years", as we call it in German), and of course: yippee ki-yay!
🐘 Oh, that's nice: Postgres 19 is going to ship read-your-writes for standbys when using asynchronous replication, via a new command WAIT FOR LSN (similar to WAIT_FOR_EXECUTED_GTID_SET in MySQL). Very cool!
www.postgresql.org/docs/devel/w...
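To illustrate the idea behind read-your-writes on a standby, here is a small in-memory model in Java. It is not Postgres code or syntax: `StandbyModel`, `replayUpTo`, and `waitForLsn` are hypothetical names, and representing an LSN as a plain `long` is a simplification. The client remembers the LSN of its write on the primary and blocks on the standby until replay has reached that position or a timeout expires.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical in-memory model of a standby replaying changes from a primary.
class StandbyModel {

    private final ReentrantLock lock = new ReentrantLock();
    private final Condition advanced = lock.newCondition();
    private long replayedLsn = 0; // simplified LSN: a monotonically increasing number

    // Called as the standby applies changes; wakes up any waiting readers.
    void replayUpTo(long lsn) {
        lock.lock();
        try {
            if (lsn > replayedLsn) {
                replayedLsn = lsn;
                advanced.signalAll();
            }
        } finally {
            lock.unlock();
        }
    }

    // Blocks until the standby has replayed at least the target LSN.
    // Returns false if the timeout expires or the thread is interrupted first.
    boolean waitForLsn(long target, long timeoutMillis) {
        long remaining = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        lock.lock();
        try {
            while (replayedLsn < target) {
                if (remaining <= 0) {
                    return false;
                }
                remaining = advanced.awaitNanos(remaining);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            lock.unlock();
        }
    }
}
```

A reader would call `waitForLsn` with the LSN returned by its own write and only run its query on the standby once that call returns `true`.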
In the past few years, we’ve seen a Cambrian explosion of new columnar formats, challenging the hegemony of Parquet. Presumably, the design of yore is not going to cut it moving forward. I spent some time to understand a bit better how things actually changed.
sympathetic.ink/2025/12/11/C...
A screenshot featuring responses from Aaron Francis, antirez, Charity Majors, and Eric Lippert
Why write engineering blogs? Here’s how some of our favorite bloggers responded to the question “Why did you start blogging and why do you continue?” writethatblog.substack.com/p/why-write-...
📝 Blogged: "You Gotta Push If You Wanna Pull"
Wrote up some thoughts on push vs. pull queries, materialized views, and why data duplication isn't something to fear.
👉 www.morling.dev/blog/you-got...