This blog does a great job explaining, with examples, the benefits of using DataFusion to build systems: thedataquarry.com/blog/datafus...
Posts by Andrew Lamb
I am going to give a talk about why tokio.rs is a great choice for running CPU-bound tasks in database query engines.
If you are coming to tokio.conf or are in the Portland area on April 22, we are holding a DataFusion meetup at OHSU. Thanks to Mustafa Akur for helping arrange the details.
Sign up here: luma.com/dsp3ud82
DataFusion 53.0.0 is released: datafusion.apache.org/blog/output/...
We are holding an @ApacheDataFusio meetup on Wed April 22 in Portland: luma.com/dsp3ud82
Yes, this is the day before the Seattle/Bellevue meetup on April 23: luma.com/hxshbp0m
In-depth blog post on how to write table providers for Apache DataFusion: datafusion.apache.org/blog/2026/03...
Another example of incorporating the latest database research (in this case, limit pruning from ❄️) into Apache DataFusion:
datafusion.apache.org/blog/2026/03....
The video is here: www.youtube.com/watch?v=pqoR... It is a very good summary of how to build a modern database backed by object storage, and it is worth watching.
Recording and slides available from the DataFusion Stockholm Meetup:
Recording: youtu.be/9u4cNmL14Xs
Slides: github.com/apache/dataf...
Join us for another NYC DataFusion Meetup on May 12, 2026: luma.com/adhshv92
As always, we would love speakers on anything DataFusion-related (what you have built using DataFusion, projects you are working on, etc.).
In 2026, VLDB returns to the Boston area, five decades after it was born here (the first VLDB was held in Framingham). Sponsoring is a good opportunity to get your company's name on the program (and earn the everlasting gratitude of the organizing committee): vldb.org/2026/sponsor...
Yeah, shredding is a very clever optimization
Here is a new blog post about the Parquet Variant type, including use cases and shredding examples:
parquet.apache.org/blog/2026/02...
A question came up on the Parquet sync today: does anyone have practical experience comparing FastLanes encoding with "classic" bit packing (without the unified shuffled layouts)? If you do, we would love to hear about it.
I suggest getting comfortable with rm -rf every few days -- it works wonders for me :)
Simply applying basic linting rules (like not compressing pages where compression doesn't help) reduces Parquet file sizes by 5% and decode time by 20%.
@xiangpeng.systems shows how in his latest blog
blog.xiangpeng.systems/posts/parque...
Great inaugural post about the geospatial types on the Parquet blog.
Thank you Jia Yu, Dewey Dunnington, Kristin Cowalcijk, and Feng Zhang.
More posts coming!
parquet.apache.org/blog/2026/02...
📖 Apache Parquet recently added native support for geospatial types. This post explains what that means and why it is important: parquet.apache.org/blog/2026/02...
kellabyte.substack.com/p/building-a... Exactly the kind of thing that shows the power of DataFusion: you can build the database without (re)building the core query engine.
You can use Apache Parquet for vector search with embedded indexes:
> We don’t change the file format; we just tune it.
Xiangpeng Hao explains how in blog.xiangpeng.systems/posts/vector...
Different techniques are needed to max out modern NVMe SSDs.
The LanceDB blog post by @westonpace.bsky.social is great if you want the industrial version: lancedb.com/blog/one-mil...
Viktor Leis's LeanStore paper is great if you want the academic version: vldb.org/pvldb/vol16/...
A somewhat academic talk, "Column Storage for the AI Era", about the AI use cases driving changes in Apache Parquet and in new formats:
Recording: youtu.be/k9uhw7yqPsQ
Slides: docs.google.com/presentation...
What I really need is to focus more on reviews and getting stuff merged, as the coding is now even easier 😅
Implementing SQL CASE expressions efficiently in column stores requires careful engineering. The latest Apache DataFusion blog post from Pepijn Van Eeckhoudt and Raz Luvaton explains how it works:
datafusion.apache.org/blog/2026/02...
One downside of tools like Codex is that they enable even more "side quests". I was already pretty bad at focusing, and now being able to write the equivalent of a ticket and have code to review in 10 minutes makes the problem far worse.
The DataFusion 52 release blog is published: datafusion.apache.org/blog/2026/01...
I love it when I see a whole pile of commits I didn't review go to DataFusion main
github.com/apache/dataf...
1/5 ➡️ Why Rust Will Help You Deliver Better Low-latency Systems and Happier Developers with Andrew Lamb
bit.ly/47kgmwU
@andrewlamb1111.bsky.social
#RustLang #LowLatency