
Posts by Andrew Lamb

DataFusion and the Rise of Deconstructed Data Systems A deep dive into DataFusion and its role in enabling an explosion of new open source, composable data systems.

This blog does a great job explaining and exampling (?) the benefits of using DataFusion to build systems: thedataquarry.com/blog/datafus...

3 days ago 9 1 0 0

I am going to give a talk about why tokio.rs is a great choice for running CPU-bound tasks in Database Query Engines

1 week ago 20 2 2 0
Portland Apache DataFusion Meetup · Luma Join us for an evening of talks, community discussions about Apache DataFusion and its growing role in modern data infrastructure. This meetup will spotlight…

If you are coming to tokio.conf or are in the Portland area on April 22, we are holding a DataFusion meetup at OHSU -- thanks to Mustafa Akur for helping arrange the details.

Signup here luma.com/dsp3ud82

1 week ago 2 1 0 0

DataFusion 53.0.0 is released: datafusion.apache.org/blog/output/...

2 weeks ago 12 1 0 0
Portland Apache DataFusion Meetup · Luma Join us for an evening of talks, community discussions about Apache DataFusion and its growing role in modern data infrastructure. This meetup will spotlight…

We are holding an @ApacheDataFusio meetup on Wed April 22 in Portland: luma.com/dsp3ud82

Yes, this is a day before the meetup in Seattle/Bellevue on April 23 luma.com/hxshbp0m

2 weeks ago 2 1 0 0

In-depth blog on how to write table providers for Apache DataFusion: datafusion.apache.org/blog/2026/03...

2 weeks ago 12 1 0 0

Another example of incorporating the latest Database Research (in this case Limit Pruning from ❄️) in Apache DataFusion
datafusion.apache.org/blog/2026/03....

4 weeks ago 9 0 0 0
GitHub - datafusion-contrib/datafusion-skills Contribute to datafusion-contrib/datafusion-skills development by creating an account on GitHub.

DataFusion getting in the agent skills game too: github.com/datafusion-c...

4 weeks ago 9 1 0 0

The video is here www.youtube.com/watch?v=pqoR... and is quite a good summary of how to build a modern database backed by Object Storage. This one is worth watching.

1 month ago 14 0 0 0
Apache DataFusion Meetup Stockholm, March 2026 YouTube video by Andrew Lamb

Recording and slides available from the DataFusion Stockholm Meetup:
Recording: youtu.be/9u4cNmL14Xs

Slides: github.com/apache/dataf...

1 month ago 4 2 0 0
New York City Apache DataFusion Meetup · Luma Welcome to the New York City Apache DataFusion Meetup! Join us for an evening of talks, panel discussion, and community discussion of Apache DataFusion and…

Join us for another NYC DataFusion Meetup: luma.com/adhshv92 May 12, 2026

As always, we would love any/all speakers on anything DataFusion-related (what you have built using DataFusion, projects you are working on, etc.)

1 month ago 2 1 0 0
VLDB 2026 | Sponsorship The VLDB 2026 conference will take place in Boston, MA, United States, from Aug 31st to Sep 4th, 2026, and will feature research talks, tutorials, demonstrations, and workshops...

In 2026, VLDB is returning to the Boston area 5 decades after it was born here (first VLDB was in Framingham). A good opportunity to get your company's name on the program (and earn the everlasting gratitude of the organizing committee) vldb.org/2026/sponsor...

1 month ago 3 0 0 0

Yeah, shredding is a very clever optimization

1 month ago 1 0 0 0

Here is a new blog about Parquet Variant, including use cases and shredding examples

parquet.apache.org/blog/2026/02...

1 month ago 7 2 1 1

The question came up on the Parquet sync today: does anyone have practical experience comparing FastLanes encoding vs "classic" bit packing (without the unified shuffled layouts)? If you have, I would love to hear about it.

1 month ago 2 0 0 0

I suggest getting comfortable with rm -rf every few days -- it works wonders for me :)

1 month ago 3 0 0 0
parquet-linter: A better Parquet is Parquet itself – Xiangpeng’s blog Unleash the performance potential of your Parquet files

Simply applying basic linting rules (like don't compress pages where it doesn't help) reduces Parquet file sizes by 5% and decreases decode time by 20%.

@xiangpeng.systems shows how in his latest blog
blog.xiangpeng.systems/posts/parque...

1 month ago 19 2 0 0
Native Geospatial Types in Apache Parquet

Great inaugural post about the geospatial types on the Parquet blog.

Thank you Jia Yu, Dewey Dunnington, Kristin Cowalcijk, Feng Zhang.

More posts coming!

parquet.apache.org/blog/2026/02...

2 months ago 8 2 0 0
Stockholm Apache DataFusion Meetup · Luma Join us for an evening of talks, panel discussions, and community discussions about Apache DataFusion and its growing role in modern data infrastructure. This…

Just a few more weeks until the Stockholm DataFusion meetup: luma.com/ctqtiqap

2 months ago 5 0 0 0

📖 Apache Parquet recently added native support for Geospatial. This post explains what that means and why it is important: parquet.apache.org/blog/2026/02...

2 months ago 13 2 0 0
Building A Distributed SQL Database in 30 Days with AI My journey building HoloStore a distributed key/value store and HoloFusion a distributed SQL DB using AI using the Accord consensus protocol from Cassandra.

kellabyte.substack.com/p/building-a... -- Exactly the kind of thing that shows the power of DataFusion. You can build the database and not (re)build the core query engine

2 months ago 9 2 0 2

You can use Apache Parquet for Vector Search with embedded indexes:

> We don’t change the file format; we just tune it.

Xiangpeng Hao explains how in blog.xiangpeng.systems/posts/vector...

2 months ago 6 1 0 0
The Quest for One Million IOPS: Benchmarking Storage at LanceDB Learn how LanceDB benchmarks storage and how we achieved one million disk reads per second.

Different techniques are needed to max out modern NVMe SSDs.

@westonpace.bsky.social's LanceDB blog is so good if you want the industrial version: lancedb.com/blog/one-mil...

Viktor Leis's LeanStore paper is great if you want the academic version: vldb.org/pvldb/vol16/...

2 months ago 13 1 0 0

A somewhat academic talk about the AI use cases driving changes in Apache Parquet and new formats in "Column Storage for the AI Era"

Recording: youtu.be/k9uhw7yqPsQ
Slides: docs.google.com/presentation...

2 months ago 8 0 0 0

What I really need is to focus more on reviews / getting stuff merged as now the coding is even easier 😅

2 months ago 2 0 0 0

Optimized implementation of SQL CASE expressions in column stores requires careful engineering. The latest Apache DataFusion blog from Pepijn Van Eeckhoudt and Raz Luvaton explains how it works

datafusion.apache.org/blog/2026/02...

2 months ago 6 0 0 0

One downside of tools like Codex is that it enables even more "side quests" -- I was already pretty bad at focusing, and now the ability to write the equivalent of a ticket and have some code to review in 10 minutes makes the problem far worse.

2 months ago 7 0 2 0

DataFusion 52 Release Blog is Published datafusion.apache.org/blog/2026/01...

2 months ago 5 0 0 0

I love it when I see a whole pile of commits I didn't review go to DataFusion main
github.com/apache/dataf...

2 months ago 10 0 0 0
Why Rust Will Help You Deliver Better Low-latency Systems and Happier Developers Andrew Lamb, a veteran of database engine development, shares his thoughts on why Rust is the right tool for developing low-latency systems, not only from the perspective of the code’s performance, bu...

1/5 ➡️ Why Rust Will Help You Deliver Better Low-latency Systems and Happier Developers with Andrew Lamb
bit.ly/47kgmwU

@andrewlamb1111.bsky.social

#RustLang #LowLatency

2 months ago 5 2 1 0