
Posts by Andy Grove

Post image

Helpful advice. Thanks, Claude.

2 months ago 16 0 0 0
Databases in 2025: A Year in Review The world tried to kill Andy off but he had to stay alive to talk about what happened with databases in 2025.

I've posted my latest recap of the world of databases: www.cs.cmu.edu/~pavlo/blog/...

All the hot topics from the last year:
• More Postgres action!
• MCP for everyone!
• MongoDB gets litigious with FerretDB!
• File formats!
• Market movements!
• The richest person in the history of the world!

3 months ago 77 26 1 6
feat: Add microbenchmark for string functions by andygrove · Pull Request #26 · apache/datafusion-benchmarks This PR adds microbenchmarks for scanning a Parquet file and evaluating a single string expression per row. The benchmark runs against DuckDB and DataFusion and compares the results. Assuming that ...

Is there anyone in my network with DuckDB skills who could review a PR that runs a Python script to compare the performance of DataFusion and DuckDB for some simple SQL queries?

github.com/apache/dataf...

3 months ago 3 1 0 0
Future of Iceberg Support in Comet · Issue #2921 · apache/datafusion-comet What is the problem the feature request solves? Comet currently has two different approaches to scanning Iceberg tables. One approach is based on integrating with the Iceberg Java library, and the ...

There is a new Comet issue to discuss the future of Iceberg support and whether we should focus on using the iceberg-rust or Java implementation of Iceberg. Please add your thoughts if this is something that you care about!

github.com/apache/dataf...

3 months ago 1 0 0 0
Apache DataFusion Comet 0.11.0 Release - Apache DataFusion Blog

On behalf of the DataFusion PMC, I'm excited to announce the release of version 0.11.0 of the Comet accelerator for Apache Spark!

datafusion.apache.org/blog/2025/10...

5 months ago 6 0 0 0
Post image

It’s steak night tonight and our dog is patiently waiting for her share.

6 months ago 11 1 0 0

I like the name “RAD stack” for this.

6 months ago 8 0 0 0
Apache DataFusion Comet 0.10.0 Release - Apache DataFusion Blog

Check out the latest release of the Comet accelerator for Apache Spark

datafusion.apache.org/blog/2025/09...

6 months ago 10 1 0 0
Video

Introducing Iron Vector: native, columnar, vectorized, high-performance accelerator for Apache Flink SQL and Table API built on top of Rust, Arrow and DataFusion.

Reduce your Flink compute cost by up to 2x or handle 2x more data with the same infrastructure.

6 months ago 17 5 1 1
crates.io phishing campaign | Rust Blog Empowering everyone to build reliable and efficient software.

We received reports of a phishing campaign targeting crates.io users. Do not click on links asking to authenticate to protect your account. More information: blog.rust-lang.org/2025/09/12/c...

6 months ago 112 57 0 1
Post image

Thanks to @clflushopt.bsky.social, make massive TPCH datasets with tpchgen-cli 2.0:

SF1000 (1TB raw, 220GB in @ApacheParquet) in less than 10 mins (6m45s) on an aging laptop

Try it now:

pip install tpchgen-cli
tpchgen-cli --scale-factor 1000 --parts 100 --format=parquet

github.com/clflushopt/t...

7 months ago 4 1 0 0
Post image

I've been helping our analytics team integrate our DataFusion-based query engine for Postgres into EDB Postgres Distributed and finally here's an end-to-end demo.

You get HA Postgres plus seamless replication and DataFusion-based queries. This query turned out 6x faster than PG.

7 months ago 14 4 1 0
Post image

How my day is going

7 months ago 7 0 0 0
Comet Roadmap — Apache DataFusion Comet documentation

We now have a roadmap section in the Comet contributor guide, in case anyone was wondering what we are focusing on lately and what features will be arriving in future releases.

datafusion.apache.org/comet/contri...

7 months ago 6 0 0 0
Software Engineer, ASE Cassandra Storage - Jobs - Careers at Apple Apply for a Software Engineer, ASE Cassandra Storage job at Apple. Read about the role and find out if it’s right for you.

The Cassandra team at Apple is searching for a recent grad or someone early in their career to join our ranks in SF/Bay Area!

Come work on super interesting problems with a world-class team. Help us build a better Cassandra!

Ping me if you’re interested!

jobs.apple.com/en-us/detail...

8 months ago 16 10 0 0
perf: Add performance tracing capability by andygrove · Pull Request #1706 · apache/datafusion-comet Which issue does this PR close? Closes #1705 Rationale for this change This feature makes it possible to visualize the flow of calls during query execution. What changes are included in this PR?...

It took me a really long time to understand the flow of execution between JVM and native code during query execution in Comet. I wish I had thought about adding a tracing capability earlier.

github.com/apache/dataf...

11 months ago 6 1 0 0
Apache DataFusion Python 46.0.0 Released - Apache DataFusion Blog

We're pleased to announce that Apache DataFusion in Python 46.0.0 is released! Since the last announcement post we've had a lot of great features and new contributors. Please check out the blog post with details.

datafusion.apache.org/blog/2025/03...

#DataFusion #Python #DataFrame #PyData #Apache

1 year ago 6 2 0 0
Senior Software Development Engineer (Apache Spark) - Apple Data Platform - Jobs - Careers at Apple Apply for a Senior Software Development Engineer (Apache Spark) - Apple Data Platform job at Apple. Read about the role and find out if it’s right for you.

We have a position open in the Spark team at Apple, in our Cupertino, CA office. The role would include working on Apache DataFusion Comet.

jobs.apple.com/en-us/detail...

1 year ago 14 6 0 0
Apache DataFusion Comet: Benchmarks Derived From TPC-H — Apache DataFusion Comet documentation

We have TPC-H benchmarks for single node with a small scale factor in the contributors guide. We only benchmark against Spark though and not against Spark RAPIDS.

datafusion.apache.org/comet/contri...

1 year ago 1 0 0 0

Here's the blog post announcing Comet 0.7.0

datafusion.apache.org/blog/2025/03...

1 year ago 7 1 0 0

I hate to say it, but "it depends". I'd recommend running your own benchmarks for your specific workloads. Performance will also vary greatly by environment (number of CPUs vs GPUs, different GPU types, and so on).

1 year ago 0 0 2 0
GitHub - apache/datafusion-comet: Apache DataFusion Comet Spark Accelerator Apache DataFusion Comet Spark Accelerator. Contribute to apache/datafusion-comet development by creating an account on GitHub.

DataFusion Comet 0.7.0 is now available in Maven. We'll be publishing a blog post next week with all the details.

The repo has been updated with the latest benchmark results. For single executor TPC-H @ 100 GB, we now see a 2.2x increase over Spark (up from 2x in 0.6.0).

github.com/apache/dataf...

1 year ago 12 1 1 1

One month on, and I have zero regrets about quitting Facebook & Instagram.

I have replaced the scrolling time with listening to podcasts.

I now stay in touch with family overseas via email and photo sharing, and I use Snapchat for sharing photos with immediate family, privately. Works great.

1 year ago 15 0 1 0
Comparing Apache, CNCF, and Commonhaus | cnr.sh I've used open source projects for over 30 years and contributed for about 20 of those. My first interaction with an open source foundation was with Apache when I began working with Apache Hadoop ...

Chris Riccomini (@chris.blue) shares his thoughts on open source foundations: Apache, CNCF, and Commonhaus. He also explains why Commonhaus is a better fit for SlateDB.

cnr.sh/posts/compar...

1 year ago 14 6 0 0
Apache DataFusion Comet 0.6.0 Release - Apache DataFusion Blog

Comet 0.6.0 has been released. This is a smaller release than usual now that we have moved to an approximately monthly release cadence to match core DataFusion.

datafusion.apache.org/blog/2025/02...

1 year ago 6 0 0 0
Apache DataFusion Ballista 43.0.0 Released - Apache DataFusion Blog

Ballista 43.0.0 has been released, and now provides seamless integration with DataFusion.

datafusion.apache.org/blog/2025/02...

1 year ago 16 0 0 0
Apache DataFusion Community Meeting 2025/01/22 08:57 MST - Recording YouTube video by Datadog

Check out this excellent presentation from @robtandy.bsky.social on his work with the DataFusion Ray project from last week's DataFusion community meetup.

It is a great overview of how to build a distributed system on top of DataFusion.

www.youtube.com/watch?v=ceTo...

1 year ago 11 2 1 2
This Week in Comet (Jan 26) · Issue #1342 · apache/datafusion-comet Introduction These notes reflect things I am personally involved in or thinking about and may not cover all activities. Feel free to add comments for anything that I missed. Previous week's issue: ...

This Week in DataFusion Comet (Jan 26):

github.com/apache/dataf...

1 year ago 6 0 0 0
Communication — Apache DataFusion documentation

Is this using Arrow and/or DataFusion? If so, our Discord is probably a good place to ask.

datafusion.apache.org/contributor-...

1 year ago 0 0 1 0

I've finally decided to quit using Facebook. My feed is overwhelmed with nonsense content that I am not interested in and cannot seem to block.

It is a real shame, though, because it was a good way to stay connected with family.

Is there a viable alternative? What are others using instead?

1 year ago 6 0 4 1