Today we're announcing the availability of logical replication from Postgres to Iceberg with Crunchy Data Warehouse.
Now you can seamlessly move data and stream changes from your operational database into an analytical system.
www.crunchydata.com/blog/logical...
How long has it been since you checked your cache hit ratio?
Ideally most of your frequently queried data is in the buffer cache. We recommend a 98-99% cache hit ratio for transactional workloads; analytical workloads run lower.
Query the cache hit ratio with:
SELECT
  sum(heap_blks_read) as heap_read,
  sum(heap_blks_hit) as heap_hit,
  sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) as ratio
FROM
  pg_statio_user_tables;
For Bridge customers, cache hit data is in your cluster's Insights.
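A companion query checks the same ratio for indexes; this is a sketch following the same pattern as the heap query (the alias names are illustrative):

```sql
-- Index cache hit ratio, analogous to the heap query above.
SELECT
  sum(idx_blks_read) AS idx_read,
  sum(idx_blks_hit) AS idx_hit,
  sum(idx_blks_hit) / (sum(idx_blks_hit) + sum(idx_blks_read)) AS ratio
FROM
  pg_statio_user_indexes;
```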
Postgres + Iceberg side by side, running the same query against a Postgres heap table.
Half a millisecond for the optimized version 🔥. 10 seconds for the long method.
www.crunchydata.com/products/war...
psql tip: find your config parameters that are not default with \dconfig.
List of non-default configuration parameters:
Parameter | Value
---------------------------------+----------------------------------------------
TimeZone | America/Chicago
application_name | psql
client_encoding | UTF8
lock_timeout | 10s
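If you'd rather do this in plain SQL, a rough equivalent is to query pg_settings; this is a sketch (how defaults are tracked varies a bit by version):

```sql
-- Approximate SQL equivalent of \dconfig: list parameters whose
-- value comes from somewhere other than the built-in default.
SELECT name, current_setting(name) AS value
FROM pg_settings
WHERE source NOT IN ('default', 'override')
ORDER BY name;
```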
Postgres does a good job of keeping internal statistics about your data, which are used to plan how queries are executed. But Postgres doesn't always know how columns are related. In this deep dive we look at hacking the statistics for improved performance.
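As one illustration of the idea (with hypothetical table and column names, not taken from the post), extended statistics can teach the planner about correlated columns:

```sql
-- Hypothetical example: city and zip are strongly correlated,
-- but the planner assumes columns are independent by default.
CREATE STATISTICS addr_stats (dependencies) ON city, zip
  FROM addresses;

-- Re-analyze so the planner picks up the dependency information.
ANALYZE addresses;
```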
Miss our webinar on running Crunchy Data Warehouse on-premises? We cover an overview, a hands-on walkthrough with live querying of our data lake through full Iceberg table creation, and highlights of popular use cases. If you missed it, don't worry, we've got you covered.
www.youtube.com/watch?v=Vojg...
We all love pg_stat_statements but that data collects forever and can get a little stale. When should you reset it?
Here are some tips from our support team.
1) At the start of a new monitoring period
If you analyze performance trends daily, weekly, or monthly, resetting pg_stat_statements at the beginning of such a period helps ensure that each period starts with a clean dataset.
2) After deploying significant query changes
If you have made significant changes to an app, optimized queries, or modified indexes, resetting pg_stat_statements helps to measure the impact of those changes more accurately.
3) When benchmarking query performance
If you are running benchmarks and tests to evaluate query improvements, resetting pg_stat_statements makes sure that your results reflect only the queries executed during the benchmark periods.
It will also help accuracy of reports (for example, the outliers insight) by preventing old, outdated queries from skewing the results.
4) After major maintenance operations
Resetting pg_stat_statements after major operations (pg_repack, reindexes, altering table structures, etc) helps to measure how any changes affect performance.
The command to reset pg_stat_statements is:
SELECT pg_stat_statements_reset();
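To automate the monitoring-period reset, one option (assuming the pg_cron extension is installed, which is not mentioned in the post) is to schedule it, for example monthly:

```sql
-- Hypothetical job name; runs at midnight on the 1st of each month.
SELECT cron.schedule(
  'reset-pg-stat-statements',
  '0 0 1 * *',
  'SELECT pg_stat_statements_reset()'
);
```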
Excited to announce Crunchy Data Warehouse is now available for Kubernetes and on-premises deployments. Need faster analytics from Postgres? Want a native Postgres data lake experience? Learn more about how it works: www.crunchydata.com/blog/crunchy...
Need to move data between Postgres and Parquet? pg_parquet is an open source extension that makes this easy, with no need for complicated ETL processes -
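A minimal sketch of what that looks like, with a hypothetical table name and S3 path (pg_parquet extends COPY with a Parquet format):

```sql
-- Export a table to Parquet (local file or object storage).
COPY orders TO 's3://my-bucket/orders.parquet' WITH (format 'parquet');

-- Import it back into a table with a matching schema.
COPY orders_copy FROM 's3://my-bucket/orders.parquet' WITH (format 'parquet');
```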
We talk with users every week about whether Citus is a good fit for them. While Citus is a very powerful Postgres extension, it fits very specific use cases. Here we break down the cases where Citus is a fit as well as when it's not -
When it comes to developer tools, everyone has their favorite tips and tricks. Because we love Postgres, we worked to capture many of our favorites in this collection -
Excited to announce built-in maintenance for Iceberg via Postgres.
Now within Crunchy Data Warehouse we will automatically vacuum and continuously optimize your Iceberg data by compacting and cleaning up files.
Dig into the details of how this works: www.crunchydata.com/blog/automat...
Citus is a Postgres extension that turns it into a sharded, distributed, horizontally scalable database. With all these buzzwords, it attracts a lot of people thinking it can solve all their problems. We dig into when it is a good fit and when it isn't - www.crunchydata.com/blog/citus-t...
Happy pi day! 🥧
Postgres has a pi function: pi();
This can be used to calculate circular sizes and areas.
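For instance, with a hypothetical radius of 5:

```sql
-- Area of a circle with radius 5, using Postgres' built-in pi()
-- and the ^ exponentiation operator.
SELECT pi() * 5 ^ 2 AS circle_area;
```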
@pwramsey.bsky.social looks at pi in PostGIS today, with a blog post on circular forms and a proof for the CIRCLELINESTRING shape.
www.crunchydata.com/blog/postgis...
SQL output can be messy. psql has options for formatting output, and a handy one is
\pset border 2
This adds top and bottom borders with double lines (╔, ╚, ╤, ╧, etc.), a header row set off by a double-line separator, rows separated by single lines, and clearly separated columns.
Congratulations to the Postgres community on PostgreSQL once again being named the DBMS of the year in 2024, for the second year in a row. db-engines.com/en/blog_post...
Great to see this ability for "creating processing pipelines for append-only streams of data...We believe it is a foundational building block for building IoT applications on PostgreSQL that should be available to everyone, similar to pg_cron, pg_parquet, and pg_partman."
There are many incremental processing solutions, but they seem to never quite do what I need.
I decided to build an extension that just keeps running the same command in Postgres with different parameters to do fast, reliable incremental data processing.
That's pg_incremental.
1/n
Importing files: For cases where data is loaded into remote repositories such as S3, functions can look for new files and pg_incremental can load them.
Exporting files: For folks archiving or exporting data, such as individual partitions, pg_incremental can batch data into files and send them to a long-term archive.
Interval pipelines: Similar to the rollups and aggregates, lots of folks create interval-range data summaries. Whether by day, week, or month, pg_incremental can build your summary tables.
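An interval pipeline looks roughly like this; the table names are hypothetical and the exact function signature should be checked against the pg_incremental documentation:

```sql
-- Sketch of a daily rollup: pg_incremental repeatedly runs the command,
-- substituting $1/$2 with the start and end of each new time range.
SELECT incremental.create_time_interval_pipeline(
  'daily-event-rollup',
  '1 day',
  $$
    INSERT INTO events_daily (day, event_count)
    SELECT date_trunc('day', created_at), count(*)
    FROM events
    WHERE created_at >= $1 AND created_at < $2
    GROUP BY 1
  $$
);
```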