
PG 15 Beta 1 Release, Query Breakdown, Sort Performance, Prepared vs. Partitioned | Scaling Postgres 216


In this episode of Scaling Postgres, we discuss the Beta 1 release of Postgres 15, optimizing a query by breaking it down, improvements to sort performance and prepared statements for partitioned tables.


Transcript

All right. I hope you, your friends, family, and coworkers continue to do well. Our first piece of content is "PostgreSQL 15 Beta 1 Released!". This is from postgresql.org and they've released Beta 1 of Postgres 15, and this article talks about the different feature highlights from 15. Some of the main features from a developer perspective are support for the new SQL-standard MERGE command as an alternative to INSERT ON CONFLICT, updated JSON support to match what is available in the SQL standard, as well as some more regular expression functions. They go over different performance enhancements, including areas that use more parallelization, as well as some enhancements to sorting, and then some enhancements to backup, archiving, and compression. I'm particularly interested in seeing some of these changes in terms of being able to support more than just gzip, and it looks like they've extended support for lz4 and zstd through a number of different Postgres utilities. There are also enhancements to logical replication, some administrative areas and, of course, security. So you can definitely review this as well as the draft release notes that are available here.
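One of the more concrete developer-facing additions is MERGE. As a minimal sketch of what that can look like (the table and column names here are hypothetical, not taken from the announcement):

    -- MERGE (new in Postgres 15) as an alternative to INSERT ... ON CONFLICT.
    -- inventory and incoming_stock are hypothetical tables for illustration.
    MERGE INTO inventory AS t
    USING incoming_stock AS s
        ON t.item_id = s.item_id
    WHEN MATCHED THEN
        UPDATE SET quantity = t.quantity + s.quantity
    WHEN NOT MATCHED THEN
        INSERT (item_id, quantity) VALUES (s.item_id, s.quantity);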

The next piece of content- "Postgres Query Optimization: LEFT JOIN vs UNION ALL". This is from crunchydata.com and they're talking about a query that a customer gave to them saying we're trying to make this more optimized. It's running against two tables that have millions of rows in them and it was taking about 2-3 seconds to run. They were looking at this and trying to figure out how to optimize it. So it's using a COALESCE function to handle the case where there are no counts in Table B; they made it a zero, but they still wanted to order by that count. The way this was structured, they really couldn't use an index to do this because that value is dynamically created, and they said even an expression index really wouldn't handle it. So they put on their thinking caps to come up with another way to do it. And what they basically did is they broke it down into two different steps. So basically, go ahead and handle the case where rows exist in Table B first.

So go ahead and do a regular join to get the rows where there are counts in Table B. And this is an example of that query here. So relatively simple. This will use an index on the count column, so it can do the ordering by count pretty efficiently. But then for the case where there are no rows in Table B, do what they call an anti-join. So you still join Table A to Table B, but then you are only looking for rows where there are no matching rows in Table B. So essentially you're just getting the set in A that does not exist in B, and that by definition has a zero count, so you just use a static zero. Each of these queries independently is super fast. So then you can just put them together using a UNION ALL, or you could even just query them separately, it doesn't really matter, but this gives it all to you in one query. And with this, being able to utilize the indexes that were on the tables, it basically executes in a little over a millisecond. So basically 2,000-3,000 times faster. So this is a great example of rethinking how you could potentially run a query to process it much more efficiently. So if you want to learn more about that, definitely check out this blog post.
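To make the rewrite concrete, here is a rough sketch of the two-part query described above; the table and column names (table_a, table_b, cnt) are hypothetical since the post works with the customer's schema:

    -- Part 1: rows that do exist in Table B, ordered by count via the index on that column.
    (SELECT a.id, b.cnt
       FROM table_a a
       JOIN table_b b ON b.a_id = a.id
      ORDER BY b.cnt DESC)
    UNION ALL
    -- Part 2: anti-join, rows in Table A with no match in Table B get a static zero count.
    (SELECT a.id, 0 AS cnt
       FROM table_a a
       LEFT JOIN table_b b ON b.a_id = a.id
      WHERE b.a_id IS NULL);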

The next piece of content- "Speeding up sort performance in Postgres 15". This is from citusdata.com and they're talking about some enhancements that have been made to sort performance, and they cover four areas of improvement that made it into Postgres 15. The first is improvements to sorting a single column. So that means you're just selecting one column and you want to sort by it. They were able to do this more efficiently and get about 25% better performance in the example they show here, looking at 10,000 integer values. The second area is reducing memory consumption by using generation memory contexts. Basically, the way Postgres normally allocates memory for in-memory sorts, it always rounds the allocation up, and that uses more memory than necessary.

But when they adjusted it to conserve memory, it also resulted in a speed boost. And of course, with this type of enhancement, the benefit depends on the size of the data. So you can see the differences between 14 and 15; how much improvement you get basically depends on how large the data is. And you can see how there's basically a step function with Postgres 14, whereas with 15 the performance is more linear. The chart further down varies the number of table columns that have been added, and that gives a little bit more of a speed boost. The third area is adding specialized sort routines for common data types. So it looks like this is mostly with regard to integers, to make comparisons more efficient.

And this gave a little bit more of a speed boost as well between 14 and 15, not much but a little bit better. They're saying between 4-6% better for in-memory sorts. And then the last area is replacing the polyphase merge algorithm with a k-way merge. Now this was specifically targeted at "...larger scale sorts that exceed the size of work_mem significantly". You can see that with a given work_mem size, if it's low, performance definitely improved for Postgres 15 relative to 14, and they said it's around 43% faster for large sorts. So another set of improvements, but they do say they're not done yet; there's some more future work on sort performance in Postgres that they'll be working on. But if you want to learn more about these areas, definitely check out this blog post.
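If you want to reproduce the general shape of these benchmarks yourself, something like the following sketch exercises both the single-column in-memory sort and a sort that is much larger than work_mem (the table name and sizes here are just illustrative, not taken from the post):

    -- Build a single-column table of random integers to sort.
    CREATE TABLE sort_test AS
        SELECT (random() * 100000)::int AS a
        FROM generate_series(1, 1000000);
    VACUUM ANALYZE sort_test;

    -- Single-column in-memory sort (the first improvement area).
    SET work_mem = '256MB';
    SELECT a FROM sort_test ORDER BY a OFFSET 1000000;  -- OFFSET discards the rows so mostly the sort is timed

    -- Sort far larger than work_mem (exercises the new k-way merge).
    SET work_mem = '64kB';
    SELECT a FROM sort_test ORDER BY a OFFSET 1000000;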

The next piece of content- "Postgres: on performance of prepared statements with partitioning". This is from amitlan.com and he's talking about how you can send a PREPARE statement to prepare a query and then send a subsequent EXECUTE to actually run it. This basically allows Postgres to parse and plan the query before executing it. The intent is that if you want to send, say, ten queries that are essentially identical, where maybe only the parameters change, preparing the statement first lets things operate much more efficiently. He shows an example here using pgbench with protocol=simple, so it's not using PREPARE and then EXECUTE, it's just running the command. And the average query latency is 0.58 milliseconds with a TPS in this example of 17,000.
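For reference, pgbench's prepared mode is roughly equivalent to issuing PREPARE once and then EXECUTE repeatedly at the SQL level, something like this sketch against pgbench's standard pgbench_accounts table:

    -- Parse and plan the statement once...
    PREPARE get_balance (int) AS
        SELECT abalance FROM pgbench_accounts WHERE aid = $1;

    -- ...then execute it many times with different parameters, skipping re-parsing.
    EXECUTE get_balance(1);
    EXECUTE get_balance(42);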

Well, when you go with protocol=prepared, so it prepares the statements and then executes the queries that are sent to the database, the latency drops to 0.31 milliseconds, which is about half the latency, as you would expect. The TPS goes up to 32,000, so it essentially doubles your transactions per second. So it is advantageous to use this. However, the issue comes with partitioned tables, because at prepare time the database doesn't know what parameter values to expect, so it can't narrow down which partitions need to be hit. That means partition pruning is not something that can be done at planning time; it has to be done at runtime, and that makes things a bit slower. And then there were some inefficiencies he mentions with regard to doing the runtime pruning on partitioned tables, and basically, the more partitions you have, the longer the latency of the queries.

However, the planner can fall back to what they call a custom plan, and what that basically means is it has to do the full parsing and planning each time, so it doesn't really take advantage of PREPARE and then EXECUTE. For partitioned tables, that's essentially what happens. But he's been looking into a patch where you actually can benefit from PREPARE and EXECUTE on partitioned tables, and he made the runtime partition pruning more efficient, I believe. And you can see here an appreciable improvement in performance from the patched version compared to not using PREPARE and EXECUTE at all. So this isn't going to be in Postgres 15, but potentially in Postgres 16 we could look for this type of improvement with partitioned tables if you use PREPARE and EXECUTE.
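Here is a small sketch of the partitioned-table behavior being described (the table names are hypothetical): when the cached generic plan is used, pruning can only happen at execution time, which shows up as "Subplans Removed" in the EXPLAIN output:

    -- A hypothetical range-partitioned table.
    CREATE TABLE measurements (logdate date NOT NULL, value int)
        PARTITION BY RANGE (logdate);
    CREATE TABLE measurements_2022 PARTITION OF measurements
        FOR VALUES FROM ('2022-01-01') TO ('2023-01-01');
    CREATE TABLE measurements_2023 PARTITION OF measurements
        FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');

    PREPARE by_day (date) AS
        SELECT count(*) FROM measurements WHERE logdate = $1;

    -- Force the generic (cached) plan so pruning has to happen at runtime.
    SET plan_cache_mode = force_generic_plan;
    EXPLAIN (ANALYZE) EXECUTE by_day('2023-06-01');
    -- The Append node in the plan reports "Subplans Removed: 1", i.e. runtime pruning.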

But if you want to learn more, this is actually the post that was reviewed in "5mins of Postgres E18: Partition pruning, prepared statements and generic vs custom query plans". So if you want to learn more about this, definitely check out this episode from pganalyze.com.

Next piece of content- "Solving PostgreSQL's planner issues due to statistics missing chunks". This is from ongres.com. They're talking about an issue where they basically had corruption on a TOAST table. They got an error that said "ERROR: missing chunk number 0 for toast value" something "in pg_toast_2619". And looking it up, that TOAST table belongs to the pg_statistic catalog table. So the TOAST table for that table was corrupt. Presumably they had hit one of the causes listed here, like a disk failure, a power loss, some other sort of hardware failure, or maybe fsync being turned off.

But basically, they had this issue, and how did they deal with it? Their solution was: well, this table just stores statistics, so maybe we can just delete all of the data in it and then run ANALYZE to repopulate the statistics. So that's exactly what they did and it ended up working. Now, normally you would not want to do this to a system table, but for this specific table it seems like an acceptable way to handle it. They did also give an example where, if you happen to know which particular table the corrupt rows belong to, you could delete just those rows from the statistics table and then analyze that one table to add its statistics back. So this is an interesting technique to address this type of table corruption. But if you want to learn more, definitely check out this blog post.
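As a rough sketch of the approach they describe (run as a superuser; pg_statistic only holds planner statistics, so ANALYZE can rebuild it):

    -- Throw away all cached statistics, including the rows whose TOAST data is corrupt.
    DELETE FROM pg_statistic;
    -- Repopulate statistics for every table in the database.
    ANALYZE;

    -- Narrower variant if you know which table's statistics are corrupt
    -- (some_table is a hypothetical name):
    -- DELETE FROM pg_statistic WHERE starelid = 'some_table'::regclass;
    -- ANALYZE some_table;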

Next piece of content- "Unfinished Business with Postgres". This is from craigkerstiens.com and this is a very interesting post. It's relatively long, but it's talking about his experience working at Heroku over many years. He's now with Crunchy Data, but he's talking about the impact that Heroku Postgres had on different utilities as well as different enhancements that were made to Postgres over the years, and talking a little bit about hosting Postgres. So if you want a little bit of an insight into some history with regard to Postgres, definitely an interesting blog post to check out.

Next piece of content- "Adventures in collating text in Linux, Docker and Postgres". This is from ptc-it.de and he's talking about an issue he was running into where text was being sorted differently in different environments. He checked all sorts of different things, but ultimately it came down to the collation being used by Postgres in each environment. One was set to a different collation, and it was the Docker container they use in GitLab that they wanted to fix. So he went through the process of setting the collation exactly the way he wanted it in the Docker container they're using, getting it set appropriately so that the data now sorts consistently. So if you want to learn more, you can check out this blog post.
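A couple of quick checks along these lines (this is just a sketch, not taken from the post): you can see which collation each database was created with, and you can create a database with an explicit collation so sorting behaves the same in every environment:

    -- Show the collation and ctype each database was initialized with.
    SELECT datname, datcollate, datctype FROM pg_database;

    -- Hypothetical: create a database pinned to a specific collation.
    CREATE DATABASE appdb
        TEMPLATE template0
        LC_COLLATE 'en_US.utf8'
        LC_CTYPE   'en_US.utf8';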

Next piece of content- "My experience with exporting Postgres RDS partitioned tables to s3". This is from geopet85.github.io, and he's using the export feature for AWS RDS databases to send data to S3. While it was working for their approach of exporting to Parquet files, when it came to partitioned tables it was exporting both the child tables and the parent tables, when you would want one or the other, not both. And he basically had a struggle with it. So if you're using RDS exports to send data to S3, you may be interested in checking out some of the issues he encountered when trying to do that.

Next piece of content- "Instant Heatmap with pg_featureserv". This is from crunchydata.com. Now, this is a web service that lets you access PostGIS and, by extension, Postgres data for serving geographic information system (GIS) data over the web. And in this example, they're using it to build a heat map. So if you want to learn how they got it set up and working, you can definitely check out this blog post.

The last piece of content, the PostgreSQL person of the week is Haki Benita. So if you're interested in learning more about Haki and his contributions to Postgres, definitely check out this blog post.
