
Slow Archival, Index Improvements, Node Metrics, Join or Subquery | Scaling Postgres 131


In this episode of Scaling Postgres, we discuss slow WAL archival, index improvements coming for Postgres 13, accessing node metrics via SQL and deciding between a join or subquery.


Transcript



[00:00:00] In this episode of Scaling Postgres we talk about slow archival, index improvements, node metrics and joins or subqueries. I'm Creston Jamison, and this is Scaling Postgres episode 131.

[00:00:18] Morning.

[00:00:23] Alright, I hope you, your friends, family and coworkers continue to do well. Our first piece of content is "Why PostgreSQL WAL archival is slow". This is from Percona.com, and they're talking about a problem where your WAL backs up and fills up the pg_wal directory because it cannot archive those files fast enough. Now, they mention here that they've seen problems similar to this, but normally the culprits have been, one, failing WAL archival, so there's some sort of error in the configured archive command such that it can't write the files to the destination, for example; or two, there's an orphaned replication slot that is preventing the old WAL files from being removed. But they've noticed a new problem happening more and more, where you have a lot of WAL generation going on and it simply can't be archived fast enough. Now, they say this is generally due to two issues. One is the rapid generation of WAL segments: as databases become larger and larger and there's more activity going on, you have a lot more WAL files being generated. And the second issue is that the archive command is configured to write to a remote storage location that has additional latency involved. Basically those two issues can cause real problems with the archival process being able to keep up. I myself have witnessed this as well, and I've even seen it with a local file copy when there's so much WAL being generated that it couldn't keep up, so archival has shifted up to become a bottleneck of the whole system. That makes this a very timely post for me. Now, the next thing they cover is the synchronous nature of WAL archival, so there are some waits that happen as a part of it. They have a diagram here where, when a WAL segment is finished, Postgres writes a .ready file in the archival status directory and then signals the archiver that, hey, there's a file you need to pick up. The archiver process wakes up and goes through every single .ready file that's there looking for the oldest one to be archived, and it handles them one right after the other. Now, this causes a host of problems. The first problem they mention is, quote, a method of finding out the oldest WAL segments one by one and archiving them one by one is not very efficient. So the more .ready files you have in the archive status directory, the longer that whole process will take. Basically, the more WAL files need to be archived, the slower it gets, so it's a downward spiral once you start lagging in this way. The second issue is that the archiver executes the archive command and then waits for it to return before going on to archive the next file. It's a synchronous process, and if there are any delays, like copying to a remote location or even having difficulty writing efficiently to a local drive, it's going to slow the whole thing down. In other words, it doesn't do this asynchronously, say, pass the file off to a command and then look for the next file to archive; it actually waits for the whole process to finish. And the third issue is that the archiver will wait a second or more before reattempting if there's a failure, which can cause further delays. And they discuss some more issues with the implementation.
But really, what it looks like to me is that this is an area ripe for optimization in Postgres itself, finding ways to make this whole archival process more efficient. In terms of an immediate solution, what they mention is that since the archive command just passes the work off to another tool, utility or program, you can make that an asynchronous program that accepts the input, starts doing its processing, and then returns immediately. They mention that pgBackRest has just such a capability: it uses multiple background workers, and essentially once it gets called it returns immediately, so the archive command can go on to its next piece of work while those workers actually write the files, copy them to different locations, et cetera. So I found this to be a really great post discussing these problems, including some of the problems I've seen myself. They do mention one solution, but I think long term this would be an area ripe for improvement in Postgres to make sure it can handle more transaction throughput. So if you're interested in that, check out this post from Percona.com.
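Just to make that handoff concrete, here is a minimal sketch of pointing archive_command at pgBackRest; the stanza name is a placeholder, and the asynchronous, multi-worker behavior itself is enabled on the pgBackRest side in pgbackrest.conf (archive-async, spool-path, process-max), not in Postgres.

```sql
-- Minimal sketch (stanza name "main" is a placeholder):
-- hand each finished WAL segment to pgBackRest instead of copying it ourselves.
ALTER SYSTEM SET archive_command = 'pgbackrest --stanza=main archive-push %p';
SELECT pg_reload_conf();
-- With archive-async=y (plus spool-path and process-max) set in pgbackrest.conf,
-- archive-push returns almost immediately and background workers do the actual transfer.
```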

[00:04:54] The next piece of content is another post from Percona.com: "Index improvements in PostgreSQL 13". This is basically a list of all the index improvements they've covered in Postgres 13, I believe beta three. The first thing they mention, the big one, is the deduplication of B-tree indexes, and they show how it can get you space savings of up to a third in some cases. They discuss GiST and SP-GiST being able to handle box-to-point distance lookups, GIN indexes being allowed to more efficiently handle NOT restrictions, and then they cover a few other additions as well. So if you are looking at the index improvements coming in Postgres 13, check out this post from Percona.com.

[00:05:39] The next post is "PostgreSQL 13 Beta 3: B-tree index deduplication". This is from Rustproof Labs, and it continues on the index improvements, talking about deduplication. They looked at some examples on 12 and 13, and what they found is that, of course, when you have an index with a lot of unique values, such as a serial column, you're not going to get that much reduction. But for other types of indexes, depending upon your data, you can get significantly higher reductions, like greater than 70% in one case, and they show some of the differences in index sizes between 12 and 13 as well. So if you want to learn more about this upcoming capability in Postgres 13, check out this post.
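If you want to gauge the effect on your own data, a rough sketch like the following, built on both 12 and 13, will show the size difference; the table, column and values are made up for illustration and are not from either post.

```sql
-- Hypothetical table with a deliberately low-cardinality column, where PG 13's
-- B-tree deduplication can store each distinct key once with a list of row pointers.
CREATE TABLE orders (id bigserial PRIMARY KEY, status text);
INSERT INTO orders (status)
SELECT (ARRAY['new','paid','shipped'])[1 + (i % 3)]
FROM generate_series(1, 1000000) AS i;

CREATE INDEX orders_status_idx ON orders (status);
-- Compare the reported size for the same build on Postgres 12 versus 13.
SELECT pg_size_pretty(pg_relation_size('orders_status_idx'));
```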

[00:06:22] The next piece of content is "PostgreSQL node metrics: Don't fly blind". This is from Crunchydata.com, and they're talking about a new extension they created called pgnodemx, which I'm assuming means Postgres node metrics extension. It's an extension that allows you to query system metrics from your operating system through SQL, and the post shows some of the different functionality related to it. They've made this a part of the Crunchy Data monitoring setup they have with pgMonitor, and it works, of course, with their container suite and the Postgres Operator. So if you want to learn more about this extension and its capabilities for retrieving system information, you can check out this post from Crunchydata.com.

[00:07:12] The next post, also from Crunchydata.com, is "Joins or Subquery in PostgreSQL: Lessons Learned". They were trying to do an exclusion query, for example, find all the records that don't exist in another table, say a parent table. The first attempt, not using a subquery or a join, basically doesn't work. But then they show the proper join query, where you do a left outer join from one table to another and check that the ID on the joined table is null. That's the way I tend to pull out these types of records. They also did it as a subquery, using WHERE the ID is NOT IN all the IDs from the other table; the performance wasn't quite as good, but still pretty fast. The next thing they tried was actually using an EXCEPT clause, and they even set it up with a cleaner syntax using a CTE, and it was actually faster than the left outer join checking for a null ID, down to less than six milliseconds. So, some interesting insights on different ways you can pull rows that don't exist in a secondary table. If you're interested in that, check out this post from Crunchydata.com.
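To make those three approaches concrete, here is a rough sketch against hypothetical parents/children tables; the names are placeholders, and relative timings will depend entirely on your data and indexes.

```sql
-- Find parent rows that have no matching child row (an anti-join), three ways.

-- 1. LEFT JOIN and test for NULL on the joined side.
SELECT p.id
FROM parents p
LEFT JOIN children c ON c.parent_id = p.id
WHERE c.id IS NULL;

-- 2. Subquery with NOT IN (careful: it behaves surprisingly if parent_id can be NULL).
SELECT p.id
FROM parents p
WHERE p.id NOT IN (SELECT c.parent_id FROM children c);

-- 3. EXCEPT, optionally wrapped in a CTE as the post does.
WITH child_parents AS (
    SELECT parent_id FROM children
)
SELECT id FROM parents
EXCEPT
SELECT parent_id FROM child_parents;
```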

[00:08:24] The next piece of content is "Generating a normal distribution in SQL". This is from Cybertec-postgresql.com, and he's basically generating a normal distribution of data for analysis purposes. He's using the tablefunc extension and shows how to run a command using the normal_rand function to generate your distribution. So if you're interested in generating a distribution of data, definitely check out this post.
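The tablefunc call looks roughly like this; the row count, mean and standard deviation below are arbitrary values for illustration, not the ones from the post.

```sql
-- tablefunc ships with Postgres as a contrib extension.
CREATE EXTENSION IF NOT EXISTS tablefunc;

-- Generate 1,000 normally distributed values with mean 5 and standard deviation 2.
SELECT round(n::numeric, 2) AS value
FROM normal_rand(1000, 5.0, 2.0) AS n;
```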

[00:08:53] The next piece of content is "Which partition contains a specific row in my PostgreSQL database?". This is a very quick post, but it does exactly what it says, and it uses tableoid, the table object ID. With this simple query you can determine which partition a particular record is in. Now, you may be able to work this out easily, as he says, with list or range partitioning, but with hash partitioning this is probably the easiest way to find certain data and see which partition it's being targeted to. So if you're interested in that, you can check out this post from 2ndquadrant.com.
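The query pattern is roughly the following; the table name and filter are hypothetical, not from the post.

```sql
-- tableoid is a hidden system column holding the OID of the table (here, the partition)
-- each row physically lives in; casting it to regclass prints the partition's name.
SELECT tableoid::regclass AS partition_name, *
FROM measurements            -- hypothetical hash-partitioned table
WHERE device_id = 42;        -- hypothetical filter
```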

[00:09:29] The next piece of content is "FDWs, curl and LIMIT". This was an interesting post where he's using a file foreign data wrapper to pull in a set of data from the Internet, it looks like a CSV file, present it as a table and then query against it. Now, he actually wanted to limit the number of rows returned, but that caused an error. He found a way around it using some shell manipulation to avoid the error and still be able to use a LIMIT on the query. He notes this is a very inefficient way to set up a foreign data wrapper against a file on the internet, because you're going to re-download the data with every query. So his solution is creating a materialized view: whenever you refresh the materialized view, it re-downloads the data and repopulates the view. That way you can add an index, keep reasonably fresh data, and not re-fetch the file on every query. So it's an interesting use case, and if you're interested in that, you can check it out on his blog at hagander.net.
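Here is a rough sketch of that pattern, using file_fdw's program option to run curl and a materialized view as the local cache; the URL and column list are placeholders rather than the ones from the post.

```sql
CREATE EXTENSION IF NOT EXISTS file_fdw;
CREATE SERVER csv_srv FOREIGN DATA WRAPPER file_fdw;

-- The foreign table re-runs curl on every scan, which is why querying it directly is costly.
CREATE FOREIGN TABLE remote_csv (
    id   int,
    name text
) SERVER csv_srv
  OPTIONS (program 'curl -s https://example.com/data.csv', format 'csv', header 'true');

-- Cache the download locally; REFRESH MATERIALIZED VIEW re-fetches the file.
CREATE MATERIALIZED VIEW remote_csv_cached AS
SELECT * FROM remote_csv;

CREATE INDEX ON remote_csv_cached (id);
```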

[00:10:44] The next piece of content is "PG Friday: Ten Things Postgres Could Improve, Part Three". This follows up their previous posts on things Postgres could improve, and here they're talking about some of the disadvantages of the way Postgres does MVCC, namely that records that are updated or deleted aren't updated in place or removed right away. They're just marked for future removal and a new record is added. So if you update something, a new row version is added and the old one is marked as no longer current. Now, that's useful because you can keep that old version around depending upon the visibility of the row, whereas other database systems manage rollback segments or something similar, where they do update in place but record the fact that the older version still exists. They talk about some of the benefits, with the main disadvantage being that you have to go in and vacuum up all those dead rows, and they go into a little bit about freezing and the XID as well. They also talk about the index versus the heap. What they say is that Postgres keeps the indexes and the heap separate, and the previous post in this series suggested it would be advantageous to have the heap, the actual data, ordered in some way. But right now Postgres doesn't really maintain any order. You can cluster a table, but it doesn't maintain that order for long and you'd have to constantly recluster it, which is a problem because CLUSTER locks the entire table. So it's definitely not ideal to use unless you have some sort of data warehousing scenario where the system can be offline while tables are reclustered. He talks about the potential benefits of doing that and whether it's better to just leave Postgres as it is, with indexes and heap completely separate. The next thing they discuss is inline compression, which Postgres doesn't do currently. But if you check out the previous episode of Scaling Postgres, we talked about a patch that enables you to define compression for certain tables and lets you choose the compression level, trading off whether you want greater write efficiency or greater read efficiency. So it looks like this particular pain point might be resolved in a future version of Postgres, which would be super. If you're interested in learning more about some of the discussion on ways Postgres could improve, you can definitely check out this post from 2ndquadrant.com.
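As a small aside on the vacuum side of that trade-off, a generic query like this (not from the post) lets you watch how many dead row versions a table has accumulated and when they were last cleaned up.

```sql
-- Dead tuples left behind by UPDATE/DELETE, per table, plus when vacuum last ran.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_vacuum,
       last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```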

[00:13:11] The next piece of content is actually a presentation done by Bruce Momjian: "Postgres in the cloud the hard way". He basically set up a Postgres instance using only the command line, setting it up on AWS. You can see the outline of the steps he took here: he set up the AWS command line client, chose an AMI, set up the instance, and set up logging and configuration. In terms of setting up the instance, it looks like he also created security groups, roles and things of that nature, installed Postgres and then was able to connect to it. So if you want to see a command-line-only way to set up Postgres on AWS infrastructure, be sure to check out this presentation.

[00:13:57] The next piece of content is actually a YouTube video: "Best Practices and Security with PostgreSQL". This is from the EnterpriseDB YouTube channel, and they're talking about how to set up Postgres, covering all sorts of best practices for security in terms of authentication and access. So if you're interested in that, you can check out this webinar.

[00:14:19] The next piece of content is "Waiting for PostgreSQL 14: Add support for partitioned tables and indexes in REINDEX". Now, what's interesting about this is that it also supports concurrent reindexing, which is fabulous. For example, you can see here where you're able to reindex the parent table concurrently and it automatically goes in and reindexes all of the child partitions. So definitely a great improvement coming in 14, and you can check out this post from Depesz if you want to learn more.
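On Postgres 14 the command itself is just the familiar REINDEX aimed at the partitioned parent; the table and index names here are placeholders.

```sql
-- Postgres 14: reindexing a partitioned table (or a partitioned index) recurses
-- into every child partition, and CONCURRENTLY avoids blocking writes while it runs.
REINDEX TABLE CONCURRENTLY measurements;          -- hypothetical partitioned table
REINDEX INDEX CONCURRENTLY measurements_ts_idx;   -- hypothetical partitioned index
```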

[00:14:50] The next piece of content is "Tuning PostgreSQL on ZFS". This is from pg.uptrace.dev, discussing how you can really shrink the size on disk with it. They have a quote here that you can achieve a three to five times compression ratio using LZ4, compressing roughly a terabyte of data down to around 300 GB. So if you're thinking about using an alternative storage solution for Postgres, maybe you want to check out this blog post.

[00:15:22] And the last piece of content is that the PostgreSQL Person of the Week is Regina Obe. So if you're interested in learning more about Regina and her contributions to PostgreSQL, definitely check out this blog post.

[00:15:34] That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.
