
Monitoring Progress, More SQL, Replication Slot Failover, Postgres Contributors | Scaling Postgres 198


In this episode of Scaling Postgres, we discuss how to monitor DML & DDL progress, using more SQL, one way to handle replication slot fail-over, and recent Postgres contributors.


Transcript

All right. I hope you, your friends, family, and coworkers continue to do well. Our first piece of content is "Progress bar for Postgres queries - let's dive deeper". This is from postgres.ai, and he's talking about a previous post that he read called "Query Progress Bar". Now, this is something I considered putting on Scaling Postgres, but it was actually this quote that he mentions here- "Don't use this in prod"- that is why I ultimately decided not to showcase it last week. Plus, there was tons of content because it covered a couple of weeks' worth due to the holiday break. But he reviewed it and did his own riff, or his own take on it, talking about how to monitor Postgres. Now, first, he goes into something to consider: if you're going to be using something like this against production, you need to split the UPDATEs, DELETEs, INSERTs, or whatever you're doing into batches.

Because you don't want anything to take too long and impact vacuum or locking or things of that nature. The gauge he likes to use is 1 second. So whatever number of records you choose to change, keep any single update to about 1 second. Batching it in 1-second increments should hopefully avoid excessive locking and also help vacuum, although vacuum may still have difficulty keeping up depending on how quickly you're running these batches. So sometimes when I'm doing mass updates on a very large database, like terabytes in size, it is important to periodically run VACUUM and ANALYZE during any sort of automated process. But anyway, he takes the examples from Brian's post that he mentions here, and he has devised his own way of doing this type of progress bar that looks something like this, and he does it in a batched fashion.
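
To make the batching idea concrete, here is a rough sketch of one way to do it- not the post's exact query- using made-up table and column names. You would run this repeatedly, from a script or a loop, until it reports zero rows, tuning the LIMIT so each batch finishes in roughly a second:

    -- Hypothetical example: archive "pending" orders in small batches.
    -- Run repeatedly until it affects 0 rows.
    WITH batch AS (
        SELECT id
        FROM   orders
        WHERE  status = 'pending'
        LIMIT  5000
        FOR UPDATE SKIP LOCKED
    )
    UPDATE orders o
    SET    status = 'archived'
    FROM   batch
    WHERE  o.id = batch.id;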

So I found this particularly interesting, and he shows an animated demonstration of how it works. Now, going forward, I'm not sure I would use this technique because historically I've created Bash scripts to do it, so I have an existing template I use to be able to do a lot of data changes in batches. Or you can drop down and use whatever language the application is using, it could be Python or Ruby, and use that to drive the updates and just lean on SQL to do the actual deleting and updating part. Or, with more recent versions that support procedures, you could devise a procedure to do it, or even call a PL/pgSQL function to do certain things. But this is an interesting way to do it, essentially just using a CTE to give you a little mini progress bar.
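
If you go the procedure route, a hedged sketch of what that might look like on Postgres 11 or later is below; the table, columns, and procedure name are all made up for illustration:

    -- Hypothetical sketch: batch updates inside a procedure, committing
    -- between batches so locks are released along the way.
    CREATE PROCEDURE archive_orders_in_batches(batch_size int DEFAULT 5000)
    LANGUAGE plpgsql
    AS $$
    DECLARE
        rows_done bigint;
    BEGIN
        LOOP
            WITH batch AS (
                SELECT id
                FROM   orders
                WHERE  status = 'pending'
                LIMIT  batch_size
                FOR UPDATE SKIP LOCKED
            )
            UPDATE orders o
            SET    status = 'archived'
            FROM   batch
            WHERE  o.id = batch.id;

            GET DIAGNOSTICS rows_done = ROW_COUNT;
            EXIT WHEN rows_done = 0;
            RAISE NOTICE 'updated % rows in this batch', rows_done;
            COMMIT;  -- allowed in procedures invoked with CALL
        END LOOP;
    END;
    $$;

    CALL archive_orders_in_batches();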

Now, he does mention something that could be useful for non-production usage, and that is pg_query_state, which is, I think, a lesser-known extension; he says you can use it if you have your own Postgres instance, so that's a possibility. He also mentions monitoring DDL queries. Now, this is super important when you're doing things like creating indexes concurrently, particularly on terabyte-sized tables where they can take a very long time, and being able to track their status can be important. They do have, as of version 12, the pg_stat_progress_create_index view, and he has what looks to be a very useful query to track that and actually give you a status of it.
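
I won't reproduce his query here, but a simplified version of that kind of check on Postgres 12 or later could look roughly like this:

    -- Simplified sketch: progress of in-flight CREATE INDEX / REINDEX commands.
    SELECT p.pid,
           p.phase,
           p.blocks_done,
           p.blocks_total,
           round(100.0 * p.blocks_done / nullif(p.blocks_total, 0), 1) AS pct_done,
           a.query
    FROM   pg_stat_progress_create_index p
    JOIN   pg_stat_activity a ON a.pid = p.pid;

In psql, following it with \watch 5 re-runs it every five seconds, which gives much the same effect as the shell watch loop he uses.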

So I'm less worried about this one because it's just doing pure SELECTs against system views and pg_stat_activity to give you this progress information, and it just uses a watch command to re-run it and update the status on a periodic basis. So this could be super useful. He also mentions some other long-running commands and links to their progress reporting views, because they continue to add more capabilities for monitoring different jobs in Postgres. So I found this to be a super useful post and if you're interested, you can go ahead and check it out.

Related to that is the next piece of content, which is "Quick snapshots on Postgres active SQL from pg_stat_statements". They also have a companion post looking at a snapshot of pg_stat_activity. So basically, it uses a CTE and some techniques to give you a 30-second snapshot of what active queries are running against the system, and they do it for pg_stat_activity and pg_stat_statements in each of these two posts. Now, what's interesting is that version 14 now offers query_id. So I wonder if some of these queries could be made easier by using the query_id to link information between pg_stat_activity and pg_stat_statements. There have been some previous posts on query_id that we showcased on a previous episode of Scaling Postgres, so I might prefer that technique to some of these examples here, but this is available if you want to learn more.
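
As an illustration of that query_id idea (my own sketch, not from their posts), on Postgres 14 with compute_query_id enabled you can join live activity to the cumulative statistics roughly like this:

    -- Sketch: join current activity to pg_stat_statements via query_id (PG 14+).
    SELECT a.pid,
           a.state,
           now() - a.query_start AS running_for,
           s.calls,
           s.mean_exec_time,
           a.query
    FROM   pg_stat_activity a
    JOIN   pg_stat_statements s ON s.queryid = a.query_id
    WHERE  a.state = 'active';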

The next piece of content- "More SQL, Less Code, With PostgreSQL". This is from pgdash.io. They're showcasing different SQL you can use to actually minimize the amount of code you run or have to write in your app, or just send less traffic back and forth between, say, your app and your database. So the first feature they showcase is an Upsert. Basically, try to do an INSERT and if the record already exists, UPDATE it instead. So this is a capability that has existed since Postgres 9.5. So if you're looking to do an INSERT or just UPDATE it if it already exists, definitely use Upsert because again, it minimizes network traffic and with a single statement you can easily do it rather than having to do a SELECT, see if the value is shared, do an INSERT, and try to avoid race conditions.

The next one is INSERT with RETURNING. So when you do an INSERT, you can actually ask it to return certain values. One of the big use cases is auto-generated IDs, like through a serial data type or an identity data type: it will return the IDs that were just inserted. So this could be super useful. Again, it minimizes round trips because you're sending one INSERT statement but getting information back about the record being inserted, so you don't have to do a follow-up SELECT. The next one is ANY with a set. So if you have a query where you're using IN to check against different values, there's another way you could do it. You can use ANY and, within the parentheses, send an array of values to check against; both are sketched below.
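
Here is roughly what those two look like, again with made-up names:

    -- RETURNING hands back the generated ID without a second query.
    INSERT INTO customers (name) VALUES ('Acme') RETURNING id;

    -- ANY with an array replaces a long IN (...) list.
    SELECT * FROM orders WHERE customer_id = ANY('{1,2,3}'::int[]);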

This way you can pass as many values as you want in the array, although they do say this is better for a limited set of values, not a ton of values; you may want to use a different technique if you're going to be doing this with a huge number of values. The next one is moving rows between tables. This is something you can do using a CTE in one go, by deleting rows from a table and then inserting those deleted rows into a new table, and they show you a way of doing that here. The next feature they mention is COALESCE, which enables you to replace NULL values with some other value. It could be a blank for a string, a zero for an integer, or any other particular value. So it enables you to more easily handle NULLs, and it's a function that exists right in the database. The next feature they mention is CASE expressions, which can also be used as filters, and it's basically an if/then for the information to return from the database.
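
The row-moving pattern, as a rough sketch with hypothetical table names, is a single statement along these lines:

    -- Move completed orders into an archive table in one statement.
    -- Assumes orders_archive has the same column layout as orders.
    WITH moved AS (
        DELETE FROM orders
        WHERE  status = 'completed'
        RETURNING *
    )
    INSERT INTO orders_archive
    SELECT * FROM moved;

    -- And COALESCE simply substitutes a default for NULLs.
    SELECT id, COALESCE(nickname, '') AS nickname FROM customers;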

So you can choose to do this in SQL versus your own code. The next is support for UNIONs, so combining different result sets from queries into one. The next feature is SELECT DISTINCT ON, and this can be convenient when you only want to return one row from a joined table, like, for example, getting the most recent purchase from a particular customer. You can use DISTINCT ON to do something like that. Then the last thing they mention is using numbers in GROUP BY or ORDER BY clauses. So especially when you have something like a CASE expression, you can use the position number instead, which is much more convenient than having to repeat the exact CASE expression that exists in the SELECT list. So if you want to learn more about these techniques, definitely check out this blog post.
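
As a quick illustration of DISTINCT ON, again with hypothetical tables, getting each customer's most recent purchase can be done like this:

    -- One row per customer: the latest purchase, chosen by the ORDER BY within each group.
    SELECT DISTINCT ON (customer_id)
           customer_id, id, purchased_at, amount
    FROM   purchases
    ORDER  BY customer_id, purchased_at DESC;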

Next piece of content- "How Patroni Addresses the Problem of the Logical Replication Slot Failover in a PostgreSQL Cluster". This is from percona.com. Now, I chose this post because this tends to be a big problem with replication slots. The problem is that a replication slot that exists on the primary, along with its state, does not get copied over to a replica when that replica is promoted. I think this is a pretty big feature gap for Postgres. So this can be a problem for people where you have a failover event, but then you forget to recreate the replication slot, or you don't have a process in place to recreate it and bring it to the right state. Now, ideally, this feature would exist in Postgres itself, and I hope it will eventually get there.

But in the meantime, it seems the Patroni project, which provides high availability for Postgres, has devised a way to handle this replication slot failover. So basically, Patroni manages the replication slots for Postgres. When you say you want to create a replication slot, it creates it on the primary, but it also creates it at the same time on each of the replicas. Then it has a process that keeps each replica's slot at the correct location: as the slot advances on the primary, it advances it on the replicas, and before a failover it makes sure that those are in the same state. So you can presumably transition seamlessly from the primary to a replica, and that replica would have a slot at the proper location, so you essentially wouldn't lose your replication slot. So definitely good to have, but I'd prefer it to be a feature added to Postgres itself in the future. But if you want to learn more, you can check out this post.
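
If you want to sanity-check this kind of setup yourself, one simple thing you can do is compare the slot state on the primary and a replica before and after a switchover:

    -- Run on both the primary and the promoted replica to compare slot state.
    SELECT slot_name, slot_type, active, restart_lsn, confirmed_flush_lsn
    FROM   pg_replication_slots;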

The next piece of content- "Who Contributed to PostgreSQL Development in 2020 and 2021?". This is from Robert Haas at rhaas.blogspot.com and he's presenting a list of all the contributors' lines, adjusted percentage of the lines, and the number of commits across 2000 and 2021. So thank you to everyone who contributed to Postgres and has made it what it is today. But if you want to look at some of the contributors, you can definitely check out this post.

The next piece of content- "A Postgres Primer for Oracle DBAs". This is from blog.crunchydata.com and this is a pretty simple post for those on Oracle looking to go to Postgres and what are the top-line things you should be thinking about? They compare the differences in terms of nomenclature between what are tablespaces, users versus roles, and what about the procedural language. How do you do an export and an import? How do you do graphical administration if it exists? Although there are multiple tools and tools, pgAdmin is just one. But they also cover important things to think about, such as how you handle backups and the different ways to do it, and even some different specific tools like pgBackRest. As well as some things to take into account with security and user access. So if you are on Oracle and looking to migrate to Postgres, you may find interest in this post.

Next piece of content- "Why give a conference talk (or, why give a Postgres talk)". This is from citusdata.com, and there is an upcoming event, Citis Con, an event for Postgres. And they're looking for new speakers and they're advocating as to why it would be important to give a talk because basically, this could prove beneficial for both you and the organization that you support. They give some ideas of some subject topics you may want to cover and how it is highly encouraged, even if you're new, to Postgres, to give a talk about something new that you've learned and to present a proposal to them. So if you're interested in potentially giving a talk and want to learn more about ways to go about it, you can definitely check out this blog post.

The next piece of content- "PostGIS 3.2 New and Improved". This is from blog.crunchydata.com. They're talking about new features that have been added to the just released PostGIS 3.2 that, to my understanding, works best on Postgres 14. But here specifically, they're talking about differences in the Raster algorithms, discussions about Raster cloud access, faster/better validity, and faster just index builds. But if you're interested in learning more about that, you can check out this blog post.

The next piece of content- "Creating a Standby Cluster With the Percona Distribution for PostgreSQL Operator". This is from percona.com and they're talking about using their operator, I believe it's a Kubernetes operator, and setting up basically a standby disaster recovery site that is separate from your primary cluster. And how do you do that? So they walk through the process of setting that up using their operator. So if you're interested in that, you can check out this blog post.

The next piece of content. The PostgreSQL person of the week is Emil Shkolnik. If you're interested in learning more about Emil and his contributions to Postgres, definitely check out this blog post.

The last piece of content. We did have an episode of The Rubber Duck Dev Show this past Wednesday where we talked about the JavaScript options that are available in Rails 7. So if you're interested in a long-form podcast discussion format, you can definitely check out this content. This Wednesday's upcoming show is actually going to be talking about the dark side of open source. So if that content interests you, we welcome you to join our show.
