
pg_failover_slots, Standby Logical Decoding, Trusted Language Extensions, Postgres Package Manager | Scaling Postgres 262


In this episode of Scaling Postgres, we discuss the pg_failover_slots extension, how PG16 allows logical decoding on standbys, what trusted language extensions are, and how a package manager has been built for them.

Content Discussed

YouTube Video

Podcast Audio

Transcript

All right. I hope you, your friends, family, and coworkers continue to do well. Our first piece of content is "PG Failover Slots (pg_failover_slots)". This is from postgresql.org in the news section. And this is a new extension where your logical replication slots can now fail over to a replica, which is awesome. Because imagine you have a primary server and a replica server and you're doing physical streaming replication from the primary to the replica, but then something happens to the primary and you need to fail over to the replica and essentially promote it.

Well, if you do that, the logical replication slots that may exist on the primary don't get carried over to the replica. So you may have to restart logical replication from scratch again. But now with this new extension, it looks like they can fail over. And you can see what the extension does: it copies any missing slots from the primary to a standby and removes any slots from the standby that are not found on the primary, so it tries to keep them in sync.

It periodically synchronizes the position of slots on the standby based on the primary, and lastly, it ensures that the selected standbys receive data before any of the logical slots' WAL senders can send data to consumers. So this is how these failover slots try to keep everything in sync between the two servers, so that if there's a failover event, the promoted replica has its replication slots at the same position as the former primary.
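As a quick way to see whether that synchronization is happening, you could compare the logical slots on both servers. This is just a minimal sketch using the standard catalog view, not something taken from the extension's documentation.

```sql
-- Run this on both the primary and the standby and compare the results.
-- With pg_failover_slots active, the standby should show the same logical
-- slots at (roughly) the same positions as the primary.
SELECT slot_name, plugin, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_type = 'logical';
```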

So this is super interesting and I'm definitely going to be checking it out. Although the question I have is: why is this an extension, and will this make it into mainline Postgres at some point? Maybe this is a way to test it out and make sure it's working well, but hopefully this feature will eventually make its way into Postgres as well. But if you want to learn more, definitely check out these pieces of content.

Next piece of content- "Postgres 16 highlight: Logical decoding on standby". This is from bdrouvot.github.io. And historically, you could not do logical replication or logical decoding from a standby. He shows that if you try to create a logical replication slot on the standby, it fails. But with this commit to 16, you can now create it.

So he goes through the process of exercising this new feature and how it works. This definitely gives a lot more options when you're working with logical replication; maybe you want to stream physical changes to a replica but then actually logically replicate from it. But if you want to learn more, definitely check out this blog post.
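For a sense of what this looks like in practice, here's a minimal sketch of trying the feature on a PG16 standby. It assumes the primary has wal_level set to logical, and the slot and plugin names are just examples.

```sql
-- On the standby (pg_is_in_recovery() returns true there):
SELECT pg_is_in_recovery();

-- Before PG16 this call failed on a standby; now it creates a logical slot.
SELECT * FROM pg_create_logical_replication_slot('standby_decoding', 'test_decoding');

-- Consume the decoded changes as they arrive from the primary.
SELECT * FROM pg_logical_slot_get_changes('standby_decoding', NULL, NULL);
```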

Next piece of content- "Trusted Language Extensions for Postgres". This is from supabase.com. And we talked a little bit about this in a previous episode of Scaling Postgres, where AWS released an extension called pg_tle for Trusted Language Extensions. What this means is that once pg_tle is installed, a Postgres extension written in a trusted language can be installed on that server.

And essentially, a trusted language has protections so that it can't access the file system or certain areas of memory. This is intended for cloud providers of database services to be able to install these extensions without harming the neighbors on shared infrastructure. And it says you can install such an extension from any database client using the function pgtle.install_extension().
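Here's a minimal sketch of what that call can look like, assuming the install_extension() signature shown in the pg_tle documentation; the extension name, version, and function body are hypothetical.

```sql
-- Register a tiny trusted-language extension with pg_tle (names are made up).
SELECT pgtle.install_extension(
  'my_hello',                         -- extension name
  '1.0',                              -- version
  'Toy trusted language extension',   -- description
  $_pgtle_$
    CREATE FUNCTION hello() RETURNS text
    LANGUAGE sql IMMUTABLE
    AS $$ SELECT 'hello from a TLE' $$;
  $_pgtle_$
);

-- Then it installs like any other extension.
CREATE EXTENSION my_hello;
SELECT hello();
```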

Now what's interesting about this is the next post, also from Supabase, which is "dbdev: PostgreSQL Package Manager". So for these extensions that use trusted languages, essentially trusted language extensions, they created a package manager at the site dbdev, which enables you to easily connect up and use packages from it. And the intent is to essentially be like npm for JavaScript, pip for Python, or cargo for Rust.

Now, we'll eventually see where this goes, but it's definitely pretty interesting. So once you find a package of interest, you can use the command select dbdev.install, and then create the extension in the database with the particular version that you want. But if you want to learn more, definitely check out this blog post.
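As a rough sketch of that workflow, assuming the dbdev.install() function described in the post; the package handle and version here are placeholders.

```sql
-- Fetch a package from the dbdev registry (handle and version are made up).
SELECT dbdev.install('some_handle-some_package');

-- Then install it into the current database like a normal extension.
CREATE EXTENSION "some_handle-some_package" VERSION '0.1.0';
```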

Next piece of content- "PostgreSQL and SQL:2023". This is from peter.eisentraut.org. And this is a follow-on from his post about the new SQL:2023 standard. Here he compares all those different features to where Postgres is with regard to its support of them. So you can see a lot of them have been in place for a while, some were added in more recent versions like 14 or 15, and some are on track to be in 16.

We'll ultimately have to see. And then there are a few more that he thinks are potentially good for the future, and one that probably won't be done: string-based JSON, which in his opinion isn't worth doing. But this is a good comparison to see where Postgres stands in relation to the new 2023 standard. So check this out if you're interested in that.

Next piece of content- "Writing Data from Postgres to MongoDB". This is from crunchydata.com. And their use case was that they had a purchase API that kept sending data to a Postgres database, which periodically aggregated it into a document, presumably a JSON document, and stored it in Mongo for users to be able to retrieve receipts, for example. So with this use case, basically, they set up a foreign data wrapper so that Postgres can speak MongoDB and then send data into it.

So they go through the process of setting up the tables in Postgres, showing how to generate the JSON that will be stored in Mongo, and creating the foreign table, as well as a trigger to keep it updated and to sync existing data. So you can definitely check out this blog post to see how that is all laid out. But of course, I'm thinking: why would I want to use MongoDB when I could probably just store it in a JSONB field?

And you're already using a trigger, so you could use a trigger to create that JSONB field. Or if you didn't want to use triggers, you could perhaps use a materialized view and refresh it on a periodic basis. So I don't know if I would necessarily use MongoDB for this use case, but this shows a way the MongoDB foreign data wrapper could be used.
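Here's a minimal sketch of that materialized-view alternative; the purchases table and the receipt shape are hypothetical, not taken from the post.

```sql
-- Build a JSONB receipt per customer straight in Postgres instead of Mongo.
CREATE MATERIALIZED VIEW receipts AS
SELECT customer_id,
       jsonb_build_object(
         'customer_id', customer_id,
         'items', jsonb_agg(jsonb_build_object('sku', sku, 'amount', amount)),
         'total', sum(amount)
       ) AS receipt
FROM purchases
GROUP BY customer_id;

-- Refresh on a schedule (e.g. from cron) instead of maintaining triggers.
REFRESH MATERIALIZED VIEW receipts;
```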

Next piece of content- "Who Contributed to PostgreSQL Development in 2022?". This is from rhaas.blogspot.com. And he's listing all the different code statistics for Postgres development. The first list is by top committers, and you can see Tom is in the top position here. Then he has the list by lines of code, so lines of code that were committed even though the committer wasn't necessarily the principal author; here you can see Alvaro is number one.

And then finally, there are the people who sent at least a hundred emails to the pgsql-hackers list in 2022, and you can see Tom is again on top in this one. But I do find it super interesting to see all the different names here and relate them to presentations I've seen, blog posts I've read, or even Stack Overflow posts. Frequently, I see Tom Lane in a lot of answers in Stack Overflow posts. But definitely check this post out to see who contributed to Postgres.

Next piece of content- "Efficiently Delete Old Rows with Partitions". This is from sqlfordevs.com. And he's talking about the use case of partitions for data you only need to retain for a specific period of time, after which it can be deleted. Well, the easiest way to do that is with partitioned tables. So in this case, he's using audits, which is a perfect use case.

Generally, after a period of time, you no longer need that audit data, and he's partitioning by year and by month. So when the data has expired, you can easily just detach the partition and then drop the table, and all that data's gone. If you weren't using partitions, you'd actually have to delete the rows, which can take a long time. And then the worst part is your data file won't really shrink that much.

So now you have all this dead space in the file. Now, you can reclaim it, as he says here, using VACUUM FULL, but that locks the whole table for writes and reads. So then you'd have to use something like pg_repack to essentially rewrite the whole table to reclaim the space. So generally, it's easier just to set up partitioned tables for this type of use case, and he explains how to do it here.
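As a minimal sketch of the pattern, with an illustrative monthly layout rather than the exact schema from the post:

```sql
-- A time-partitioned audit table; old months get detached and dropped
-- instead of being deleted row by row.
CREATE TABLE audits (
    id         bigint GENERATED ALWAYS AS IDENTITY,
    payload    jsonb,
    created_at timestamptz NOT NULL
) PARTITION BY RANGE (created_at);

CREATE TABLE audits_2022_01 PARTITION OF audits
    FOR VALUES FROM ('2022-01-01') TO ('2022-02-01');

-- Once the retention window has passed:
ALTER TABLE audits DETACH PARTITION audits_2022_01 CONCURRENTLY;
DROP TABLE audits_2022_01;
```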

Next piece of content- "Can there be too many partitions?". This is from kmoppel.github.io. And he ran tests ranging from no partitions to over 4,000 partitions to see how the performance changed. And what's interesting is that the plan time only doubled, but really that's an insignificant part of most queries. And the mean execution time for most partition counts did drop a little bit, but again, it was not significantly different from having no partitions.

Now, many of his queries were key-based, so maybe different types of queries would have different results. But what's interesting is the performance didn't really change all that much up to 4,000 partitions. So based on this data, I would say you could probably have at least 4,000 partitions with no harmful effects.

But of course, the question is what happens when you go to 8,000 or 16,000 or 32,000, which I probably wouldn't do. I usually like to keep it at or around no more than the thousand mark, but it looks like you can go to 4,000 quite easily with recent versions of Postgres. But if you want to learn more, check out this blog post.

Next piece of content- "Waiting for Postgres 16: Buffer cache hit ratio and I/O times in pg_stat_io". This is from pganalyze.com. And this is the next episode of "Five minutes of Postgres" with Lukas. And he's discussing the new view, pg_stat_io, that's coming in 16.

And he says there have been two recent enhancements to it, and one of them allows you to better measure the buffer cache hit ratio, to get a gauge on how soon you would potentially need to upgrade your hardware. But basically, you get a lot more detail on the I/O of a given system. So definitely check out his episode if you want to learn more about that.
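For reference, a quick look at the view might go something like this; the column names are taken from the PostgreSQL 16 development documentation and could still change before release.

```sql
-- Per backend type and I/O context: reads, writes, timings, and buffer hits.
SELECT backend_type, object, context,
       reads, read_time, writes, write_time, hits
FROM pg_stat_io
ORDER BY reads DESC;
```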

Next piece of content- "EXPLAIN (GENERIC_PLAN): NEW IN POSTGRESQL 16". This is from cybertec-postgresql.com. And he's talking about the new capability in EXPLAIN where you can actually request a generic plan and give the statement parameter placeholders, to better understand prepared statements. So it allows you to get a good execution plan for them. And he discusses a few of the limitations of this as well. So if you are interested in that, definitely check out this blog post.
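It's about as simple as this minimal example; the table and column are made up.

```sql
-- Plan a parameterized statement without supplying actual values.
EXPLAIN (GENERIC_PLAN)
SELECT * FROM orders WHERE customer_id = $1;
```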

Next piece of content- "UNDERSCORES IN NUMERIC CONSTANTS IN POSTGRESQL 16". And this is from cybertec-postgresql.com. And this was mentioned in a previous post by someone else covered on Scaling Postgres. But this goes over that feature again, where instead of writing these really long numerics, you can now put underscores between the digits to give you separation and make them easier to read. It also helps with accuracy, making sure you're specifying the correct amount. So if you're interested in that, definitely check out this blog post.
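For example, once you're on 16:

```sql
-- Underscore digit separators make long constants easier to read and verify.
SELECT 1_000_000 AS one_million, 1_500_000_000 AS one_point_five_billion;
```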

Next piece of content- "PostgreSQL for the SQL Server DBA: The First Four Settings to Check". This is from softwareandbooze.com. And if you're someone who uses Microsoft SQL Server and is moving to Postgres, definitely check out this blog post.

Next piece of content- "How To Set Up pgTAP for Writing PostgreSQL Database Unit Tests". This is from endpointdev.com. And pgTAP is an extension that allows you to write unit tests in Postgres. So you can test and verify things like whether a table exists, a column exists, or a function returns an expected result. So check out this blog post if you want to learn more about how to set that up and get it working.
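A small test could look something like the sketch below; the table and column names are hypothetical, and the assertions are standard pgTAP functions.

```sql
-- A minimal pgTAP test run inside a transaction so it leaves no trace.
BEGIN;
SELECT plan(3);

SELECT has_table('users');
SELECT has_column('users', 'email');
SELECT results_eq('SELECT count(*)::int FROM users',
                  ARRAY[0],
                  'users table starts out empty');

SELECT * FROM finish();
ROLLBACK;
```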

Next piece of content- "Postgres- Fun with LWLocks". This is from paquier.xyz. This is definitely an internals-focused post where he talks about the internal functions in Postgres for handling lightweight locks. So if you're interested in getting into the details of Postgres, definitely check out this blog post.

Next piece of content- "Waiting for PostgreSQL 16- Add array_sample() and array_shuffle() functions". This is from depesz.com. And he's talking about how you can now sample items from an array, as well as take the items of an array and shuffle them around, which he says could be useful for Monte Carlo applications.
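For example:

```sql
-- Pick three random elements, and return the whole array in random order.
SELECT array_sample(ARRAY[1, 2, 3, 4, 5], 3);
SELECT array_shuffle(ARRAY[1, 2, 3, 4, 5]);
```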

Next piece of content- "ACCURACY OF GEOMETRY DATA IN POSTGIS". This is from blog.rustprooflabs.com. And they're talking about geometric accuracy and how latitude has a significant effect on it. The further you get away from the equator, the more error can be introduced into your calculations, and they show how to actually deal with that. So if you're interested in learning more, definitely check out this blog post.

Next piece of content- "Using Encryption-at-Rest for PostgreSQL in Kubernetes". This is from percona.com. And specifically, they're talking about using their Percona operator to be able to set up Postgres on Kubernetes and handle an encryption-at-rest requirement. So if that's of interest to you, you can check out this blog post.

Next piece of content. There was another episode of Postgres FM last week, this one was "Read-only considerations". So basically, these are use cases where databases are predominantly used in a read-only state and maybe they're refreshed once a night. So if you're interested in that type of discussion, you can check out their audio or YouTube video here as well.

Next piece of content, the PostgreSQL person of the week is David Christensen. If you're interested in learning more about David and his contributions to Postgres, definitely check out this blog post.

And the last piece of content: we did have another episode of the Rubber Duck Dev Show this past Thursday afternoon. In this one, we discussed "Leaving the Cloud", which is a reflection on something that David Heinemeier Hansson and his company 37signals are doing in terms of leaving one of the major cloud providers, co-locating their own servers, and setting up their own infrastructure. So if you're interested in that type of discussion, we welcome you to check out our show.
