
Safety Systems, Failover Slots, Transaction ID Complications, Repartitioning | Scaling Postgres 118


In this episode of Scaling Postgres, we discuss safety systems, the purpose and existence of failover slots, complications with transaction IDs and how to repartition without downtime.

Content Discussed

YouTube Video

Podcast Audio

Transcript



[00:00:00] In this episode of Scaling Postgres, we talk about safety systems, failover slots, transaction ID complications and repartitioning. I'm Creston Jameson, and this is Scaling Postgres episode 118.


[00:00:23] All right, I hope you, your family and coworkers continue to do well in the times that we live in. Our first piece of content is "Safety systems can reduce safety". This is from Momjian.us. He typically does quick little blog posts that are a paragraph or two; this one's a little bit longer, but what he had to say about safety systems I found particularly interesting. He mentions three instances where a safety system actually caused a problem: one caused a jet to crash, one contributed to the Three Mile Island nuclear reactor meltdown, one to the Chernobyl nuclear meltdown. So basically, safety systems introduced to protect something caused a problem. And he's looking at this from the perspective of PostgreSQL, and he mentions a few different safety systems here with regard to backups, error reporting, replication and connection pooling. Talking about backups, he looks at them through a few different lenses. How serious is the failure that the safety system is trying to prevent? High. How likely is the failure that the safety system is trying to prevent? High. How reliable is the safety system? About medium. And what impact will failure of the safety system have on the overall system? Relatively low. He says, well, maybe you run out of disk space while the backup is occurring, for example, but usually you're not going to cause the system to go down by taking a backup. Maybe it has a little bit of a performance impact, but not very much. And what I find interesting, thinking about it, is what about other areas, like replication, which he also mentions. If you're going with streaming replication, the one that causes the least impact is asynchronous replication. But once you turn on synchronous replication, a safety system to ensure you don't lose any data, well, now you have essentially two servers that always need to be operating to keep synchronous replication going. Maybe you can add more, but with each one you add, you're adding another point of failure. So this safety system, each time you add to it, you're introducing another potential source of failure, because your database will stop completing writes if synchronous replication stops working. So essentially the safety system causes a problem to happen. Then he talks about connection pooling: maybe you want a connection pooler; well, now that is a source of error. Maybe you want to introduce two connection poolers; well, now you've got more complexity you need to coordinate and deal with to make sure those work. And the same thing happens with high availability solutions. Those manage the switchover from one database to another if there's an issue, but what if there's a failure in that system? Could that actually cause downtime? So not a long post, but definitely something to consider as you add more of these safety systems: do they really create a safer system?
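Just to make the synchronous replication point concrete, here is a minimal sketch, not from the post itself, and the standby names are hypothetical. Once synchronous_standby_names is set, a commit on the primary waits until at least one of the listed standbys has confirmed it, so losing those standbys stalls writes, which is exactly the "safety system as a new point of failure" trade-off being described.

    -- A hedged illustration: require acknowledgement from at least one of two
    -- named standbys before a commit is considered durable.
    ALTER SYSTEM SET synchronous_commit = 'on';
    ALTER SYSTEM SET synchronous_standby_names = 'ANY 1 (standby_a, standby_b)';
    SELECT pg_reload_conf();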

[00:03:16] The next piece of content is "Failover slots for PostgreSQL", and this is from 2ndquadrant.com. Now, I will mention that I thought I recalled an earlier version of this post, but this one is dated June 8th, 2020, so I don't know if it has been updated or if it is a new post. But I have seen a number of other articles talking about failover slots. This is the concept that when you fail over from a primary to a replica, any replication slots do not get carried over; they stay with the old primary, so you need to recreate them on the newly promoted primary. And this can cause problems with things like, as they mention here, logical decoding, where essentially you lose the place that you were at and you potentially have to make adjustments to handle it. They talk about the failover problem here, the fact that the slot does not get carried over, and the issue is that you have to track not only the existence of the slot but where it is in the WAL, so that when failover occurs the slot knows exactly where it needs to be. And it talks about a patch with regard to this. But like I said, I don't know if this is an old post, because I don't see any reference or link to the patch, or whether it's a patch that's potentially coming in Postgres 13 or a future version 14. So I don't know the state of this, but even then it brings out an important thing to keep in mind if you're using logical replication or logical decoding with Postgres: the slots don't get copied over, and you need to handle that case. So if you happen to know of a patch for Postgres 13 or 14 that may address this, go ahead and let me know.
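As a hedged sketch of what handling that case can look like today (the slot name and plugin below are hypothetical, not from the article): after promoting a replica, you check which slots exist and recreate any that are missing, keeping in mind that a recreated logical slot starts from the current WAL position, so downstream consumers may need to reconcile anything they missed.

    -- On the newly promoted primary: see which slots exist (none from the old primary will)
    SELECT slot_name, slot_type, restart_lsn FROM pg_replication_slots;

    -- Recreate the logical slot the old primary had (hypothetical name and plugin)
    SELECT pg_create_logical_replication_slot('my_decoding_slot', 'pgoutput');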

[00:05:00] The next piece of content is "PG Friday: Ten Things Postgres Could Improve, Part One", and this is from 2ndquadrant.com as well. Now, interestingly, they say ten things Postgres could improve, but I don't see a listing of ten things; they're listing four things that they're going to address in four parts, so I was a little confused by that. But basically this is part one of the things that could be improved, and it covers transaction ID complications.

[00:05:33] So this is a play on a post that we covered in a previous episode of Scaling Postgres, "Ten Things I Hate About Postgres". But here they're focusing on the transaction ID complications. This is the fact that a transaction ID is a 32-bit integer, which means there is a finite number of active transactions you can have. So you can run into problems if you're not actively vacuuming your system. If you have long-running transactions, you could essentially run out of those transaction IDs within the tables you're working with, or when using things like prepared transactions. So there are a few different scenarios where you can run out of these transaction IDs, and you really want to be aware of these possibilities. Now, how can you protect yourself from it? The number one way is to monitor it, to monitor the state of it across all the tables and databases, to make sure that you're not going to run out of these transaction IDs. The second thing, of course, is to make sure your tables are well autovacuumed and that you're freezing these IDs so they are available for reuse, as well as making sure you don't have a lot of long-running transactions, especially idle-in-transaction sessions. In terms of future work, they are looking towards things like maybe making the transaction ID a 64-bit number just to increase the runway, or alternatively, maybe a new storage system like zheap could offer an alternative. So if you want to learn about transaction ID complications, go ahead and check out this blog post.
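Here is a minimal monitoring sketch along those lines, not taken from the post: age() against datfrozenxid and relfrozenxid shows how far each database and table is from the wraparound threshold, with values creeping toward roughly two billion being the danger zone.

    -- Transaction ID age per database (autovacuum_freeze_max_age defaults to 200 million)
    SELECT datname, age(datfrozenxid) AS xid_age
    FROM   pg_database
    ORDER  BY xid_age DESC;

    -- The ten tables in the current database closest to needing an aggressive freeze
    SELECT relname, age(relfrozenxid) AS xid_age
    FROM   pg_class
    WHERE  relkind = 'r'
    ORDER  BY xid_age DESC
    LIMIT  10;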

[00:07:10] The next piece of content is "Control runaway Postgres queries with statement timeout". This is from Crunchydata.com, and it's talking about statement_timeout, which is a configuration option that enables you to time out statements that exceed a certain duration.

[00:07:25] Now, you can set this on the cluster or by session, and they even talk about setting it per user or per database. So this helps you avoid the kind of thing we were talking about in the previous post: having too many long-running transactions within your database system. One thing he says he likes to do is set it at the database level, with a statement timeout of 60 seconds, so that no statement can exceed that duration within that database. And if you have to run a longer query, then in whatever session you're using, just set the statement timeout to a longer duration; once you close out of that session, that setting is forgotten. So if you want to learn more about the statement_timeout configuration option, check out this blog post.
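Roughly what that looks like in practice, with a hypothetical database name: the per-database default comes from ALTER DATABASE, and the session-level SET only lasts for that connection.

    -- Per-database default: no statement in app_db may run longer than 60 seconds
    ALTER DATABASE app_db SET statement_timeout = '60s';

    -- In one session that legitimately needs a long query, override it temporarily
    SET statement_timeout = '10min';
    -- ...run the long query...
    RESET statement_timeout;  -- or just close the session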

[00:08:08] The next piece of content is "Repartitioning with logical replication in PostgreSQL 13". This is from Hagander.net, and he's talking about a way to repartition a table from, say, a daily partitioning scheme to a weekly partitioning scheme, although you could migrate in either direction. And he's looking to do this without downtime. Clearly you could remove access to the system and do it that way, but how could you do it without downtime? Now he says, quote, "but first, a disclaimer. This is definitely not pretty", but apparently it does work as a quick hack. So he has one table that is partitioned daily, and he inserted some data into it, and then he set up a new table with a weekly partitioning scheme. Now, in order to do this, you actually need to work with two databases: you create a publication from one table to a table in another database, and then from that second database, create a publication from that table back to a table in the first database. He goes through all the code, and the description of what he's doing and how it works is laid out in an eight-step process. It's pretty much what I described: you replicate from one table in database one to a second table in the second database, and then replicate from that to another table in database one, all using logical replication. And that should be able to repartition your data without encountering any downtime. So if you're interested in learning more about this technique, I definitely encourage you to check out this blog post.
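To give a feel for the first hop only (all names here are hypothetical, and the original post's eight steps cover the second hop and the ordering details): PostgreSQL 13 lets a publication on a partitioned table publish changes as if they came from the root table, so the subscriber's copy can be partitioned completely differently.

    -- On the source database, with a daily-partitioned table called events:
    CREATE PUBLICATION repart_pub FOR TABLE events
        WITH (publish_via_partition_root = true);

    -- On the destination database, where events is partitioned weekly instead:
    CREATE SUBSCRIPTION repart_sub
        CONNECTION 'host=localhost dbname=source_db user=repl'
        PUBLICATION repart_pub;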

[00:09:46] The next post is "Multimaster Replication Solutions for PostgreSQL". This is from Percona.com, and they're basically describing different solutions that do multimaster replication. They cover some key concepts with regard to database replication: synchronous replication, asynchronous replication, and single-master replication, which is essentially what Postgres does out of the box, only a single-master solution. Then they go into multimaster and talk about some of the pros and, of course, some of the cons. I'll just mention some of the cons. The main disadvantage, of course, is complexity. Second, conflict resolution is very difficult, because you can have simultaneous writes on more than one master at a time. So if someone updates the same data in the same table on two or three of the servers, which one wins? And sometimes manual intervention is necessary in that case. Then they mention the different solutions they're aware of. The first is BDR, the bi-directional replication product from 2ndQuadrant, and they discuss that. They also discuss xDB, PostgreSQL-XL, PostgreSQL-XC and XC2, rubyrep and Bucardo. So if you're interested in looking into a multimaster database solution built on Postgres, definitely check out this blog post and the different solutions offered.

[00:11:14] The next piece of content is "Composite type performance issues in PostgreSQL". This is from Cybertec-postgresql.com. By composite types, they're talking about custom data types that you can create in Postgres. So for example, you can create a type, which in this case they've chosen to call person, and it includes an ID, a name and an income; that one type encompasses all of those different fields, and you can store it in a single column. And you can see that the type of that column is person when you create it with that data type. Now, one thing they say here is to be careful about database performance. For the performance example they actually moved to using pgstattuple, because when you call it, it returns a composite type, and the performance depends on how you query it. If you query it one way, it returns in almost 2 seconds, whereas if you do a normal select with a FROM clause, it returns in 200 milliseconds. He says this is because when expanding a composite type in the select list, it apparently calls the function once for each column. So it's just something to be aware of: depending on how you use composite types, you could run into a performance issue. Personally, I haven't used composite types in my database design; I tend to use non-composite types, and where I might otherwise reach for a composite type, I generally use JSONB fields. But go ahead and check out this blog post if you want to learn more.
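Here is a minimal sketch of both ideas, with hypothetical table names: the first part shows the kind of composite type being described, and the second shows the two ways of calling a composite-returning function such as pgstattuple, where the (f(x)).* form gets expanded into one call per output column.

    -- A composite type roughly like the article's person example
    CREATE TYPE person AS (id int, name text, income numeric);
    CREATE TABLE t_person (p person);
    INSERT INTO t_person VALUES (ROW(1, 'alice', 50000)::person);
    SELECT (p).name, (p).income FROM t_person;

    -- requires: CREATE EXTENSION pgstattuple;
    -- Expanding the composite result in the select list can evaluate the function per column...
    SELECT (pgstattuple('t_person')).*;
    -- ...whereas calling it in the FROM clause evaluates it once
    SELECT * FROM pgstattuple('t_person');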

[00:12:49] The next piece of content is "Transactions in PostgreSQL and their mechanism". So this is talking about transactions, and it covers the normal transactions that are autocommitted by default in Postgres. They talk about that and show different ways a transaction gets executed, then start digging deep into how this actually works, querying the table to look at the different hidden fields and how those track and manage the transaction. They talk about subtransactions that can happen within a primary transaction, they talk about multitransactions, which cover basically row-level locks, and finally they follow up with two-phase commit transactions and how they work. So if you want to deep dive into the technical details of how transactions work in Postgres, definitely check out this blog post from Highgo.ca.
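As a small, hedged illustration of the hidden fields and subtransactions being described (the table name is made up): every row carries xmin and xmax system columns recording which transaction created or invalidated it, and a SAVEPOINT opens a subtransaction inside the current transaction.

    CREATE TABLE txn_demo (id int);

    BEGIN;
    INSERT INTO txn_demo VALUES (1);
    -- xmin is the inserting transaction's ID; xmax is set when the row is deleted or locked
    SELECT xmin, xmax, ctid, id FROM txn_demo;
    SAVEPOINT sp1;                 -- starts a subtransaction
    UPDATE txn_demo SET id = 2;
    ROLLBACK TO SAVEPOINT sp1;     -- undoes only the subtransaction's work
    COMMIT;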

[00:13:42] The next piece of content is a set of two posts, "Developers Diary 1" and "Developers Diary 2", from Paul Ramsey at Cleverelephant.ca. He's talking about enhancements they've been making with regard to performance. So if you're wanting to learn more about the work he's been doing to improve the performance of large object manipulation in PostGIS, definitely check out these two blog posts.

[00:14:12] The next piece of content is "Using the Crunchy PostgreSQL Operator with PostGIS". This is from Crunchydata.com, and they're talking about exactly what it says: using the Crunchy Data Kubernetes operator to set up PostGIS in Kubernetes. So if you're interested in doing that, definitely check out this blog post.

[00:14:32] The next piece of content is "PgBouncer 1.14 is released". The main feature they list here is that encrypted SCRAM secrets in PgBouncer can now be used for server-side login, so storing plain-text passwords in PgBouncer is no longer necessary to be able to use SCRAM. If you're interested in this new feature, definitely check out this new version.

[00:14:55] The next piece of content is "How to use the AdaBoost machine learning model with 2UDA – PostgreSQL and Orange, Part 6". So this is part six of the series from 2ndquadrant.com covering the machine learning approach they mention here. So definitely check out this blog post to learn more.

[00:15:14] And the last piece of content: the PostgreSQL Person of the Week is Oleksii Kliukin. So if you're interested in learning more about Oleksii and his contributions to PostgreSQL, definitely check out this blog post. That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.
