background

Recovery Configuration, Alter System, Transaction Isolation, Temp Table Vacuum | Scaling Postgres 85

Join Over 1,000 Engineers & Get New Episodes Weekly!

In this episode of Scaling Postgres, we discuss changes to recovery configuration, the alter system command, transaction isolation and vacuum for temp tables.

Content Discussed

YouTube Video

Podcast Audio

Transcript



[00:00:00] In this episode of Scaling Postgres we talk about recovery configuration, altersystem transaction isolation in temp table vacuum. I'm Kristen Jameson, and this is scalingpostgres episode 85.

[00:00:16] You all right? I hope you're having a great week. The first piece of content isreplication configuration changes in PostgreSQL twelve and this is from secondquadrant.com.And basically with postgres twelve the way you set up recovery as well as replication haschanged because the recovery.com file is no longer used. In fact, if you have one in the directorywhen you try to start the PostgreSQL twelve cluster it's going to not start and give you a fatalerror using recovery command file recovery.com is not supported and in place of this file insteadit's using two signal files. One is that the cluster should be a standby and it has a standby signalcluster in the data directory I believe, or it has a recovery signal file which means it's starting upin a targeted recovery mode and if both files are present the standby signal file takesprecedence. And it mentions some of the things here that when the standby is promoted, thestandby signal file is removed and if a point in time recovery is taking place, the recovery signalwill be removed once the recovery target is reached. And of course what this also means is thatif neither of these files are present the cluster will start up acting as a primary database server.Now pretty much all of the configuration options that used to exist in recovery.com are now willbe in the postgresql.com file except for two things that they mentioned here. One, the standbymode is no longer used. They're using the two different file signal types now. And secondly, thetrigger file has been renamed to promote trigger file. So it's the file that will, when exists,promote this to the primary. Now the advantage of having these commands in thepostgresql.com file is that now you can do things such as alter the configuration and doing areload, whereas usually changes to the recovery.com file required a full restart. Now what's greatabout this post it also mentions some gotchas with this and the first one is that alter systemsettings take priority. So basically if someone changes the configuration by using alter system,what it actually does is alter a PostgreSQL auto.com file that actually exists in the data directory.Like I tend to use Ubuntu and the PostgreSQL comp file is located in the etc PostgreSQLdirectory and there's a folder structure for each cluster that's on that system whereas this autofile exists in the data directory and basically you're not supposed to modify it. And these changestake precedence over settings in the PostgreSQL comp file. So basically when you use the altersystem command it actually makes those changes into this separate file. So it's just somethingto keep in mind that settings in this file take precedence over ones in the PostgreSQL comp fileso you may run into confusion as different things are going to be configured in different areasand why this is important. I'll get into some of the it looks like some of the utilities may be storingsome settings here. The next gotcha is that replication configuration settings may be presenteven on primary servers. So because this is in the postgresql.com file it could have like primaryconnection info setting here could be in the primary but it's not a replica. Basically you have torely upon the standby files to identify if it's a replica or not or query a system table. The nextgotcha is no canonical location to write configuration settings basically since it's part of thesystem configuration now it could be in multiple files or multiple locations whereas before it wasjust known to be in one file. And the other thing they also mentioned is of course that the lastconfiguration parameter red takes priority. And again, we're referencing again the PostgreSQLauto.com file that is modified by Altersystem and other utilities that they mentioned here such asPG based Backup or Rep Manager, which is a backup Replication Manager, I believe by secondquadrant another gotcha is a risk of signal file confusion. So basically you need to just be awareof these two files in their existence and their purpose. And the last gotcha that's really mostly justa configuration change is that only one parameter from the recovery target family may bespecified and they mentioned that in a PostgreSQL eleven and earli --er the last instance of these parameters was used. However this time it system won't start upunless there's one and only one which I think is a safer setting for the recovery target but it's justa change you need to keep in mind. Now related to this post there's actually three other poststhat talk about these configuration changes. The next one is by Procona.com and its title is howto set up streaming Replication in PostgreSQL twelve. Now again they talk a lot about thechanges that the fact that recovery.com has gone away and you need to do your configurationslightly differently to do replication. So they are doing this on a CentOS system? I believe so.They're showing the different ways to get the change the parameters you need to set replication,create a user, update the Pghba file so that you can connect to the primary. And then they do aPG based backup to the primary. And they're actually using the R option, which we'll look at in asecond here. Basically, that writes the appropriate files to be able to start at the system in therecovery mode. So for example using this R option the Pgbase backup utility actually writes tothe PostgreSQL auto.com file that was mentioned before that's in the data directory PostgreSQLand it adds the listen addresses and the primary connection info that was input with the Pgbasebackup command here. So this is a pretty good post about showing you how to walk throughand talks a little bit more about the changes that have come with PostgreSQL twelve with regardto recovery and also replication. The only thing that I saw here that it kind of gave me pause isthe fact that they actually did an echo to write to the PostgreSQL auto.com file when it explicitlysays in the PostgreSQL auto.com file is do not edit this file manually, it will be overwritten by theAlter system command. So they did overwrite it here. I would probably not do this because itseems to be not the best practice but just something to keep in mind.

[00:06:50] Next post also related to this change. So apparently with people trying twelve theymay be running into this issue. This is where is my recovery.com file in PostgreSQL versiontwelve and this is from Postgres Rocks Enterprisedb.com and his answer is the short answer isit's gone. And again he runs through the different process, why it's gone again talking about howPG based backup with Er option and how it works differently. So again, another resource to goand read to help you get up to speed before you're implementing twelve and needing to changeyour recovery processes potentially as well as your replication processes. With this I haven'tused the Altra system command a lot. Typically what I do is modify the postgresql.com file andthen do a reload of the system but this may see more increased usage so it would probably be agood practice. And again, this command has been around for a long long time. But to get familiarwith the Alter system command and how it works as well as looking at the PG based backupcommand. So in versions eleven and prior the R option, the capital R option or theWriterecovery.com option was write a minimalrecovery.com file, well those no longer exist and inversion twelve that same option in PG based backup actually does a create a standby signal fileand append connection settings to the PostgreSQL auto.com file. And this also adds thereplication slot if Pgbase backup is using a replication slot. So again, just some changes andsome gotchas to be aware of with regard to PostgreSQL twelve in terms of recovery as well asreplication.

[00:08:35] The next post is Time and Relative Dimension in Space and this is from Pgdba.organd this is a great post.

[00:08:46] He explains things very well, talking about essentially transaction isolation levels.First. He covers MVCC. So it's MultiVersion concurrency control how PostgreSQL handlesconcurrent access to a lot of the different data. And then what each individual can see, he goesinto how the XID is assigned once a transaction started. And that along with essentially hiddencolumns and tables that define what XID it was created or deleted at. Determine what you cansee in the database based upon the version that you're using. And then a part of this. And thiswas a great explanation of how this works. So if you're not familiar with that, definitely check thisout. And then he follows up with a transaction isolation and how essentially this is the SQLstandard defines four lev --els of transaction isolation in terms of what's possible. Like at the lowest level, it's possible toget dirty reads when a transaction can access the data written by a concurrent not committedtransaction, when it's possible to get a non repeatable read, a phantom read and a serializationanomaly. So all sorts of different possibilities. And then what he did is created this table here thatcompares these possibilities with the isolation level options available in PostgreSQL. Soessentially with a read uncommitted everything but a dirty read is possible. Essentially withPostgreSQL it's not possible to get dirty reads. And I should mention that the default state isread committed for PostgreSQL. So if you do nothing, it's going to be recommitted. So it ispossible to get non repeatable reads. You do a select, you get a value, another transactionmodifies it when you do a select, again you're going to get that updated setting. So you get themost updated data that's in the database when you do a query. But that's what a non repeatableread is. But you can change the isolation level in PostgreSQL to be a repeatable read. And thiscan be on a per transaction level or for your particular session that you're connecting to thedatabase as. And with a repeatable read, you actually get rid of non repeatable reads andphantom reads. Although it's still possible to get a serialization anomaly where the order of thetransactions, there's no guarantee as to when that can happen. And here he talks about beingable to set the transaction isolation level as a part of a transaction and he goes over discussingthis in more detail, read committed along with some examples, repeatable read and then finallyserializable. So again, this is a great post and discusses how this kind of system works and howusing these transaction isolation levels you can alter the concurrency of PostgreSQL dependingupon your use case. So particularly if you're a developer, I encourage you to check out this post.

[00:11:33] The next article is what is Autovacium doing to my temporary tables? And this is fromCyberTech postgresql.com and he's talking about what Autovacium does. This essentially cleanstables, like what was mentioned in this previous post is that essentially to handle MVCC, whatPostgreSQL does is it doesn't do updates in place when there's an update. What it does is itactually creates a new row and then flags the old row essentially for deletion, but it can still bevisible based upon the transaction isolation level. What Auto vacuum does is it goes into thetable and then clears out all the old rows, essentially finally deletes them as opposed to justbeing marked for deletion, as well as reset the XID that was mentioned in this post as well. SoAuto Vacuum goes on a regular basis and essentially vacuums all these dead tuples from thetables. But an issue is it actually does not vacuum temporary tables. Temporary tables existwithin the essentially connection that you're using and they get discarded once that connectionis stopped or that session has stopped. And he has an example here where he's actually usingan extension to be able to get statistics with regard to the tuple. He creates a real table called TReal and then creates a temporary table called Ttemp. He inserts 5 million rows. He deletes halfof those rows from each of the tables, waits a bit for auto vacuum to clean up the real table, andthen when he checks the stats of the tuple for the real table, you can see hardly any deadtuples, and a fair amount of free percentage of space has essentially been reclaimed by doingthe vacuum operation. Whereas when he looks at the temporary table, you can see tons of deadtuples. So Auto Vacuum has not touched this table and you of course, don't have that much freespace. So this can be an issue if you're trying to keep temporary tables around a long timebecause they're not going to be auto vacuumed. He says you can manually vacuum, so thatworks, but definitely, again, don't have like long running transactions with temporary tables thatlast a while because they're not going to be vacuumed. If you do have a temporary table, theyget so large it's going to be around a while or potentially run into a transaction ID wraparound,depending on how much data you're working with. Sounds like you may need to manuallyvacuum it or drop those temp tables when you can and recreate them again. So just somethingto keep in mind. And another great b --log post from CyberTech Postgresql.com.

[00:14:04] The last post is newbie to PostgreSQL. Where to start? And this is from the Heigowebsite and it's basically listing resources that they suggest to their developers or people gettingto start to use PostgreSQL for the first time. And the number one link he mentions here isPostgresqltutorial.com, as well as Momgm US's website at his extended presentations and ofcourse, the official PostgreSQL documentation. So if you are at the Getting Started stage, it'sdefinitely a blog post to check out.

[00:14:37] That does it. For this episode of Scaling Postgres, you can get links to all the contentmentioned in the show Notes. Be sure to head over to Scalingpostgres.com where you can signup to receive weekly notifications of each episode. Or you could subscribe via YouTube oritunes. Thanks. --

episode_image