
State of Postgres, Dumping DBs, Configuration Parameters, Partial WAL | Scaling Postgres 94


In this episode of Scaling Postgres, we discuss the State of Postgres, how best to dump your database, setting configuration parameters and managing partial WAL files.


Transcript



[00:00:00] In this episode of Scaling Postgres, we talk about the State of Postgres, dumping DBs, configuration parameters, and partial write-ahead log files. I'm Kristen Jameson, and this is Scaling Postgres, episode 94.

[00:00:21] Alright, I hope you're having a great week. Our first piece of content is actually called "The State of Postgres", and this is at stateofpostgres.com. It's a survey that they've published, along with the results, from 500 members of the Postgres community. This was done by Timescale, and they have a very interactive site that goes over some of the results and some conclusions from that survey. Now, personally, I kind of like the "full results" link at the top to be able to see where the different results came from. This is from about 500 people, so not a huge number of people using Postgres, but arguably a representative sample, and there are interesting pieces of information nevertheless. For example, it seems like predominantly software developers and engineers responded to it, and the vast majority are using Postgres for work, doing software development. There's a whole other set of questions that were asked as well. So if you're interested in this type of content, definitely a link to check out.

The second piece of content, related to that, is "Five quick thoughts on the State of the Postgres survey", and this is from the xzilla.net blog. He mentions five points from his thoughts. The first thing he mentions is that 500 people is a pretty small subset of everyone in the Postgres community, so that's something to keep in mind. He appreciates the slow and steady growth: it's not that people have suddenly started using Postgres in the last few years. It's had a long runtime, which hopefully demonstrates that it will be relatively stable, as opposed to something that's a flash in the pan. He also has a section called "You do what now?", where he's basically talking about how it was predominantly software developers who filled out the survey, predominantly with regard to software development. Talking about the clouds, I didn't necessarily agree with some of the things mentioned here, as I did feel the survey gave some pretty clear indications about how people are running Postgres. And he had some opinions about particular quotes that were picked out. But I thought this was an additional piece of content to complement the survey, so if you're checking that out, maybe you would like to check out this as well.

[00:02:37] The next piece of content is "How to effectively dump PostgreSQL databases". Now, of course, there's the tool pg_dumpall, which dumps the entire database cluster, every database within it, along with the global objects such as roles, tablespaces, users, et cetera. Then there's pg_dump, which just dumps a single database; none of the globals that apply to the entire cluster are dumped. He discusses the advantages and disadvantages of each one, and basically he prefers using pg_dump and not pg_dumpall. pg_dumpall is very convenient to get started with, but as you increase your sophistication, you're probably going to want to switch to using pg_dump. Using pg_dumpall has a number of disadvantages: you can't easily restore, say, a single table, and you can't easily do dumps or restores in parallel. So what he advocates is dumping the globals and then dumping each database individually. He also talks about the different formats you can use with pg_dump that you can't use with pg_dumpall. pg_dumpall just dumps to the plain format, whereas other formats, such as the directory format, allow you to do dumps in parallel and restores in parallel, and you can use the -j option to specify how many jobs to run. You can even do partial restores, so you can restore just a particular table, for example. He has an example script of doing this backup process: basically, he chooses to dump the roles into one dump, dump the tablespaces into another dump, and then go over per-database dumps, doing a pg_dump for each one. So, a very good blog post that explains the rationale for using pg_dump versus pg_dumpall. If you're using logical backups as your primary way to do Postgres backups, I would definitely advocate the solution he has outlined here. And I should mention this is from depesz.com.
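As a minimal sketch of that approach, assuming a local cluster and hypothetical database names (the exact flags in his script may differ):

```bash
#!/bin/bash
# Dump cluster-wide globals (roles, tablespaces) separately,
# since pg_dump does not include them.
pg_dumpall --globals-only > globals.sql

# Dump each database individually in directory format (-Fd),
# which supports parallel dumps and restores via -j.
for db in app_db reporting_db; do
    pg_dump -Fd -j 4 -f "/backups/${db}.dump" "$db"
done

# To restore: globals first, then a database in parallel.
#   psql -f globals.sql postgres
#   pg_restore -j 4 -d app_db /backups/app_db.dump

# Partial restore of a single table from the same dump:
#   pg_restore -d app_db -t important_table /backups/app_db.dump
```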

[00:04:41] The next post is "Setting PostgreSQL configuration parameters", and this is from Cybertec-postgresql.com. Now, the first way to set the configuration, of course, is the postgresql.conf file. That has all the different configuration parameters that you can just go in and edit, although it also supports include files. So there's usually a separate directory, called conf.d, that you can put files into, and they will be included as part of postgresql.conf. Now, I tend to use Ansible configuration management for my PostgreSQL configuration, and you can copy over the whole file, but I prefer to use include files: just state the configuration that I'm changing and basically leave the postgresql.conf file unchanged. A thing to keep in mind is that the last parameter set wins. So if you've defined, say, shared_buffers in two different places, once in postgresql.conf and once in an included file, the last one to be read will take effect. So generally the included files will override what's set in postgresql.conf itself. The next way you can make configuration changes is by running the ALTER SYSTEM command. What ALTER SYSTEM does is modify the postgresql.auto.conf file. This is a file that specifically says "do not edit this file manually; it will be overwritten by the ALTER SYSTEM command". These changes take precedence over what's set in the postgresql.conf file, so any ALTER SYSTEM commands make changes to this file to persist over restarts. The other thing they mention is that you can pass in settings at server start. For example, when you start Postgres using the pg_ctl command, you can pass in parameters at startup, and the database cluster will run with those values. The next level at which you can set configuration is the database level. You can use an ALTER DATABASE command to set certain parameters. Not all parameters can be set this way; there are certain ones that cover the entire cluster that, of course, you cannot set using an ALTER DATABASE statement. But that's another way you can set certain parameters. You can also alter a role to set parameters that affect a single user, and you can even alter a user in a particular database, so again, more granularity in how you can set things. And then lastly, of course, is setting at the session level. Once you've made a connection to a PostgreSQL server, within that session you can set certain parameters that will persist for the duration of that session. They even go into the concept of assigning parameters to functions. So, definitely an interesting blog post, and if you want to know where different configuration settings can be set for PostgreSQL, this is a very good post that reviews all of those different capabilities.
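Here's a rough sketch of those levels, from cluster-wide down to a single session; the data directory, database, and role names are hypothetical:

```bash
# Server start: pass a one-off parameter (lasts until shutdown).
pg_ctl -D /var/lib/postgresql/data -o "--work_mem=64MB" start

# ALTER SYSTEM: persisted in postgresql.auto.conf, which takes
# precedence over postgresql.conf.
psql -c "ALTER SYSTEM SET shared_buffers = '2GB';"
psql -c "SELECT pg_reload_conf();"  # some settings need a full restart instead

# Narrower scopes: per database, per role, and per role in one database.
psql -c "ALTER DATABASE app_db SET work_mem = '32MB';"
psql -c "ALTER ROLE reporting SET statement_timeout = '5min';"
psql -c "ALTER ROLE reporting IN DATABASE app_db SET work_mem = '128MB';"

# Session level: applies only to the current connection.
psql -c "SET work_mem = '256MB'; SHOW work_mem;"
```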

[00:07:45] The next post is "Barman 2.10: recovery of partial WAL files". Now, this is generally an announcement of the availability of Barman 2.10, but it also discusses partial WAL files and how they've made them a lot easier to handle. I bring this up because it's another post that educates along with making an announcement. So, for example, normally when you have archive_mode enabled in PostgreSQL, you're going to set an archive_command that saves those WAL files to a particular location. I generally always have an archive_command running, but there's also the ability to stream the WAL files to another system that's running the pg_receivewal process, essentially doing physical replication of the WAL to a separate system or process. Barman actually supports both methods: the archive_command can copy WAL to a particular directory defined by Barman, and it can also use a pg_receivewal process to stream the WAL changes. Now, the advantage of pg_receivewal is that with the archive_command, the server waits for an entire WAL file to be filled up before archiving it. It must have a full WAL file before the archive_command is run, whereas pg_receivewal constantly streams the changes as they are written to the WAL. In that case, when a file is not yet full, you're going to get a file with a .partial suffix. The advantage of this is that there's more data available to you if you're using pg_receivewal versus just relying on the archive_command, so potentially you will lose less data. As they say here, your recovery point objective will be much closer to zero, and it may be zero if you're using synchronous WAL streaming. Barman already supported fulfilling an RPO of zero, but this 2.10 release makes the recovery process a lot more seamless by automatically handling these .partial WAL files. So if you use Barman, definitely a blog post to check out for the update, as well as to get some good information about the differences between using an archive_command and using pg_receivewal. And this is from the 2ndQuadrant.com blog.
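As a sketch of the two WAL shipping methods they contrast, with hypothetical hosts and paths (Barman's own setup wires these up for you):

```bash
# Method 1: archive_command, set in postgresql.conf. It runs only once a
# WAL segment is complete, so the newest, partially filled segment is
# not yet archived:
#   archive_mode = on
#   archive_command = 'rsync -a %p backup-host:/var/lib/barman/incoming/%f'

# Method 2: pg_receivewal streams WAL continuously over a replication
# connection; the segment currently being written carries a .partial
# suffix until it is complete.
pg_receivewal -h db-primary -U streaming_barman \
    -D /var/lib/barman/streaming
```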

[00:10:10] The next post is "How to Perform a Major Version Upgrade Using pg_upgrade in PostgreSQL", and this is from Crunchydata.com. This is a relatively short post, but it goes through the process of how you could do a pg_upgrade, listing the different commands. Now, you'll probably find much more comprehensive documentation for doing an upgrade, and there are points of validation you'll want to do and things of that nature, but this gives you a high-level overview of how to do an actual pg_upgrade, as sketched below. So definitely a blog post to check out if you're interested in that. The next post is "How to run Postgres on Docker, part three". This discusses the process of setting up a primary and a replica as Docker containers and having them sync data between each other. Now, I didn't actually see where parts one and two were; however, this shows you how to set up a cluster of PostgreSQL instances. So if you're interested in doing that, definitely a blog post to check out.
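For reference, the basic shape of a pg_upgrade run looks something like this; the versions and paths here are hypothetical, and the post and the official docs cover the validation steps in detail:

```bash
# Stop the old cluster, then do a dry run first (--check changes nothing).
pg_ctl -D /var/lib/postgresql/11/data stop

pg_upgrade --check \
    --old-datadir=/var/lib/postgresql/11/data \
    --new-datadir=/var/lib/postgresql/12/data \
    --old-bindir=/usr/lib/postgresql/11/bin \
    --new-bindir=/usr/lib/postgresql/12/bin

# If the check passes, run it for real. --link avoids copying data files,
# but the old cluster cannot be started again once the new one runs.
pg_upgrade --link \
    --old-datadir=/var/lib/postgresql/11/data \
    --new-datadir=/var/lib/postgresql/12/data \
    --old-bindir=/usr/lib/postgresql/11/bin \
    --new-bindir=/usr/lib/postgresql/12/bin

# Start the new cluster and rebuild planner statistics.
pg_ctl -D /var/lib/postgresql/12/data start
vacuumdb --all --analyze-in-stages
```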

[00:11:15] The next post is "PoWA 4: new powa-collector daemon", and this discusses the performance monitoring tool PoWA for Postgres, whose version 4 is in a beta state. It discusses the implementation of the new collector. So if you use PoWA, or you're considering it, definitely a blog post to check out. And the last post discusses a connection pooler, in case you're unfamiliar with it, called Odyssey. They classify it as a scalable PostgreSQL connection pooler because it uses a multi-threaded design. It's by Yandex, so it is apparently driving a fair amount of traffic. Now, I tend to use PgBouncer, but if you're interested in looking at connection pooler alternatives, maybe you would want to check out the Odyssey project.

[00:12:06] That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you could subscribe via YouTube or iTunes. Thanks.
