
Ballad of Bloat, Auditing Options, Configuration Options, Need Vacuum? | Scaling Postgres 99


In this episode of Scaling Postgres, we discuss the "Ballad of Bloat", options for auditing, a configuration help resource and determining when a vacuum is needed.


Transcript



[00:00:00] In this episode of Scaling Postgres, we talk about the Ballad of Bloat, auditing options, configuration options, and knowing when you need a vacuum. I'm Kristen Jameson and this is Scaling Postgres, episode 99.

[00:00:21] All right, I hope you're having a great week. So next week will be episode 100, and I'm still trying to think through what I could do a little bit differently in that episode. If you have any suggestions, please leave them in the comments below or send me an email. Now, in terms of this week, we have a lot of posts and a lot of content to get through, so we'll go ahead and get started. The first piece of content is just something fun and interesting. It's a song that someone put together, and it's on YouTube: "Nothing Compares to VACUUM, the Ballad of Bloat", from the Warsaw PostgreSQL Users Group. It basically uses the melody from "Nothing Compares 2 U" by Sinead O'Connor. Quite interesting, and I definitely encourage you to check it out. The next piece of content is "Row change auditing options for PostgreSQL", and this is from cybertec-postgresql.com. So if you want to audit row changes, meaning inserts, updates, deletes or truncates, they go through a number of different ways you can do it. The first way is to change log_statement in the postgresql.conf file. You can set it to none, so don't log anything; ddl, to log just DDL statements where you're creating or manipulating objects; mod, for modification statements; or all. But it has a number of downsides, and they go through the pros and cons here. The pros are that it's super easy to set up: it's one configuration variable to tune, and it starts giving you some auditing. The cons are that it catches only the top-level DML statements issued by the user, so stored procedures and things of that nature can bypass it; there's no row context for multi-row updates; there's no table- or schema-based filtering; the information is purely text based; and queries from failed transactions are also included. The next step up they mention is the pgaudit extension.
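As a quick sketch of that first option, the relevant postgresql.conf setting looks like this (the value shown is just one illustrative choice):

```
# Log all statements that modify data (INSERT/UPDATE/DELETE plus DDL).
# Valid values: none | ddl | mod | all
log_statement = 'mod'
```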
So the pros of pgaudit are that you have quite granular logging and auditing options, and internally, dynamically generated SQL can also be logged. The cons are that it is a third-party extension, it has a possibly heavy disk I/O footprint, there's no row context for multi-row updates, the information is purely text based, and queries from failed transactions are also included. The next option is to build your own custom audit tables, with triggers on the audited tables to populate them. This gives you the most flexibility in what you want to audit, but it's probably the most work, and they go through the different pros and cons for that. Now, the next option is interesting, and I've used it myself: using one generic audit trigger and table for all audited tables. So you have just a generic log table, you create indexes on it, and you're basically using a JSONB field to store the row data and the changes that were made. He creates the different indexes necessary to do that, and then the trigger function defines what to store and how, based on the changes. Then you put this trigger after insert, update or delete on each table that you want to be able to audit. Now, the pros of this are that there's less code to manage, and automatic attachment of audit trail triggers can easily be configured for new tables. The cons are that it's a bit more resource intensive, and he says some exotic indexing (GIN) may be needed, though I don't necessarily know if that's too much of a downside, and SQL searching may become a bit more complex. Again, working with JSONB fields takes a little more work, but it's not so bad. So I personally kind of like this approach. The next approach is doing logical replication, and there are two different ways. One way is PostgreSQL-native: you basically set up a separate database server, create a publication on the primary, and then create a subscription on the server you're replicating to, to audit the changes that are done. The other way is logical replication with custom applications.
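A minimal sketch of that generic-trigger approach could look like this (table, function and trigger names are illustrative, not taken from the post; the CREATE TRIGGER syntax shown needs PostgreSQL 11+):

```sql
-- One generic audit table for all audited tables.
CREATE TABLE generic_log (
    id          bigserial PRIMARY KEY,
    table_name  text        NOT NULL,
    operation   text        NOT NULL,          -- INSERT / UPDATE / DELETE
    changed_at  timestamptz NOT NULL DEFAULT now(),
    row_data    jsonb                          -- new row (or old row on DELETE)
);

-- A GIN index makes searching inside the JSONB payload feasible.
CREATE INDEX generic_log_row_data_idx ON generic_log USING gin (row_data);

CREATE OR REPLACE FUNCTION generic_audit() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO generic_log (table_name, operation, row_data)
        VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(OLD));
        RETURN OLD;
    ELSE
        INSERT INTO generic_log (table_name, operation, row_data)
        VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(NEW));
        RETURN NEW;
    END IF;
END;
$$ LANGUAGE plpgsql;

-- Attach the same trigger to each table you want audited:
CREATE TRIGGER audit_accounts
    AFTER INSERT OR UPDATE OR DELETE ON accounts
    FOR EACH ROW EXECUTE FUNCTION generic_audit();
```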
So you develop an application that supports logical decoding, and you essentially read the WAL stream and log whatever you want from it. They finish with a good summary table that compares each of the options. So, definitely a great blog post to consider if you want to implement auditing, or perhaps change how you're auditing your current application.
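The native logical replication setup they describe boils down to two statements, roughly like this (table, publication and connection details are illustrative):

```sql
-- On the primary (the publisher):
CREATE PUBLICATION audit_pub FOR TABLE accounts;

-- On the separate audit server (the subscriber):
CREATE SUBSCRIPTION audit_sub
    CONNECTION 'host=primary.example.com dbname=app user=replicator'
    PUBLICATION audit_pub;
```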

[00:04:51] The next post is about "postgresqlco.nf: PostgreSQL configuration for humans". This is a blog post talking about postgresqlco.nf, which they define as your postgresql.conf documentation and ultimate recommendations source. So basically, for all the different configuration parameters, they have documentation that mimics some of what's in the actual PostgreSQL documentation but also gives some recommendations. It's essentially another resource, and the link they're referring to is this one here. It has all the different configuration options; you can just click one, or you can do a search, and it helps with finding the best configuration options to change out of the 300 or so that are available. For example, if you click autovacuum, it gives a discussion of what it is and some recommendations; they even offer the ability for comments to be added, and it gives general parameter info as well as a view by version. So if you want an additional resource to help you with your PostgreSQL configuration, definitely check it out. The next post is "The most neglected Postgres feature", and this is from Richard Yen at richyen.com. Basically, he's talking about log_line_prefix. Many times he's seen an empty log_line_prefix on older installations of Postgres, or just a simple timestamp or something similar that is not that descriptive. Now, more recent versions ship with a more comprehensive one, but if you've upgraded and haven't updated it, this is definitely something to consider doing, because you can define some of the prefixes he lists as his favorites: the application name, the username, the database name (particularly if you have multiple databases in one database cluster), the remote host IP or name, the process ID and per-session log line number, as well as transaction IDs.
So you may want to review this post if you want to add a little more descriptive information to your logs. And then at the end he says that's not all, and he covers some other log settings that are fairly important and that he feels you should set. One of my favorites is log_min_duration_statement, to help find slow queries, plus log_statement, log_connections and log_disconnections, log_rotation_age and log_rotation_size, log_autovacuum_min_duration, log_checkpoints, log_temp_files, and auto_explain. So if you want to update how your logging is set up for PostgreSQL, definitely a blog post to check out.
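As a sketch, a postgresql.conf logging setup along those lines might look like this (the exact prefix format is a matter of taste; the % escapes are documented in the PostgreSQL manual):

```
# %m = timestamp, %p = process ID, %a = application name, %u = user,
# %d = database, %h = remote host, %l = per-session log line, %x = transaction ID
log_line_prefix = '%m [%p] %a %u@%d %h line=%l xid=%x '

log_min_duration_statement = 1000   # log statements running longer than 1s
log_connections = on
log_disconnections = on
log_checkpoints = on
log_temp_files = 0                  # log all temp file usage
log_autovacuum_min_duration = 0     # log all autovacuum runs
```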

[00:07:32] The next post is "Which tables should be auto vacuumed or auto analyzed?", and this is from depesz.com.

[00:07:39] Basically, he had an issue where autovacuum wasn't keeping up with changes being made to the database, so he needed to do a manual vacuum to catch everything up. But he said he didn't have a great way to check which tables were waiting for autovacuum. So he developed a query that does this: it consults the different autovacuum settings to work out which table would be next up for an autovacuum. You can run this query in your own database to do exactly that, and he created it as a view that can be used to see when the next thing should be vacuumed or when the next thing should be analyzed. So if you think you could benefit from this, definitely check out this blog post.
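The core idea can be sketched in a simplified query like this, comparing dead tuples against the standard autovacuum trigger formula (the post's actual query is more complete, e.g. it also honors per-table storage-parameter overrides, which this sketch ignores):

```sql
-- Tables whose dead-tuple count exceeds the global autovacuum threshold:
-- threshold = autovacuum_vacuum_threshold
--           + autovacuum_vacuum_scale_factor * reltuples
SELECT s.relname,
       s.n_dead_tup,
       current_setting('autovacuum_vacuum_threshold')::bigint
         + current_setting('autovacuum_vacuum_scale_factor')::numeric
           * c.reltuples AS vacuum_threshold
FROM pg_stat_user_tables s
JOIN pg_class c ON c.oid = s.relid
WHERE s.n_dead_tup >
      current_setting('autovacuum_vacuum_threshold')::bigint
        + current_setting('autovacuum_vacuum_scale_factor')::numeric
          * c.reltuples
ORDER BY s.n_dead_tup DESC;
```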

[00:08:31] The next blog post, also from depesz.com, is "Don't do these things in PostgreSQL". Now, this is based on a Wiki page that was set up on what not to do, and we talked about it when it was posted the year before last. I remember saying some of these were kind of opinionated, in that I might not do the same thing. But what's interesting here is that he took all of these "don't do" items and set up a SQL script that looks in the database to see how things are set and what's being done; it can actually look for violations of these "don't do" rules. So if you're interested in that, check out the post to get access to the SQL script to run on your own database.

[00:09:21] The next post is "pg_ctl Tips and Tricks", and this is from pgdash.io. They're talking about pg_ctl, essentially the control program for managing your database cluster. They cover where the command is typically located on Debian and Ubuntu as well as Red Hat and CentOS, and they go over the process of creating a database cluster, starting it, and the different ports and options you can set at the point at which you start it. So if you're interested in learning more about pg_ctl, definitely check out this blog post.
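As a quick sketch of the kind of commands covered (the data-directory path is illustrative; on packaged installs, pg_ctl usually lives under a version-specific bin directory):

```shell
# Create a new database cluster in the given data directory
pg_ctl initdb -D /var/lib/postgresql/data

# Start the cluster, passing server options through -o
# (here, listening on a non-default port), logging to a file
pg_ctl start -D /var/lib/postgresql/data -o "-p 5433" -l logfile

# Check status, then stop it again
pg_ctl status -D /var/lib/postgresql/data
pg_ctl stop -D /var/lib/postgresql/data -m fast
```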

[00:10:00] The next post is "Debian integration of Patroni and vip-manager". So they've set up vip-manager in Debian. This is a virtual IP manager that helps you manage a cluster of PostgreSQL servers, one acting as the primary and one or more replicas, working together with Patroni. The virtual IP manager essentially helps direct write traffic to the active primary. So if you're interested in Patroni and vip-manager for managing a set of PostgreSQL servers on Debian, or perhaps Ubuntu, definitely check out this blog post.

[00:10:41] The next post is "Scheduled PostgreSQL backups and retention policies with Kubernetes", and this is from crunchydata.com. They have developed their PostgreSQL Operator, which helps manage Postgres instances within Kubernetes, and they've added the ability to use pgBackRest as a backup tool and to schedule retention policies with it, all within Kubernetes. So if you're interested in using that, definitely check out this blog post.

[00:11:12] The next blog post, also from crunchydata.com, is "Migrating from Oracle to PostgreSQL: tips and tricks". They talk about different solutions to issues you may run into if you're migrating from Oracle to Postgres. They mention orafce, which gives you Postgres versions of functions you're familiar with from Oracle that work similarly, and they tell you how to install it and create the extension to be able to do that. They talk about how to handle things you do in Oracle like disabling and enabling constraints or disabling NOT NULL constraints, and how to handle that in Postgres; how the GRANT command is similar but a little bit different; how best to drop objects in PostgreSQL versus Oracle; how to check for NOT NULL; and they discuss ROWID, ctid and identity columns. So if you are migrating from Oracle to PostgreSQL, definitely a blog post to check out.
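Enabling orafce once the package is installed on the server is a one-liner per database; for example, it provides an Oracle-style nvl() function:

```sql
-- orafce must first be installed on the server (e.g. via your package manager),
-- then enabled per database:
CREATE EXTENSION orafce;

-- Oracle-style null handling provided by orafce:
SELECT nvl(NULL, 'default');
```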

[00:12:12] The next post is "Barman Cloud Part 1: WAL Archive", and this is from 2ndquadrant.com. As of Barman 2.10, there's a way to archive your WAL files to a cloud location, for example Amazon S3, and that's what this post discusses: barman-cloud-wal-archive. Essentially, in the archive_command you can configure in Postgres, there's a Barman command you can send WAL files to that will forward them to S3, and part two will cover barman-cloud-backup. So if you use Barman and want to start archiving directly to a cloud location such as Amazon S3, definitely check out this blog post.
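A rough sketch of wiring that up in postgresql.conf (the bucket and server names are illustrative, not from the post):

```
# Hand each completed WAL segment to barman-cloud-wal-archive,
# which uploads it to the given S3 destination under the server name.
archive_mode = on
archive_command = 'barman-cloud-wal-archive s3://my-backup-bucket my-server %p'
```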

[00:12:55] The next post is "pg_timetable: Advanced PostgreSQL scheduling", and this is from cybertec-postgresql.com. Now, it's interesting they call it pg_timetable, because this is not actually an extension for Postgres. It's a separate application written in Go, and it's typically delivered as a container, it looks like. I guess I'm used to seeing pg_-something as an extension for Postgres, but this is actually a separate application that does scheduling of tasks. It uses PostgreSQL essentially as its data source, but by being its own application it can do all sorts of things outside of Postgres, like send emails, download files, et cetera. It looks like you do all the configuration in SQL, though. So it's definitely an interesting project, and it looks like they have a cron style you can implement as well. I'm going to be sticking with cron myself, I think, but if you're interested or looking for another scheduling option, maybe check out this blog post. The next post is "Understanding prepared transactions and handling the orphans", and this is from highgo.ca. Prepared transactions generally have to do with a two-phase commit process, so you normally don't use them in everyday application development, and I haven't really needed to consult this. But if you are considering distributed transaction management with things like two-phase commits, maybe you want to check out this blog post.
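For reference, a prepared transaction looks like this in SQL (the table and transaction ID are illustrative; note that max_prepared_transactions must be set above zero for this to work):

```sql
-- Phase one: do the work, then prepare instead of committing.
BEGIN;
INSERT INTO accounts (id, balance) VALUES (1, 100);  -- illustrative table
PREPARE TRANSACTION 'txn_42';

-- Phase two, possibly from a different session:
COMMIT PREPARED 'txn_42';
-- or: ROLLBACK PREPARED 'txn_42';

-- Orphaned prepared transactions hold back vacuum; find them here:
SELECT gid, prepared, owner, database FROM pg_prepared_xacts;
```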

[00:14:30] And the last post is "Developing PostgreSQL for Windows, Part 1", and this is from 2ndquadrant.com. So if you are doing development of PostgreSQL itself and want to do it on Windows, here's a blog post that gives you some tips and tricks to help you do that.

[00:14:47] That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.
