
28 Tips, Lucky 13, Autovacuum Tuning, Logical Pitfalls | Scaling Postgres 129


In this episode of Scaling Postgres, we discuss 28 tips & tricks, Postgres 13 is lucky, autovacuum tuning and logical replication pitfalls.


Transcript



[00:00:00] In this episode of Scaling Postgres, we talk about 28 tips, lucky 13, autovacuum tuning and logical pitfalls. I'm Creston Jamison, and this is Scaling Postgres episode 129.

[00:00:22] All right, I hope you, your family and coworkers continue to do well. Our first piece of content is "Postgres tips and tricks". This is from PG IO, and this is a set of about 28 different tips and tricks with Postgres, from things you can do with SQL commands to other types of configuration. So I'll just run through a few of these so you get a sense of it, but there are 28 of them, so I'm not going to cover everything. The first is that you can insert multiple rows in one statement by doing a multi-row insert. Definitely a better thing to do for performance; it allows you to insert data much faster. Next is how to insert a row and return automatically assigned values. So this is the RETURNING clause. Further down below you can see how you can use it with DELETE statements or UPDATE statements as well to return data from changed rows, and how to auto-generate UUID primary keys. And there are plenty of other tips you can use for working with Postgres. So I highly encourage you to check out this blog post to see if there's a tip that could maybe help you in your work.
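
As a hedged illustration of a few of those tips, here is a minimal SQL sketch; the table and column names are hypothetical, not taken from the post:

    -- Auto-generated UUID primary key. gen_random_uuid() is built in from
    -- Postgres 13 onward; on earlier versions you would need the pgcrypto
    -- or uuid-ossp extension instead.
    CREATE TABLE accounts (
        id   uuid PRIMARY KEY DEFAULT gen_random_uuid(),
        name text NOT NULL
    );

    -- Multi-row insert: one statement, several rows, noticeably faster than
    -- issuing separate single-row inserts. RETURNING hands back the values
    -- the database assigned automatically.
    INSERT INTO accounts (name)
    VALUES ('alice'), ('bob'), ('carol')
    RETURNING id, name;

    -- RETURNING also works with UPDATE and DELETE to get back changed rows.
    DELETE FROM accounts WHERE name = 'carol' RETURNING id, name;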

[00:01:26] The next piece of content is "Why PostgreSQL 13 is a Lucky Release". This is from Crunchydata.com, and he's highlighting a few features that are coming with version 13 this fall. The first is "why are my indexes smaller?" Basically, with the deduplication that is in version 13, you can get a lot of space savings with B-tree indexes. So he generates an index here and gives you an example where Postgres 12 had a 28 megabyte index, whereas Postgres 13 had a 9.5 megabyte index. In addition, query times were two times faster on Postgres 13. So definitely a benefit. But of course, once you upgrade to 13, you're going to need to reindex your indexes, probably concurrently, to get this deduplication and presumably some of these performance benefits. But as he mentions here, your mileage may vary in terms of the performance gains. The next thing is speeding up vacuum or autovacuum, because now indexes can be vacuumed in parallel, which is a huge advantage because, in my experience, going through all of the indexes is what takes the longest amount of time when vacuum runs. The next is something I wasn't as familiar with: "sort less, sort quicker". They are offering incremental sorting, and it appears that when you're sorting by more than one column, it uses some efficiencies as long as there's an index, I believe, on the first column you order by, to give you faster query execution. So as shown here, in version 12 this particular query ran in 39 milliseconds, whereas in version 13 it ran in less than one millisecond, 0.82 milliseconds. So that's a huge performance change. Again, your mileage may vary, but if this is possible, that's definitely an interesting addition. Then of course he mentions some of the other enhancements: more CREATE STATISTICS options are possible; different partitioning benefits in terms of full support for logical replication and BEFORE triggers; hash aggregation used with aggregate functions and grouping sets; the libpq driver now supporting SCRAM authentication with channel binding; a datetime function in the JSON path query language; gen_random_uuid() can now be used without an extension, that's a good one; and postgres_fdw (the foreign data wrapper) now supports certificate authentication. So a lot of big improvements coming in 13, which should be out sometime this fall. And if you're interested in learning more about it, you can check out this blog post.
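
To make a couple of those concrete, here is a hedged sketch; the table and index names are hypothetical, and the exact plan you get will depend on your data:

    -- Existing B-tree indexes only pick up the version 13 deduplication once
    -- they are rebuilt; REINDEX ... CONCURRENTLY avoids blocking writes.
    REINDEX INDEX CONCURRENTLY my_table_my_column_idx;

    -- Incremental sort: with an index on (x), an ORDER BY x, y can reuse the
    -- index order for x and only sort within each group of equal x values.
    CREATE INDEX ON my_table (x);
    EXPLAIN SELECT * FROM my_table ORDER BY x, y LIMIT 100;
    -- On version 13 the plan may show an "Incremental Sort" node here.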

[00:03:53] The next piece of content is "Tuning PostgreSQL autovacuum". So first they cover what the tasks of autovacuum are, and they mention: clean up dead tuples left behind after UPDATE or DELETE operations; update the free space map that keeps track of free space in table blocks; update the visibility map that's required for index-only scans, which helps the system determine whether it needs to actually go look at the heap or can just stay on the index; freeze table rows so that the transaction ID counter can safely wrap around; and also run ANALYZE so that your statistics are kept up to date. So the first thing that they mention is to make sure that nothing keeps autovacuum from reclaiming dead tuples. So basically, make sure autovacuum is running. The main things that stop it from running are long-running transactions, so do what you can to avoid those. And they mention two configuration changes that you can make: the idle_in_transaction_session_timeout, which you can set to a value such that transactions that are idle will be canceled after a period of time, and also the general statement_timeout, so any long-running statements will automatically be canceled. The next step is tuning autovacuum to run faster. What I consider the primary method of doing that is the autovacuum_vacuum_cost_limit. So you basically put this somewhere up in the thousands to make sure more of the system is dedicated to doing vacuums, as long as it doesn't impact your query performance, of course. And then there's autovacuum_vacuum_cost_delay. In newer versions it's at two milliseconds, but you can of course adjust that; in older versions it was 20. So the closer you get to zero, the faster you'll autovacuum tables. And they do mention a number of times in this post that you can set these settings not just at the cluster level, but also at the table level. So here they show an example of a busy table where they're changing the vacuum cost delay to one millisecond. Now, I've heard opinions on both sides. One opinion is that you shouldn't really use table options because it starts making things more difficult to manage, and you should just stick with the cluster-level options. Whereas others say that if you have particular tables that are exceptions, go ahead and change the parameters for those tables. So I'll leave it up to you to decide which you think is best. The next area is changing the workload so that fewer dead tuples are generated. So basically, can you make changes to your application where potentially you're updating or deleting rows less? For example, in terms of deletion, could you maybe partition and then just truncate or drop those tables? Or if you're doing a bunch of updates, could you batch those updates, or alter your application so that you're maybe doing more inserts and fewer updates? They also mention here that you could reduce the number of dead tuples by using HOT (heap-only tuple) updates. So you could set a fillfactor to something less than 100 and then make sure that when you're doing an update, the column you're updating is not indexed, so it doesn't have to go to the index to do the update. In terms of tuning autovacuum for index-only scans, again, this deals with the visibility map. Basically he suggests setting your autovacuum scale factor relatively low so that the table gets vacuumed frequently. And then for inserts, in version 13 there's the new autovacuum_vacuum_insert_scale_factor that will help that table be picked up for vacuuming more frequently, so you get the visibility map updated more often. This is for the case where you have predominantly append-only tables where there aren't a lot of updates and deletes happening. On lower versions, he says you can modify the freeze max age so that, again, it gets vacuumed more frequently. In terms of tuning autovacuum to avoid transaction wraparound problems, he mentions making sure that the anti-wraparound vacuum can freeze tuples in all tables. And again, we're going back to those very long transactions: you want to minimize those and follow the suggestions that were mentioned previously. And then, to address it for updates, deletes and inserts, you just want to make sure autovacuum is running and there are no long-running transactions or things of that nature blocking autovacuum from doing its job. Now, they also mention partitioning, and he says if you are partitioning with a lot of tables, then you probably want to consider having more autovacuum workers, because with more tables you can work on them more frequently with more workers. Although keep in mind that the cost limit, which was the first thing mentioned here, covers all workers. So if you have three workers, each one can do more work; but if you go to six, this limit is spread across all six, or if you go to nine, across all nine. So the more workers you have, the less work that can be done by each one. But if you're doing partitioning with a lot of tables, that's okay, because there's less work to be done in each partition. And lastly, they cover tuning autoanalyze and some configuration changes you can make to have it analyze your tables more frequently. So overall, if you're looking to do some tuning of PostgreSQL's autovacuum, definitely check out this blog post from Cybertec-postgresql.com.
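
To make those parameters concrete, here is a hedged sketch of the cluster-level and table-level settings discussed; the values and the table name are illustrative, not recommendations from the post:

    -- Cluster level: let autovacuum do more work per cycle, and cancel
    -- sessions that sit idle in a transaction for too long.
    ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 2000;
    ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '2ms';
    ALTER SYSTEM SET idle_in_transaction_session_timeout = '10min';
    SELECT pg_reload_conf();

    -- Table-level overrides for a particularly busy table (hypothetical name).
    -- A lower fillfactor leaves room on each page for HOT updates.
    ALTER TABLE busy_table SET (
        autovacuum_vacuum_cost_delay   = 1,
        autovacuum_vacuum_scale_factor = 0.01,
        fillfactor                     = 90
    );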

[00:08:43] The next piece of content is "Pitfalls and quirks of logical replication in Postgres 12". This is from Elephanttamer.net. Now, he's talking about logical replication and some quirks you need to be aware of. The first thing he covers is disk space considerations, because if logical replication is failing for some reason, you could potentially run out of disk space on the primary, since logical replication uses replication slots. Now, you can also run into this if you're using replication slots for physical, WAL-based replication, but you pretty much have to use slots with logical replication, so this disk usage risk exists. You need to have monitoring in place to handle that and identify whether you're accumulating too much WAL due to, say, an orphan slot being present. The next thing they mention is that you basically need to have a replication slot first, before you create your subscription; otherwise, the CREATE SUBSCRIPTION statement just hangs. So that slot needs to be present. The next thing he mentions is to be careful with a publication FOR ALL TABLES, because if you do this, it doesn't really handle tables that are newly added. In fact, he says, quote, it will cause the replication to stop as soon as you issue a CREATE TABLE statement. In addition, it will automatically include tables created by extensions, such as the spatial_ref_sys table he mentions for PostGIS, and also tables that don't have a primary key or replica identity, which poses a problem for logical replication. So he says to be very cautious with FOR ALL TABLES, and perhaps just publish individual tables. Next, consider the tables that you're replicating, because in the case of the spatial_ref_sys table, this is actually supposed to be independent per PostgreSQL instance, even if you're replicating, so you probably don't want to replicate that particular table. Next is to review your primary keys carefully and make sure each table that you're replicating has a primary key or a replica identity. And the last recommendation is, after adding a new table, to be sure to refresh your publication. So for example, you do an ALTER PUBLICATION ... ADD TABLE on the primary server and then an ALTER SUBSCRIPTION ... REFRESH PUBLICATION on your replica. So definitely some issues to be aware of if you are using logical replication. If you want more details about this, check out this post.
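
As a hedged sketch of the commands involved, with table names and the connection string as placeholders:

    -- On the publisher: listing tables explicitly is safer than FOR ALL TABLES.
    CREATE PUBLICATION my_pub FOR TABLE orders, customers;

    -- On the subscriber (this creates a replication slot on the publisher
    -- unless you tell it otherwise).
    CREATE SUBSCRIPTION my_sub
        CONNECTION 'host=primary dbname=app user=replicator'
        PUBLICATION my_pub;

    -- After adding a new table later: add it on the publisher, then refresh
    -- the subscription on the replica.
    ALTER PUBLICATION my_pub ADD TABLE invoices;   -- publisher
    ALTER SUBSCRIPTION my_sub REFRESH PUBLICATION; -- subscriber

    -- Tables without a primary key need a replica identity so UPDATE and
    -- DELETE can be replicated; FULL is the blunt fallback.
    ALTER TABLE events REPLICA IDENTITY FULL;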

[00:11:03] The next piece of content is actually a YouTube video: "Webinar: Business Intelligence with Window Functions in PostgreSQL" by Gianni Ciolli. This was originally done as a webinar in August 2019 and covers PostgreSQL 11, but I'm highlighting it because with anything involving window functions, I always like to remind myself of their features, since I don't use them that frequently. So if you want to learn more about window functions, this YouTube video was just posted from the webinar that was done in 2019.

[00:11:32] The next piece of content is "Authentication in Pgpool-II". This is from B-peng.blogspot.com. They're talking about how you set up authentication in Pgpool-II and how it works. Basically, a client first authenticates to Pgpool-II, and then Pgpool-II authenticates to the PostgreSQL server. And they cover all the different authentication methods that are supported, from trust, MD5 and SCRAM-SHA-256 to certificate, PAM and LDAP, some of which are specific to particular Pgpool-II versions. Then it goes into some of the different settings and configuration you can make, and of course they suggest using either some sort of directory-based authentication or SCRAM for security reasons. So if you're interested in understanding how Pgpool-II handles authentication, definitely check out this blog post.
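
The Pgpool-II side of this is configured in its own files (pgpool.conf and pool_hba.conf); as a hedged sketch of the PostgreSQL side of SCRAM, with a hypothetical role name:

    -- Store new passwords as SCRAM-SHA-256 verifiers, then (re)set the
    -- password so the stored verifier is SCRAM rather than MD5.
    SET password_encryption = 'scram-sha-256';
    ALTER ROLE app_user PASSWORD 'change-me';

    -- Check which roles still carry MD5 hashes (requires superuser).
    SELECT rolname FROM pg_authid WHERE rolpassword LIKE 'md5%';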

[00:12:24] The next piece of content is "Introducing the Postgres Prometheus Adapter". This is from Crunchydata.com. Prometheus is a systems and service monitoring system, so it tracks whatever metrics you want to measure across different servers. Now, it has a built-in database it already uses, but this blog post describes an adapter that allows you to send data to Postgres as opposed to the default database. So it shows how you get the adapter and how you can set it up and configure it so that Prometheus sends all of your data to Postgres, and it looks like it sends it to partitioned tables. So if you use Prometheus and want to try sending the data to Postgres rather than its default database, definitely check out this blog post from Crunchydata.com.

[00:13:11] The next piece of content is "Who is spending WAL crazily?". This is from Highgo.ca, and they're basically talking about trying to identify what is causing an increase in the amount or size of WAL records. And basically this post describes the tool pg_waldump. It helps you look at each of the different types of records that exist in a set of WAL files and tells you the record size and the percentage of each type. So this blog post goes over how you can use the pg_waldump command, some different options they've set, and also describes some of the column output and the types of records that exist. Now, they don't cover everything, but they give a basic overview. So if you're interested in using pg_waldump to analyze your WAL files, definitely check out this blog post.
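
pg_waldump itself is a command-line tool, but as a complementary SQL-side check you can measure how much WAL is being generated by diffing LSNs over time; the earlier LSN value below is just a placeholder:

    -- Note the current write-ahead log position...
    SELECT pg_current_wal_lsn();
    -- ...then, some time later, diff against the value captured above.
    SELECT pg_size_pretty(
        pg_wal_lsn_diff(pg_current_wal_lsn(), '0/16B3748'::pg_lsn)
    ) AS wal_generated_since_snapshot;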

[00:14:00] The next piece of content is "Advanced partition matching for partition-wise join". This is from 2ndQuadrant.com, and this is a feature coming in Postgres 13, where they made some enhancements to partition-wise joins. So if you want to learn more about the enhancements coming in version 13, definitely check out this blog post.
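
A minimal hedged sketch, with hypothetical partitioned tables; note that this planner feature is off by default:

    -- Enable partition-wise join for this session.
    SET enable_partitionwise_join = on;

    -- With two compatibly partitioned tables, the planner can join matching
    -- partitions pairwise; the version 13 enhancement relaxes how exactly the
    -- partition bounds have to match.
    EXPLAIN SELECT *
    FROM orders o
    JOIN order_items i ON o.order_id = i.order_id;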

[00:14:22] The next piece of content is "Waiting for PostgreSQL 14: pg_stat_statements: Track number of rows processed by some utility commands". Basically, for statements tracked in pg_stat_statements, it retains how many rows were processed, so this is now part of the output and you can see how many rows are returned by the statements that are tracked. So if you want to learn more about that, check out this blog post from Depesz.com.
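
A hedged sketch of reading that information, assuming the pg_stat_statements extension is installed and loaded:

    -- Top statements by total rows processed; on PostgreSQL 14 this also
    -- reflects rows processed by some utility commands.
    SELECT query, calls, rows
    FROM pg_stat_statements
    ORDER BY rows DESC
    LIMIT 10;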

[00:14:48] The next piece of content is "Preventing SQL injection attacks in Postgres". This is from Crunchydata.com. Now, this isn't necessarily Postgres specific. They do mention a few configuration changes you can make to help audit some of the statements coming through, like log_statement and log_min_error_statement, but a lot of the advice applies to any database system, and a lot of application frameworks already do these things for you. Still, it was a good post to give you an overview of SQL injection attacks. So if you're interested in learning more, definitely check out this blog post.
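
On the SQL side, the core defense is parameterization rather than concatenating user input into query text; a hedged sketch, with the table and logging values purely illustrative:

    -- Parameters keep user input as data, never as executable SQL text.
    PREPARE find_user(text) AS
        SELECT * FROM users WHERE email = $1;
    EXECUTE find_user('alice@example.com');

    -- The auditing parameters mentioned in the post (values are examples).
    ALTER SYSTEM SET log_statement = 'ddl';
    ALTER SYSTEM SET log_min_error_statement = 'error';
    SELECT pg_reload_conf();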

[00:15:22] And the last piece of content is that the PostgreSQL Person of the Week is Renee Phillips. So if you're interested in learning more about Renee and her contributions to PostgreSQL, definitely check out this blog post. That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.
