
Prewarming, Nondeterministic Collations, Generated Column Performance, Foreign Keys | Scaling Postgres 86


In this episode of Scaling Postgres, we discuss prewarming your cache, working with nondeterministic collations, generated column performance and foreign keys with partitions.


Transcript



[00:00:00] In this episode of Scaling Postgres, we talk about prewarming, nondeterministic collations, generated column performance and foreign keys. I'm Creston Jamison, and this is Scaling Postgres episode 86.

[00:00:21] Alright, I hope you're having a great week. The first piece of content is "Prewarming PostgreSQL I/O caches" and this is from cybertec-postgresql.com. It's talking about an extension called pg_prewarm that lets you essentially warm up your shared buffers, because normally when you restart the system you're going to lose some of that in-memory cache. Now, some of it may still be present, because Postgres does utilize some OS-level caching, but this extension should help you preserve things like shared buffers. And using this extension, he says there's two ways you can do it: manual caching, or automatic caching on startup, which is probably what most people would want. To manually cache something, you first create the extension in your database (pg_prewarm is part of the contrib modules, so install that first), and then you run a SELECT of the pg_prewarm function, specifying the table name. The result that gets returned says 4,425 pages have been read and put into the cache. Now, there are some other options you can use, like specifying the first block, last block, things of that nature. But then he goes into automatically populating your PostgreSQL cache: basically, once you install the extension, you add pg_prewarm to shared_preload_libraries in your postgresql.conf file and do a restart. And then what you will see is an autoprewarm master process that is started in addition to the other standard processes. And as he says here, pg_prewarm will store on disk a list of blocks which are currently in memory, and after a crash or restart pg_prewarm will automatically restore the cache as it was when the file was last exported. So if you have a larger database system and you find it takes time after a restart for the caches to warm up, this is definitely an extension to check out and try using.
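
As a minimal sketch of the manual approach (the table name is a hypothetical placeholder):

```sql
-- pg_prewarm ships with the contrib modules; create it once per database.
CREATE EXTENSION pg_prewarm;

-- Load a table into shared buffers; returns the number of pages read.
-- 'my_table' stands in for your own table name.
SELECT pg_prewarm('my_table');

-- Optional arguments: mode ('buffer' targets shared buffers), the
-- relation fork, and the first/last block numbers to load.
SELECT pg_prewarm('my_table', 'buffer', 'main', 0, NULL);
```

For the automatic approach, the postgresql.conf change looks like this (a restart is required, after which the autoprewarm master process appears):

```
shared_preload_libraries = 'pg_prewarm'
```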

[00:02:28] The next post is "Nondeterministic collations", and this is from postgresql.verite.pro. Starting with version 12, PostgreSQL collations are created with a parameter called deterministic. With a deterministic comparison, strings are essentially compared at the binary level to determine whether they are equal. But you can also, of course, set this to false. So essentially you can get nondeterministic collations, and what that enables you to do is potentially do case-insensitive searching, or searching and ordering without considering accents that are part of words. And he has a list here of what he calls the fancy comparison features that are enabled by nondeterministic collations. So this is something you couldn't do before, but now you can in version 12. First, you can determine equality between canonically equivalent sequences of code points. So they're doing a comparison here, and when deterministic is false, you get a true comparison between these values: in a binary sense they are different, but just looking at them they are identical, and you'll get that equality with the nondeterministic comparison. The next one is equality between compatible sequences of code points. So, comparing a combined double-f ligature with two separate Latin "f" letters, you can get an equal comparison on that if you create a collation that has secondary strength. By default, even when deterministic is false, it still won't consider these equal; you actually have to set the strength at the secondary level to allow these comparisons to evaluate to true. Then they go on to ignoring case, and again, you can ignore case as long as the collation is set to secondary strength. You can ignore case and accents if you set the strength to primary. So at primary strength you can ignore capitalization and accents. You can also ignore accents but not case, using some more keywords here, saying the strength is primary but the case level is yes. You can ignore spacing and punctuation by setting different values (alternate = shifted), match compatible symbols, and ignore code points assigned to individual characters. So there's a lot of different capabilities with this support for nondeterministic collations. I'm not 100% up on all of this, but there are very interesting possibilities in terms of doing ordering and comparison operations when working with text. So if you do a lot of that, definitely check out this blog post to learn more about this new feature and how it can potentially be used.
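
Here's a minimal sketch of two of these, assuming PostgreSQL 12 built with ICU support (the collation names are illustrative):

```sql
-- Case-insensitive: ICU secondary strength (ks-level2) ignores case
-- differences, and deterministic = false lets strings that differ at
-- the binary level still compare as equal.
CREATE COLLATION case_insensitive (
  provider = icu,
  locale = 'und-u-ks-level2',
  deterministic = false
);

SELECT 'ABC' = 'abc' COLLATE case_insensitive;   -- true

-- Case- and accent-insensitive: primary strength (ks-level1).
CREATE COLLATION ci_ai (
  provider = icu,
  locale = 'und-u-ks-level1',
  deterministic = false
);

SELECT 'Résumé' = 'resume' COLLATE ci_ai;        -- true
```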

[00:05:29] The next post is called "Regenerated" and it's basically about generated columns. Now, we've seen some similar posts about generated columns in previous episodes of Scaling Postgres. What is particularly interesting about this one is that it compares a trigger-based approach against the new generated columns feature. If you wanted to achieve what generated columns do before version 12, basically compute a column from existing data within that table row (for example, adding two values together), you had to use triggers, because the feature didn't exist yet. So he implemented a trigger approach, then used the native generated column approach, and then compared their performance. That's what I found interesting here. He compares inserts using both methods: using the trigger method, inserting a million rows completed in 6 seconds; with the generated columns feature it took 4 seconds. So it's more performant, which you would kind of expect because it's now a core feature of PostgreSQL. But with no generated columns used at all, the insert took about 2.4 seconds. So generated columns still take a performance hit, just not as significant a one as triggers, and that's something to keep in mind when using this new feature. Then he looks at updates: an update took 13 seconds with the trigger, 9 seconds with the generated column, and about 5.8 seconds with no generated columns or triggers. So again you'll see a performance degradation with updates, but not as much as with the trigger. In terms of deletes, the performance between generated columns and the trigger was actually pretty similar, about four and a half seconds each, and under half that time when no trigger or generated columns were used. So it's definitely a great post covering the generated column feature in general, how you could use triggers instead if you're on a version prior to 12, as well as the performance comparisons. And this is from pgdba.org.

The next post is "PostgreSQL 12: foreign keys and partitioned tables". This is a pretty brief post, but it talks about the pretty big benefits for referential integrity that come with the new features in version 12 that let partitioned tables both have foreign keys and be referenced by foreign keys. They give an example where you have an items partitioned table, a stock partitioned table, and a warehouses table that is not partitioned, and how stock can use foreign keys that reference items and reference warehouses. So now you can use foreign keys in pretty much any direction, which is a great boon to help maintain your referential integrity when you are using partitioned tables. And this is from 2ndquadrant.com.
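
Here's a minimal sketch of both features; all table and function names are hypothetical. First, a version 12 generated column next to the pre-12 trigger workaround the "Regenerated" post benchmarks:

```sql
-- PostgreSQL 12: c is computed from a + b whenever the row is written.
CREATE TABLE t_generated (
  a int,
  b int,
  c int GENERATED ALWAYS AS (a + b) STORED
);
INSERT INTO t_generated (a, b) VALUES (1, 2);  -- c is stored as 3

-- Pre-12 equivalent: a BEFORE trigger maintaining the column by hand.
CREATE TABLE t_triggered (a int, b int, c int);

CREATE FUNCTION compute_c() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
  NEW.c := NEW.a + NEW.b;
  RETURN NEW;
END;
$$;

CREATE TRIGGER t_triggered_compute_c
  BEFORE INSERT OR UPDATE ON t_triggered
  FOR EACH ROW EXECUTE FUNCTION compute_c();
```

And a sketch loosely following the items/stock/warehouses example, showing a foreign key that points at a partitioned table, which is what version 12 newly allows:

```sql
CREATE TABLE warehouses (id int PRIMARY KEY);

CREATE TABLE items (id int PRIMARY KEY) PARTITION BY HASH (id);
CREATE TABLE items_p0 PARTITION OF items FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE items_p1 PARTITION OF items FOR VALUES WITH (MODULUS 2, REMAINDER 1);

CREATE TABLE stock (
  item_id      int NOT NULL REFERENCES items (id),       -- FK *to* a partitioned table (new in 12)
  warehouse_id int NOT NULL REFERENCES warehouses (id),  -- FK *from* a partitioned table
  quantity     int
) PARTITION BY HASH (item_id);
CREATE TABLE stock_p0 PARTITION OF stock FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE stock_p1 PARTITION OF stock FOR VALUES WITH (MODULUS 2, REMAINDER 1);
```
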
The next post, also from 2ndquadrant.com, is about Postgres-BDR and fast, safe upgrades. BDR is their bi-directional replication product, basically their proprietary multi-master PostgreSQL solution. I thought this post was interesting because it advocates for where BDR has some advantages, particularly with upgrades. A lot of people use it because they want 24/7 availability with nothing going down. When you have a primary and a replica database and you need to switch over, there's some time that you are actually going to be down, because maybe the primary goes down and you need to wait for your monitoring system to detect that it's down, and then for some automated failover method to run, which can take on the order of minutes, whereas they're claiming here they've gotten switchover down to about a second with their BDR product. But again, everything's always in sync; you don't have to wait for something to be synced over from a primary to a replica. Now, things may eventually need to catch up, but I can definitely see this has some advantages. In particular, they're talking about upgrades and how you can have bi-directional replication between different versions, which of course we've seen some posts do with regard to logical replication, but their product apparently does it more out of the box, keeping things up and replicated. So it's definitely an interesting use case for BDR that I had not considered before. If you're interested, it's definitely a blog post to check out. And another post from 2ndquadrant.com is "Managing another PostgreSQL commitfest". So if you want some insights into how the PostgreSQL commitfests are run, here's a brief post that discusses that.

[00:10:31] The next post is "Braces are too expensive", and this is from Robert Haas at rhaas.blogspot.com. Basically, he's talking about the PostgreSQL executor being volcano-style, and what does that mean? It means a number of things: the query planner generates a plan that is organized into a sort of tree structure, and the system is designed to be extensible, with the individual nodes that appear in the plan tree having no data-type-specific knowledge, but rather obtaining their knowledge about specific data types from support functions. And it means that the executor uses a pull model: plan nodes generally support an operation that says "give me the next tuple". So in his example, the nested loop node is run, and it says "give me the next tuple" to one child, which in turn asks its own child for the next tuple, retrieves it, and so on down the tree. So I would say this is kind of a top-down model, but all the interesting work happens essentially at the leaf nodes. And he says this can create the overhead of a lot of function calls, which I believe is why he's saying braces are too expensive. It's not about orthodontics; it's about the number of function calls being made. They had already done an overhaul of PostgreSQL expression evaluation, because expressions were handled in a similar tree of executable nodes, and they reworked it, transforming the expression tree into a series of steps that are executed one after another, like a sort of dynamically constructed program. And they actually got some increased performance by doing this on the expression side. So basically they're considering potentially doing the same thing to optimize these plan trees. This is something they're thinking about for PostgreSQL that may be able to gain some better performance, so if you're interested in digging into some of this, it's definitely a blog post to check out.
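
To make the plan-tree idea concrete, you can ask Postgres to show you one. A hypothetical two-table join (the table and column names are made up for illustration):

```sql
EXPLAIN
SELECT *
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE c.region = 'EU';

-- One possible (purely illustrative) shape of the output:
--   Nested Loop
--     ->  Seq Scan on customers c
--           Filter: (region = 'EU'::text)
--     ->  Index Scan using orders_customer_id_idx on orders o
--           Index Cond: (customer_id = c.id)
```

Execution then pulls tuples from the top: the Nested Loop asks each child node for its next tuple on demand, one function call at a time, which is exactly the overhead the post is talking about.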

[00:12:32] The next post is "Tuning checkpoints", and this is from evol-monkey.blogspot.com. He's basically talking about how to set up your configuration for checkpointing, in terms of adjusting checkpoint_timeout, min_wal_size, max_wal_size and checkpoint_completion_target. He gives some advice: first decide what you want your checkpoint_timeout to be and then work backwards from there, as well as running some queries to see how fast you're generating WAL files to get a sense of what to set these parameters to. And he gives an example where they helped optimize a database, taking it from very high checkpoint activity and calming it down by tuning these parameters. So if you haven't done this yet, definitely a blog post to check out.
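
One sketch of the kind of WAL-rate measurement he describes (the LSN values shown are hypothetical samples):

```sql
-- Sample the current WAL position now, wait a known interval, sample
-- again, and diff the two LSNs to estimate WAL bytes generated.
SELECT pg_current_wal_lsn();  -- first sample
-- ... wait, say, five minutes, take a second sample, then:
SELECT pg_size_pretty(
         pg_wal_lsn_diff('1/8A000000'::pg_lsn,   -- second sample (hypothetical)
                         '1/72000000'::pg_lsn)   -- first sample (hypothetical)
       ) AS wal_generated_in_interval;
```

With that rate in hand, the postgresql.conf parameters he mentions might look something like this; the values are purely illustrative, not recommendations:

```
checkpoint_timeout           = 30min
min_wal_size                 = 2GB
max_wal_size                 = 16GB
checkpoint_completion_target = 0.9
```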

[00:13:21] The next post is that PgBouncer 1.12.0 is released, and they say this release contains a variety of minor enhancements and fixes. It also contains some fixes to the new SCRAM support introduced in PgBouncer 1.11, improving interoperability with newer PostgreSQL versions, so users of SCRAM are particularly advised to upgrade. So if you upgraded for SCRAM support, you'd probably want to move to 1.12.0 relatively soon. And this is from pgbouncer.org.

[00:13:54] The next post is "PostgreSQL Connection Pooling Part 1: Pros and Cons". So again, right after a PgBouncer post, this is a very general post that talks at a very high level about the benefits of connection pooling and some of its disadvantages. The benefit is being able to have more connections, virtual connections as it were, relative to the physical connections to the database. Some of the cons are that you're introducing another potential source of downtime if something goes down, as well as increased latency, and in a lot of cases you're essentially putting your authentication layer within PgBouncer, because the clients need to authenticate to PgBouncer. But if you're considering using a connection pooler, if you haven't already, definitely a blog post to check out.
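
For reference, a minimal PgBouncer configuration sketch showing the pooling idea; all paths, names, and sizes here are hypothetical:

```ini
; pgbouncer.ini (all values are illustrative)

[databases]
; Clients connect to "mydb" through PgBouncer on port 6432;
; PgBouncer holds a small pool of real connections to port 5432.
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432

; Clients authenticate to PgBouncer itself, against this user list.
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt

; Up to 1000 client ("virtual") connections are funneled into at
; most 20 physical server connections per database/user pair.
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
```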

[00:14:43] The next post is that PostGIS 3.0.0 Release Candidate 2 is released, and there's the information about it from postgis.net. Related to that is a post about generating land-constrained geographical point grids with Postgres. This is from Corbend Net. So if you have a use case for doing that, definitely a blog post to check out.

[00:15:10] That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to scalingpostgres.com, where you can sign up to receive weekly notifications of each episode. Or you could subscribe via YouTube or iTunes. Thanks.
