
Scaling Suggestions, Slot Safety, SCRAM Channel Binding, Effective Cache Size | Scaling Postgres 125


In this episode of Scaling Postgres, we discuss suggestions for scaling Postgres, replication slot safety, implementing SCRAM password authentication with channel binding, and setting effective_cache_size.


Transcript



[00:00:00] In this episode of Scaling Postgres, we talk about scaling suggestions, slot safety, SCRAM channel binding, and effective cache size. I'm Creston Jameson, and this is Scaling Postgres episode 125.

[00:00:23] All right, I hope you, your family, friends, and coworkers continue to do well. Our first piece of content is Some SQL Tricks of an Application DBA, and this is from Hakibenita.com. He's giving some scaling suggestions, essentially, for different ways to use the database. Now, the comic that is embedded here is particularly great, because you have a couple of developers trying to figure out the best way to optimize a query. Then a magical DBA appears, the query goes from many seconds down to a fraction of a second, and they're amazed at the magic that was done. But moving on to the actual content of the post, it goes over a series of suggestions, when working with the database, on how to make things more performant or more efficient. I'll just mention some of the highlights here, like update only what needs updating. So don't, for example, run a function against every column in the database. Be sure to use a WHERE clause to reduce the surface area that you're updating. In PostgreSQL's case, this also limits the amount of bloat you're creating in the table. Next is disable constraints and indexes during bulk loads. Very important. Next is use UNLOGGED tables for intermediate data. So unlogged tables don't get written to the WAL, and they are much faster. The problem is there's no crash safety, so if you have a crash in the middle of it, the data is essentially lost: recovery is totally unaware of that table's contents and it's not going to try and rescue them. The next one is implement complete processes using WITH and RETURNING. So basically, use CTEs to develop a stepwise process, feeding the results of a previous query into the next. He addresses the issue of identifying duplicates and resolving them using a CTE, although at some point he does mention some caution when trying to execute DML in a CTE, so you need to be aware of that. But I've noticed a lot of programmers like using CTEs; they're easier to reason about compared to, say, subqueries. The next section is avoid indexes on columns with low selectivity. So basically, a boolean column where you only have two values: it doesn't make sense to necessarily index that, because so many values are the same that an index isn't efficient. However, the next section he talks about is partial indexes, and partial indexes are great to use on booleans. If you're only interested in one of the boolean values, you could just create one partial index, or if you're interested in both, you could create two. The other way you could use partial indexes, of course, is if you have, say, four or five statuses. Sometimes it gives you better performance to create four or five partial indexes versus one index that has to cover all of those statuses. The next is always load sorted data. So basically, when you're loading data in, always do it in a sorted fashion so you have high correlation between what's on the disk and what's in the tables or indexes. The next section is index columns with high correlation using BRIN. So BRIN is the block range index, and it indexes ranges. One disadvantage of BRIN is that it's not as fast at pulling out single values compared to a B-tree index, but it is still relatively fast, and it gives you dramatic space savings. He had an example here where I think the BRIN index was 46 times smaller than the equivalent B-tree index, so it's smaller and easier to cache, but pulling out a single value is not as efficient. These are good for things that have high correlation, for example, sequential IDs or date times. The next section is make indexes invisible, which basically relies on transactional DDL statements: you can drop an index inside a transaction, check the plan, and roll it back. And the last suggestion is don't schedule long-running processes at round hours, because nearly everybody tries to run something at a round hour and you basically get spikes in your database usage during that time. So basically, offset it a bit. If you want to review these suggestions in detail, go ahead and check out this post from Hakibenita.com.
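To make a few of these concrete, here is a minimal sketch in SQL; the table and column names (orders, processed, created_at) are made up for illustration:

    -- Partial index that only covers the rows you actually query,
    -- e.g. the unprocessed rows of a hypothetical orders table.
    CREATE INDEX orders_unprocessed_idx ON orders (id) WHERE processed = false;

    -- BRIN index on a column whose values correlate with physical row order,
    -- such as an ever-increasing created_at timestamp.
    CREATE INDEX orders_created_at_brin ON orders USING brin (created_at);

    -- Unlogged table for intermediate data: faster loads because nothing is
    -- written to the WAL, but the contents are lost after a crash.
    CREATE UNLOGGED TABLE orders_staging (LIKE orders);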

[00:04:36] The next post is related to the previous one, and it's Scaling Relational SQL Databases. Now, this is not Postgres specific, but so much of it applies to Postgres; he's considering MySQL, MariaDB, or PostgreSQL, and this is basically a hit list of things to consider to improve your database performance. He talks about, step one, update your database: get on a more recent version to get the best performance. Then there are ways you can scale vertically, ways you can potentially leverage an application cache, and he goes on and on with different areas you can examine to get better performance out of your database. Now, one that didn't seem to really apply to Postgres, although you do need to be aware when you're making changes to your objects in the database, is where he says make ALTER TABLE work. I don't know if this is a problem with MySQL or not, but I haven't really seen this issue with Postgres. There are definitely issues, particularly with earlier versions, when you're adding a column and wanting to make it NOT NULL or set a default; there are certain things you need to take care of when you're doing that, less so in more recent versions. But I've never had an issue where altering a table wasn't fast. So again, if you want to check this out as a reference, this blog post can be found on stribny.name.
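As a rough illustration of that add-a-column case (the table and column names are hypothetical), the behavior depends on the Postgres version:

    -- On Postgres 11 and later, adding a column with a constant default and
    -- NOT NULL is a fast, metadata-only change:
    ALTER TABLE accounts ADD COLUMN active boolean NOT NULL DEFAULT true;

    -- On older versions that same statement rewrote the entire table, so the
    -- usual approach was to add the column without a default, backfill it in
    -- batches, and only then add the default and the NOT NULL constraint.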

[00:05:55] The next post is PostgreSQL 13: Don't let slots kill your primary. This is referring to when you have a replication slot on your primary database for a replica, say. Now, if that replica goes down, that slot is going to retain the WAL on the primary and not let it be removed until the replica comes back. So essentially, you can run out of disk space on your primary database and bring the database down if you don't have alerting set up for your disk space. But with version 13, there's a new setting called max_slot_wal_keep_size. Basically, you define the amount of WAL to keep around if the replica goes down or stops connecting to the primary; at some point Postgres will invalidate that slot, which means the WAL can then be removed and the primary can continue functioning. So basically, it's a trade-off: your primary can stay up and running and not run out of disk space, but the disadvantage is that this will kill your replication with that replica. Now, by default max_slot_wal_keep_size is -1, so this isn't active, but you can set it to a particular size of WAL to keep around to enable this feature. And he says there are also two columns that have been added to pg_replication_slots that you can monitor. One is wal_status, which gives an indication of whether the slot's WAL is reserved, unreserved, or lost, meaning the replica can no longer catch up, and the other is safe_wal_size. So this is a really great feature coming in 13 that I'm definitely looking forward to, and I'll probably start using more replication slots because of it, because a lot of times I haven't used replication slots due to the disk-filling risk. But this seems to mitigate that issue. So if you're interested in learning more, check out this post from 2ndQuadrant.com.
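A minimal sketch of how this might be configured and monitored; the 50GB value is just an example, sized to whatever disk headroom you have:

    -- Cap how much WAL a lagging or disconnected slot may retain (Postgres 13+)
    ALTER SYSTEM SET max_slot_wal_keep_size = '50GB';
    SELECT pg_reload_conf();

    -- Watch the new columns on pg_replication_slots
    SELECT slot_name, wal_status, safe_wal_size
    FROM pg_replication_slots;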

[00:07:49] The next post is Waiting for PostgreSQL 14: Rename wal_keep_segments to wal_keep_size, and this is from Depesz.com. Basically, because of the previous setting, where they allow you to define in terms of size how much WAL to keep around, they're renaming the parameter that specified how many segments to keep into one that specifies how much WAL, by size, to keep. So it looks like they're moving to a size parameter to retain WAL versus just a count. If you're interested in looking at this change, you can review the patch notes included right here in the post.
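A quick sketch of what the change looks like in practice (the sizes are arbitrary examples):

    -- Before the rename you would set a segment count in postgresql.conf,
    -- e.g. wal_keep_segments = 64 (each segment being 16MB by default).
    -- After the rename you specify a size instead:
    ALTER SYSTEM SET wal_keep_size = '1GB';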

[00:08:29] The next post is How to securely authenticate with SCRAM in Postgres 13. Now, we've had a few posts in recent episodes of Scaling Postgres about SCRAM authentication in PostgreSQL. This one is addressing 13 because in 13 the client supports channel binding, which helps the client actually validate that the server is who it says it is. This post also does a really good job of describing how to set up SCRAM authentication with channel binding in Postgres 13. It goes over, in detail, all the steps you would need to set this up and get SCRAM working on your Postgres instance. So if you're interested in doing that, check out this post from Citusdata.com.
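A minimal sketch of the moving parts, assuming an SSL-enabled server; the role name, host, and pg_hba line are illustrative only:

    -- In postgresql.conf: hash new passwords with SCRAM
    --   password_encryption = 'scram-sha-256'

    -- Re-set the role's password so it gets stored as a SCRAM verifier
    ALTER ROLE app_user WITH PASSWORD 'change-me';

    -- In pg_hba.conf: require SCRAM over SSL connections
    --   hostssl  all  app_user  0.0.0.0/0  scram-sha-256

    -- From a version 13 client, require channel binding when connecting:
    --   psql "host=db.example.com user=app_user dbname=app channel_binding=require"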

[00:09:14] The next post is Effective_cache_size: a practical example. This is from Cybertec-Postgresql.com, and it's talking about the parameter effective_cache_size, which essentially communicates information to the optimizer to determine how to cost out different queries; primarily this helps determine whether index scans should be used or not. Now, as an example, they created a random table with about a million rows and an ordered table with about a million rows, set effective_cache_size very low, to 1 MB, turned off bitmap scans, and then ran a query that got a particular cost estimate. Then they ran it with effective_cache_size set to 1000 GB, a lot higher than 1 MB, and here the cost estimate was smaller. So basically, with larger cache sizes the optimizer assumes that more of the indexes will be in memory and projects a lower cost than it does with smaller cache sizes. Now, both of these runs were done using the random table they created, because with the ordered table the cost estimate makes no difference no matter the size of effective_cache_size. That's because of the correlation between the data on the disk and the index itself, so it already believes it's going to be pretty efficient to pull out that data. In terms of recommendations for effective_cache_size, they suggest about 70% of memory, although I've seen estimates between 50% and 75%. So basically, you want to pick one of those and then adjust to see what gives you the best performance for your queries. If you want to learn more about this setting, check out this post from Cybertec-Postgresql.com.
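A rough way to reproduce the kind of comparison they describe; random_table and its id column are made-up names, and the sizes are just the extremes used to show the effect:

    -- Make plain index scans compete directly with sequential scans
    SET enable_bitmapscan = off;

    -- Pessimistic setting: the planner assumes almost nothing is cached
    SET effective_cache_size = '1MB';
    EXPLAIN SELECT * FROM random_table WHERE id < 1000;

    -- Optimistic setting: index scans are costed much more cheaply
    SET effective_cache_size = '1000GB';
    EXPLAIN SELECT * FROM random_table WHERE id < 1000;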

[00:10:59] The next post is Migrating to PostgreSQL Version 13: Incompatibilities You Should Be Aware Of, and it covers some incompatibilities moving from 12 to 13, I believe. For example, some of the ones they mention are that SIMILAR TO ... ESCAPE NULL and substring() of it now return NULL, that jsonb_to_tsvector properly checks the string parameter, and a change related to ltree. A lot of these don't seem to be very commonly used, but they're definitely issues to be aware of, and they run through about twelve different changes here. So definitely, before you upgrade to 13, check out this blog post from Percona.com so you can see if you're using any of these features.

[00:11:44] The next post is Webinar: Being Committed - A Review of Transaction Control Statements, part two of three, follow up. This is from 2ndQuadrant.com, and it's a post about transactions as they relate to replication: for example, how transactions get replicated to replicas, and how your settings, like synchronous_commit, affect when data appears on the replica as well as performance. Basically, the more durable you make your communication with a replica, say with synchronous replication, the higher the latency you introduce, as well as reducing throughput. But if you want to go faster, with higher throughput and less latency, you're going to sacrifice some durability: you'll use asynchronous replication and maybe not wait as much for the data to be up to date on the replica. They also address a little bit of vacuum and freezing. So if you're interested in this webinar, you can just click the link here to go ahead and register for it.
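That trade-off is largely what the synchronous_commit setting controls once a synchronous standby is configured; here's a small sketch of the standard values (this is generic Postgres behavior, not something specific to the webinar):

    -- Most durable: wait until the synchronous standby has flushed (or, with
    -- remote_apply, actually replayed) the commit before returning
    SET synchronous_commit = on;

    -- Lower latency, less durable: only wait for the local WAL flush
    SET synchronous_commit = local;

    -- Fastest and riskiest: don't even wait for the local WAL flush
    SET synchronous_commit = off;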

[00:12:42] The next post is SQL Style Guide, and this is from sqlstyle.guide. It's a published style guide for SQL code, so if you're interested in adopting a style, this is one interpretation of it. I know this is very subjective, but I saw this post and thought it was interesting. Now, related to it, Bruce Momjian's blog at Momjian.us mentioned pgFormatter, which basically takes your SQL code and formats it in a more friendly way, and he has had some good luck with that. So if you're looking for a tool to help you format your SQL files, maybe check out pgFormatter. The next post is Connection Pooling in Pgpool-II, and this is from B-peng.blogspot.com. It explains a little bit about what connection pooling is and how Pgpool-II does it. If you want to learn more about Pgpool-II and how to set it up, I found this to be a really great post explaining how to do it compared to some others I've seen. So if you're interested in that, check out this blog post.

[00:13:48] The next post is Announcing pgBackRest for Azure: fast, reliable Postgres backups. It describes how to set up pgBackRest as well as connect it to Azure to store your backups. So if you're interested in that, check out this post from Crunchydata.com.

[00:14:05] The next post is Backtraces in PostgreSQL, from Amit Khandekar's blog on Blogspot, and he's basically describing how you can set up backtraces in PostgreSQL to diagnose errors that you receive. So check this post out if you're interested in that.
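For reference, Postgres 13 also ships a backtrace_functions parameter in this area; a minimal sketch, where the function name is just a placeholder for whatever internal C function you're debugging:

    -- Superuser-only setting: emit a backtrace into the server log whenever an
    -- error is raised from inside the named internal C function(s)
    SET backtrace_functions = 'some_internal_c_function';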

[00:14:23] And the last post is that the PostgreSQL Person of the Week is Gilberto Castillo. So if you're interested in learning more about Gilberto and his contributions to PostgreSQL, definitely check out this blog post. That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the Show Notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.
