
Postgres 12 Released, Generated Columns, Parallel Estimates, Change Data Capture | Scaling Postgres 84


In this episode of Scaling Postgres, we discuss the release features of Postgres 12, generated columns, parallel estimates and change data capture.


Transcript



[00:00:00] In this episode of Scaling Postgres, we talk about Postgres 12 being released, generated columns, parallel estimates and change data capture. I'm Creston Jamison and this is Scaling Postgres, episode 84.

[00:00:22] Alright, I hope you're having a great week. Our first piece of content is "PostgreSQL 12 Released!", and this is from the postgresql.org website; it was released on October 3. Some of the main areas they talk about in terms of what's been updated are overall performance improvements. The first is significant performance and maintenance enhancements to the indexing system and to partitioning. Indexing can get up to a 40% space reduction and a gain in query performance, and it handles workloads better where the indexes are frequently modified. There are also partitioning improvements for tables where you have thousands of partitions, and insert and COPY performance into partitions has improved as well. They reduced the write-ahead log overhead when building GiST, GIN and SP-GiST index types. You can create covering indexes, using the INCLUDE clause, on GiST indexes. CREATE STATISTICS now supports most-common-value statistics. Just-in-time (JIT) compilation happens by default. There are also enhancements for SQL/JSON path queries. Common table expressions are inlined by default, but you can change whether you want them materialized or not. And then there are generated columns, which we'll discuss in a post a little bit later.
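
As a quick illustration of that last CTE change, here is a minimal sketch of the Postgres 12 syntax (the table and column names are made up for the example):

    -- Postgres 12 inlines CTEs by default; this forces the old (materialized) behavior
    WITH recent AS MATERIALIZED (
        SELECT * FROM orders WHERE created_at > now() - interval '1 day'
    )
    SELECT count(*) FROM recent;

    -- ...or explicitly ask the planner to inline the CTE into the outer query
    WITH recent AS NOT MATERIALIZED (
        SELECT * FROM orders WHERE created_at > now() - interval '1 day'
    )
    SELECT count(*) FROM recent;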

[00:01:44] And then some other interesting things mentioned in terms of administration: you can now reindex concurrently, which is huge, and there's also pg_checksums. Now, there are a few other posts we're going to cover that discuss Postgres 12. The next one is "8 Major Improvements in PostgreSQL 12", and this is from enterprisedb.com. The first thing they talk about is the partitioning performance: again, cases where you have a lot of partitions were improved (we saw a post on that in a previous episode of Scaling Postgres), as well as performance improvements in certain COPY statements. They go back to the B-tree index enhancements, covering the space utilization and some of the other points I just mentioned, and the multi-column most-common-value statistics, so that you can define statistics across columns. They cover common table expressions, or CTEs, with the WITH query being inlined, and how you can define whether you want it materialized or not. They mention prepared plan control: previous versions of PostgreSQL would use a custom plan five times and, on the sixth, generate a generic plan and use it if it's about as good as the custom ones. That behavior can now be controlled through a variable called plan_cache_mode, and this allows you to use a generic plan right away. They say this should benefit users who know their parameters are constant and know the generic plan will work. They mention the just-in-time compilation and the checksum control, and that now you can change a cluster from no checksums to checksums without having to dump and reload the data.
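
Here is a minimal sketch of that prepared plan control (plan_cache_mode is the Postgres 12 setting the post describes; the prepared statement and table are hypothetical):

    -- default is 'auto': five custom plans, then possibly a generic plan
    SET plan_cache_mode = 'force_generic_plan';  -- use a generic plan immediately

    PREPARE get_user (int) AS
        SELECT * FROM users WHERE id = $1;

    EXECUTE get_user(42);  -- planned once, reused for different parameter values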

[00:03:24] You must be offline to make this change, so it does require a stop and a start, but being able to turn checksums on or off at all is beneficial. And they mention REINDEX CONCURRENTLY. So again, a lot of great changes in version 12.
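
For reference, a minimal sketch of both of those operations (the data directory path and index name are examples, not from the posts):

    -- Enable checksums on an existing cluster; the server must be cleanly stopped first:
    --   pg_checksums --enable -D /var/lib/postgresql/12/main

    -- New in Postgres 12: rebuild an index without blocking concurrent writes
    REINDEX INDEX CONCURRENTLY idx_orders_created_at;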

[00:03:41] The next post, again related to a new version 12 feature, is "Generated Columns", and this is from PG IO. The first thing they cover is: what are generated columns? They create a table here with three columns. It looks to be a width, a height, and then a calculated area, which is GENERATED ALWAYS AS width times height and is going to be STORED. So this is stored in the table, and whenever the width or height is adjusted, it will automatically update. You can see here they do an UPDATE, setting the width to 40, and the area is automatically regenerated. And they mention a few points you should know about generated columns. One is persistence: currently the value of generated columns has to be persisted and cannot be computed on the fly at query time, and the STORED keyword must be present in the column definition. Two, the expression used to compute the value has to be immutable, so it can depend on other columns but not on other generated columns. They can be used in indexes, but the values of generated columns are omitted from the output of pg_dump and from COPY table commands. Now, they have an interesting example here where they actually use one to make a column support full text search. They have scenes, with a body column containing the full text of a scene from a play, and they generate a column as a tsvector, converting the body to a tsvector, so that you can easily do full text searches. That could be a really great use case for generated columns (see the sketch below). So if that's of interest to you, definitely a blog post to check out.

The next post is "Freshly baked PostgreSQL 12, and changes it brought to pgCenter". I believe pgCenter is a product for monitoring Postgres. I'm not that familiar with it, but they mention some of the changes that have been made. The first two they mention are more progress indicators: when you're doing an index creation, you can look at pg_stat_progress_create_index to check the status of the index creation, and when you're doing a VACUUM FULL or a CLUSTER, you can use pg_stat_progress_cluster. It also mentions that pg_stat_database has two new columns, checksum_failures and checksum_last_failure, so that you can keep an eye on the checksums you can now enable with a stop and start of your database. So again, more information regarding changes to the system views to enable better monitoring. If you're interested in that, check out this blog post; this is from dataegret.com.
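
Going back to the generated columns post, here is a minimal sketch of both patterns it describes, reconstructed from the description above (the exact table and column names may differ from the post's):

    CREATE TABLE shapes (
        width  numeric,
        height numeric,
        area   numeric GENERATED ALWAYS AS (width * height) STORED
    );

    INSERT INTO shapes (width, height) VALUES (30, 20);
    UPDATE shapes SET width = 40;  -- area is recomputed automatically

    -- the full text search use case: keep a tsvector in sync with the body text
    CREATE TABLE scenes (
        body     text,
        body_tsv tsvector GENERATED ALWAYS AS (to_tsvector('english', body)) STORED
    );
    CREATE INDEX ON scenes USING gin (body_tsv);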

[00:06:18] The next post is "How PostgreSQL estimates parallel queries", and this is from cybertec-postgresql.com. They're talking about parallel queries, which were introduced in 9.6, and this particular post covers just parallel sequential scans. They set up a simple table here with two columns: one with, I believe, 10 million different values, and the other column containing only two distinct values. Now, first they just want to see what non-parallel looks like, so they set max_parallel_workers_per_gather to zero and then run the query, and they get a sequential scan. Then they want to figure out how PostgreSQL calculates the different estimates. For example, this is an EXPLAIN; it didn't run the query, so you're seeing the estimate here, and he asks, well, how did it come to that? He looked at pg_relation_size to get the number of blocks, and then he used this calculation: the current setting of seq_page_cost multiplied by the number of blocks, plus cpu_tuple_cost multiplied by the number of rows, plus cpu_operator_cost multiplied by the number of rows. And you get the exact cost estimate that comes up for the non-parallel scan (see the sketch below). Then he did the same thing for a parallel sequential scan, where he shows the formula used to estimate the cost of that parallel scan, particularly looking at the number of rows estimated per worker: they take the estimated number of rows and divide it by the number of cores plus the leader's contribution; the leader is the process that gathers the results from each of the workers, and it contributes some row processing of its own. So this is a pretty interesting look at how PostgreSQL actually comes up with its estimates, which I was not too familiar with, but if you're interested in learning more about that, definitely a blog post to check out.

The next post is "Fixing track_activity_query_size in PostgreSQL". They're talking about an issue where, if you have very long queries, some of the system views that tell you what query is running for a particular activity, like pg_stat_activity or pg_stat_statements, sometimes show those queries cut off. You can prevent that by increasing track_activity_query_size, but unfortunately there are some costs to that, so you probably don't want to increase it by too much, and it does require a restart of the server. But if you have a big system, he says, increasing it is generally a good idea because more information is generally better. So if you've run into an issue with some of the queries you're looking at in system views being truncated, definitely check out this post and the track_activity_query_size setting. And this is also from cybertec-postgresql.com.
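
Here is a minimal sketch of that non-parallel cost arithmetic (the table name 'test' follows the spirit of the post; yours will differ):

    -- the planner's inputs: number of 8 kB blocks and estimated row count
    SELECT relpages, reltuples FROM pg_class WHERE relname = 'test';

    -- sequential scan cost =
    --   seq_page_cost * blocks + cpu_tuple_cost * rows + cpu_operator_cost * rows
    -- (one cpu_operator_cost per row for the WHERE-clause comparison)
    SELECT current_setting('seq_page_cost')::float8     * relpages
         + current_setting('cpu_tuple_cost')::float8    * reltuples
         + current_setting('cpu_operator_cost')::float8 * reltuples AS estimated_cost
    FROM pg_class
    WHERE relname = 'test';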

[00:09:06] The next post is "PostgreSQL Change Data Capture With Debezium". So this is a post that talks about using change data capture: it's basically using the WAL stream, and specifically, probably, logical decoding, in order to extract data as it's being produced, so you don't have to rely on triggers or going through logs. Debezium can be set up to use logical decoding to read, essentially, the WAL stream and track what data has changed. Now, the use case they're talking about here is that maybe you have a couple of relational databases: you can use Debezium to watch the WAL stream and then send the changes to a JDBC connector to update an analytics database. So this is a very comprehensive post that goes through all the different commands to set this up and get it running. If you're interested in implementing a change data capture method, or in using Debezium, it's definitely a blog post to check out. And this is from crunchydata.com.
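
For orientation, a minimal sketch of the Postgres-side prerequisites for this style of logical-decoding CDC (the publication and slot names here are illustrative; the post's exact setup steps may differ):

    -- logical decoding requires wal_level = logical (and a server restart)
    ALTER SYSTEM SET wal_level = 'logical';

    -- a publication defines which tables' changes get streamed
    CREATE PUBLICATION cdc_pub FOR ALL TABLES;

    -- Debezium normally manages its own replication slot, but one can be created manually:
    SELECT pg_create_logical_replication_slot('cdc_slot', 'pgoutput');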

[00:10:10] The next post is "How to work with Postgres in Go". If you use Go, I think you would definitely find this beneficial for getting the most out of working with PostgreSQL. It's mostly about the interaction of Go with Postgres and general recommendations, and he summarizes a lot of his conclusions at the bottom. As you can see, it's a very comprehensive post: he gives suggestions on which Postgres driver you should use, configuring limits for your connection pool size, collecting connection pool metrics, and logging what's happening in the driver. So, a lot of recommendations on using Go with PostgreSQL. If you're using that tool, definitely a blog post to check out to make sure you get the most performance out of it.

[00:10:55] The next post is "Transparent Data Encryption in PostgreSQL", and this is from the Highgo website. It was recently announced they were looking to target this for PostgreSQL 12, excuse me, 13.

[00:11:09] And this individual says, "I've been working with the PostgreSQL community recently to develop transparent data encryption." So this goes through, kind of, the rationale and thinking about it, and how to implement it. So if you're interested in that type of content, definitely a blog post to check out.

[00:11:24] Another post by Highgo is "A Guide to Create User-Defined Extension Modules to Postgres". So again, if you want to do extension development, this is a blog post that walks through how you can go ahead and get started creating your own extension in C.

[00:11:41] So if that's of interest to you, definitely a blog post to check out. And the final piece of content is "Trace Query Processing Internals with Debugger". This goes through the whole process of tracing a query through PostgreSQL. So this is, again, a very comprehensive blog post that describes each step a query goes through in PostgreSQL. So if you're wanting to look at the internals and how PostgreSQL does its querying, this would be another piece of content to check out.

[00:12:13] That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to scalingpostgres.com, where you can sign up to receive weekly notifications of each episode. Or you can subscribe via YouTube or iTunes. Thanks.
