
Book Sales, B-tree Boost, More Postgres 12, Using pgBackRest | Scaling Postgres 92


In this episode of Scaling Postgres, we discuss book sales, boosts to B-tree indexes, more Postgres 12 features and how to set up and use pgBackRest.


Transcript



[00:00:00] In this episode of Scaling Postgres, we talk about book sales, B-tree boosts, more Postgres twelve and using pgBackRest. I'm Creston Jameson, and this is Scaling Postgres episode 92.

[00:00:20] All right, I hope you're having a great week. The first piece of content is actually an announcement that two books are on sale this Cyber Monday and Black Friday. Of course, those sales will have passed by about the time this episode is published, so hopefully you're watching this episode early. SQL Performance Explained, at sql-performance-explained.com, is on sale, and there's a Cyber Monday 19 coupon code where you can get the PDF for free. I've actually purchased this, so if you're interested, it's definitely a book to check out. The second book is The Art of PostgreSQL, at theartofpostgresql.com, and they're also having a sale: 50% off with a coupon code Black Friday 2019. So definitely a great time to get some great educational materials on PostgreSQL.

[00:01:14] The next post is B-tree index improvements in PostgreSQL twelve. This is from cybertec-postgresql.com, and it's talking specifically about the improvements that were made to the index. It's mostly an infrastructure improvement, and in the example they list here, they created a two-column table that can be thought of as a join table. So they have columns aid and bid that are bigints, there is a composite or multicolumn index for the primary key across both, and then a secondary index on bid in case you wanted to query those numbers faster. To that they added test data using a query with generate_series, and they say each bid is related to 10,000 aids. Now, the first thing you'll notice is the size difference. In version eleven the size is 545 megabytes for the bid index alone, but in version twelve it's 408 megabytes, so it's about 33% bigger in version eleven. So the question is, how does this happen? They say that every value of bid occurs 10,000 times in this index, so there are a lot of duplicates, and on many leaf pages all of the keys are going to be the same. Since PostgreSQL doesn't quite know where a new entry for such a duplicated value should go, you could get one of these middle leaf pages being split, and you're not necessarily going to be filling in each leaf page. The reason is that, in version eleven, entries with the same key value are stored in no special order in the index. They do mention the rightmost leaf page was always split toward the right end to optimize for monotonically increasing inserts; in contrast to this, other leaf pages were split in the middle, which wasted space. What they're doing in version twelve is adding the physical address, the TID or tuple ID, as part of the index key, and, as they say, duplicate index entries are then stored in table order, the physical order of the table. Of course, the obvious benefit is that you should get significant performance benefits when trying to pull out a given value, because the entries will all be clustered together on disk. Moreover, they say, pages that consist only of duplicates will be split at the right end, resulting in a densely packed index. So it basically packs the entries more tightly to get that space savings. The second index improvement is compressed storage of internal index pages. Looking at version eleven versus version twelve, when doing a query using what looks to be the primary key, one less block is read in version twelve, and they say that's because internal pages can fit more index entries. So how does it do that? The post describes how version twelve started storing the TID as part of the index key, but they also have a way to reduce the redundancy that would otherwise hurt the space savings: where the TID is redundant, as are non-key attributes from an INCLUDE clause, they introduced truncation of these redundant index attributes. They show that in version eleven you can see an item length of 24 when looking at it through the pageinspect extension, whereas it's 16 in version twelve, so that also gives you some space savings. The important thing to remember here, if you want to get these improvements, is that you actually have to do a REINDEX or rebuild the index in order to get these benefits, because doing an upgrade using, say, pg_upgrade is not going to automatically upgrade the indexes; you have to reindex each index to get these benefits.
So definitely a good explanation of the index changes that came with version twelve of PostgreSQL.
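
To make that concrete, here is a minimal SQL sketch along the lines of what the article describes; the table name, index name, and row counts below are illustrative assumptions rather than the article's exact script.

    -- Two-column "join table" with a composite primary key and a secondary index on bid.
    -- (Names and row counts are assumptions for illustration.)
    CREATE TABLE t_join (
        aid bigint NOT NULL,
        bid bigint NOT NULL,
        PRIMARY KEY (aid, bid)
    );
    CREATE INDEX t_join_bid_idx ON t_join (bid);

    -- Load data so that each bid value is shared by 10,000 aid values.
    INSERT INTO t_join (aid, bid)
    SELECT i, i / 10000
    FROM generate_series(1, 20000000) AS i;

    -- Compare this on version 11 and version 12 to see the size difference.
    SELECT pg_size_pretty(pg_relation_size('t_join_bid_idx'));

    -- After a pg_upgrade to version 12, the smaller on-disk format only shows up
    -- once the index is rebuilt.
    REINDEX INDEX CONCURRENTLY t_join_bid_idx;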

[00:05:12] The next post, also from cybertec-postgresql.com, is discovering less-known PostgreSQL twelve features. Now, some of these are going to be redundant with what we've covered in previous episodes, but I'll just run quickly through some of the ones mentioned here. The first one is automatic inlining of common table expressions, which was discussed before. Allow parallelized queries when in serializable isolation mode, so that could give some potential performance benefits if you're using serializable isolation for some of your queries. JIT is enabled by default, but they also mention here that if you don't have something like a data warehouse use case, or maybe your database is not that large, you could tune the JIT threshold family of parameters (jit_above_cost and friends) so that small and medium data sets won't use it. The next one is support for the SQL/JSON path language. Allow foreign keys to reference partitioned tables, which we've heard before. Add partition introspection functions to be able to look at roots and ancestors in the tree of a set of partitioned tables. Add a connection parameter, tcp_user_timeout, to control libpq's TCP timeout. Show the manual page URL in psql's \help output for an SQL command. Allow VACUUM to skip index cleanup, which could have some benefits, but of course you don't want to rely on it; you do eventually need to vacuum those indexes. Add an EXPLAIN option, SETTINGS, to output non-default optimizer settings; so doing SETTINGS ON could be beneficial to get additional detail when you're looking at an explain plan. Allow logging of statements from only a percentage of transactions to minimize the amount of log output. Cause recovery to advance to the latest timeline by default; this is when you're doing recovery, so it's beneficial not having to set it all the time, because latest is usually what it gets set to and now that's the default. Non-locking index rebuilding using REINDEX CONCURRENTLY, which is great. pg_checksums can now enable or disable page checksums for an offline cluster, and they discuss that a little bit here; hopefully eventually it will work in online mode, but right now it's offline only. Allow CREATE TABLE's tablespace specification for a partitioned table to affect the tablespace of its children, so this looks like a definite convenience. Allow values produced by queries to be assigned to pgbench variables, so this will be useful if you use a lot of pgbench. Allow fractional input for integer server parameters, which is interesting because now you can do things like set your work_mem to 1.5GB as opposed to, say, 1500 megabytes. Allow vacuumdb to select tables for vacuum based upon their wraparound horizon. And then lastly are the changes for recovery.conf, in that it has gone away and they now have the signal files. So, just yet another post covering all the different changes that came in PostgreSQL twelve; if you're interested in learning more, definitely a blog post to check out.
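
As a quick, hedged illustration of a couple of the items above, the following sketch shows the new EXPLAIN SETTINGS option and fractional input for integer parameters; the table referenced is just a placeholder from the earlier example.

    -- EXPLAIN's SETTINGS option reports planner settings that differ from their defaults.
    SET enable_seqscan = off;                      -- example of a non-default setting
    EXPLAIN (SETTINGS ON) SELECT * FROM t_join WHERE bid = 42;

    -- Fractional input for integer server parameters.
    SET work_mem = '1.5GB';
    SHOW work_mem;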

[00:08:16] The next post is setting SSL/TLS protocol versions with PostgreSQL twelve, and this is from 2ndquadrant.com. They're talking about the new settings ssl_min_protocol_version and ssl_max_protocol_version. I tend to do a lot of this working with NGINX, and now PostgreSQL has the capability to set a minimum protocol version as well as a maximum one. They go through when the different protocol versions were released: the SSL versions are not considered secure anymore, but they're mentioned here for completeness, and then they cover the TLS versions. TLS 1.2 is pretty much what most people are using, some are still supporting 1.0 and 1.1, and 1.3 is relatively new, without much use as of yet. If you actually want to see what your PostgreSQL instance is doing, there is, as mentioned here, a pg_stat_ssl view to see the SSL status of all connections. He mentions you can also enable log_connections on the server, because each connection attempt will then log the SSL parameters used. He mentions that when you connect, you generally get output telling you whether it's an SSL connection along with the different parameters, and you can also use the psql command \conninfo to have the information printed at any time. So if you're interested in learning more about these settings, how you can adjust them, and these different ways to determine what SSL version PostgreSQL is currently using, definitely a blog post to check out.
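
Here is a small sketch of how those settings and the pg_stat_ssl view might be used, assuming SSL is already enabled on the server; adjust the minimum version to whatever your clients can support.

    -- Require TLS 1.2 or newer for SSL connections (applied on a configuration reload).
    ALTER SYSTEM SET ssl_min_protocol_version = 'TLSv1.2';
    SELECT pg_reload_conf();

    -- See which protocol version and cipher each current connection negotiated.
    SELECT pid, ssl, version, cipher FROM pg_stat_ssl;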

[00:09:49] The next post is hope is not a strategy: how hardware issues affect your PostgreSQL data. This is from crunchydata.com, and this is pretty much an opinion piece. He says he was made aware of a bulletin from Hewlett Packard where certain models of their SSDs have a firmware bug that will cause drives to deterministically fail very suddenly at precisely 32,768 hours of operation. And he says, of course, this is pretty much due to an overflow of a signed short integer, and that this could also be affecting other models of these drives. So if you have these in some sort of RAID array, you could suddenly have the whole system go down, because you're not going to have just one drive failing due to some mechanical or electronic issue; there's a programmatic issue that's going to take all of the data offline. It prompted him to ask some questions, like: where exactly is the data? If you're trusting the cloud with it, are they taking care of the hard drives, and are they keeping track of issues like this? What kind of drives are holding your data: is it SSDs, is it magnetic storage? Does the firmware get updated, et cetera? Are you doing regular, continuous backups, and are you testing those backups to verify that you can restore them? He mentioned someone he worked with previously had said hope is not a strategy. So if you're just hoping, that's not a strategy; you really want to know, and be relatively confident, that your data is being taken care of. Now, related to this, they have another post called how to get started with pgBackRest and PostgreSQL twelve. So basically they go through the whole process of using pgBackRest to do a backup and then test a restore, exactly what they're advocating. They do this on a CentOS box and use pgBackRest 2.19, the most recent version. What they make note of here is that if you're using PostgreSQL twelve with the new recovery changes, you're going to want to use pgBackRest version 2.18 or greater. So, using the recent version, they install PostgreSQL, install pgBackRest, configure PostgreSQL to be able to do the backups, configure pgBackRest, put it all together along with archiving the WAL files, perform the first backup, and then, as the last step, restore the backup. So if you're wanting to get started using pgBackRest, this is definitely a great post to check out. The last item is that the PGConf Russia YouTube channel has recently added a number of videos from a recent PGConf conference in Russia. Some of these are in English and others are translated into English. So if you're looking for some video educational content, definitely a website to check out.
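
For the pgBackRest piece, here is a minimal sketch of the PostgreSQL-side archiving settings involved; the stanza name 'demo' is a placeholder, and the repository and stanza configuration itself lives in pgbackrest.conf, which isn't shown here.

    -- Hand completed WAL segments to pgBackRest (stanza name is an assumption).
    ALTER SYSTEM SET archive_mode = 'on';
    ALTER SYSTEM SET archive_command = 'pgbackrest --stanza=demo archive-push %p';
    -- archive_mode requires a server restart; archive_command only needs a reload.
    SELECT pg_reload_conf();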

[00:12:53] That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.
