
Toast Impact, zheap Storage, More Performance, pg_rewind Changes | Scaling Postgres 137


In this episode of Scaling Postgres, we discuss the impact of toast, zheap storage results, more Postgres 13 performance data and changes to pg_rewind.

Content Discussed


Transcript



[00:00:00] In this episode of Scaling Postgres, we talk about TOAST impact, zheap storage, more performance and pg_rewind changes. I'm Creston Jamison and this is Scaling Postgres, episode 137.

[00:00:22] All right, I hope you, your friends, family and coworkers continue to do well. Our first piece of content is the statement from the PostgreSQL core team on the EDB acquisition of 2ndQuadrant, and this is from postgresql.org. They're talking about the issue where EDB acquired 2ndQuadrant, and now, I believe, up to 60% of the core team is comprised of EDB employees. As they say, quote, there has been a long, unwritten rule that there should be no more than 50% of the membership of the core team working for the same company. So basically they say they're looking into this and discussing what to do, and they just wanted to put a post up for the community to let them know that they are aware of it and are looking to see what solutions they can come up with. So it's interesting that they actually posted this on the PostgreSQL website. Just some information to keep in mind.

[00:01:17] The next piece of content is "The Surprising Impact of Medium-Size Texts on PostgreSQL Performance". This is from hakibenita.com. Now, in terms of medium texts, he basically says small texts are very short strings, like names, slugs, usernames, emails, et cetera, where you may be using a varchar instead of an actual text type. Large texts are large content such as blog posts, articles or HTML content, whereas medium texts are things like longer descriptions, comments, reviews, stack traces, things like that. So it's text that's not quite a full blog post or a full document, but not a very short string either. That's what he's calling medium text. And this is a great article talking about TOAST and explaining what it is and how it works. So he goes into what TOAST is, and that because PostgreSQL has a fixed page size, a tuple cannot occupy more than one page. Therefore TOAST was devised as a way to store more than 8 KB within a row; basically, it overflows into this TOAST storage. He has information here from the docs that basically says TOAST will compress and/or move field values out of line until the row value is shorter than the TOAST tuple target, which is typically 2 KB. So whenever a table is created, there's also a TOAST table created as well, and he used a technique here to identify the TOAST table of a table he created. Then he actually described the TOAST table, and you can see that it has a chunk_id column, which is an OID, a chunk_seq column, which is an integer, and a chunk_data column, which is the raw data. Because even a TOASTed value can exceed 8 KB, the value is split up into separate chunks of data, each with a sequence number, that reference the value that's been TOASTed. Now, he does an example here where he inserts a small text value into the table, and when you check the TOAST table you can see it's still empty. So everything fits within the main table itself; nothing is TOASTed. But when he inserted a longer text string with over 4,000 characters, you can see that the TOAST table is now being populated and that the value actually spans three rows in the TOAST table. So each row references the same chunk_id, but as the diagram shows, the value is split into three rows in the TOAST table, each chunk with a sequence number, as you can see here. Now, he also mentions that TOAST can do compression, and the compression kind of depends upon how random the string is. So to look at some examples, he created a function to generate a random string where you can specify the length as well as the number of different characters to use when generating it. He inserted a 10 KB string made of entirely random characters, and it was split up into six chunks within the TOAST table. But when he restricted it to only three different characters, which makes the string much easier to compress than one with a great many distinct characters, the same 10 KB string was stored in only two chunks and took up roughly a third of the space, so it was compressed by about 60 to 70%. What's even more interesting is that when he uses only one character for the same 10 KB string, it compresses so well it doesn't even get stored in the TOAST table.
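
If you want to poke at this yourself, here is a minimal sketch of how you might find a table's TOAST table and look at its chunks; the table name toast_test and the OID-based TOAST table name are assumptions, so substitute whatever the first query returns:

    -- Find the TOAST table attached to a hypothetical table named toast_test
    SELECT c.relname AS table_name,
           t.relname AS toast_table_name
    FROM pg_class c
    JOIN pg_class t ON t.oid = c.reltoastrelid
    WHERE c.relname = 'toast_test';

    -- Inspect the chunks; replace the name below with the one returned above
    SELECT chunk_id, chunk_seq, length(chunk_data) AS chunk_bytes
    FROM pg_toast.pg_toast_123456
    ORDER BY chunk_id, chunk_seq;
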
Now, he does mention for completeness that you can configure TOAST per table, and that includes the toast_tuple_target, which is the minimum tuple length after which PostgreSQL tries to move long values to TOAST, and the column STORAGE setting, which is the TOAST strategy. The default is EXTENDED, but there are three other TOAST strategies, and basically they vary in whether values get compressed and whether they get moved out of line. Then he gets into TOAST performance. For this example, he creates a small table, a medium table and a large table, where the names indicate the length of the strings stored in them, and he inserted 500,000 rows into each of these tables using the different length strings. He turned off parallel queries, so it's just doing sequential scans, and looked for a single ID in each table. The small table ran in about 25 milliseconds, the medium table eventually ran in about 173 milliseconds, whereas the large table did it in about 37 milliseconds. So the question is: why is the medium so much slower than both the small and the large? The reason is where the data is stored. The small values are all stored in line and it's a relatively small table; there's essentially nothing being stored in TOAST. For the medium table, he purposely inserted values that just barely fit into the main table and did not spill over into TOAST, so the TOAST size is zero, but all those medium strings had to be stored somewhere, so they're stored in line in the table. As a consequence, the table, as you can see, is almost a gigabyte, whereas the small one is at 21 megabytes. Then, for the large table, pretty much all of the data spilled into TOAST: the TOAST table is almost 2 GB, but the main table itself is small. So when you're doing a sequential scan, it's much faster to scan the 21 megabyte small table or the 25 megabyte large table, and a lot slower to scan the almost 1 GB medium table. Now, in this analysis what he was searching on was the ID, which is stored in line. If he were actually searching on the text value, you'd get a much different result, because you would have to search through the TOASTed data, and that's what he did next. When he searched on the TOASTed values, the large table took over 7 seconds, the medium eventually got down to about 260 milliseconds, and the small was around 50 milliseconds. That makes a lot more sense: the search time tracks how large the data is. Then he looked at the impact of adding an index on the ID, and of course when you query a single ID you get approximately the same speed, less than a millisecond, for each table. But then he tried querying a lot of data from each table using the ID, using a range that pulls out about half of the values, and here he saw something different. The small table was about 60 milliseconds, the medium table took longer at 284 milliseconds, and the large table came in at around 70 milliseconds. So I am a little surprised by this result. I would have expected the medium and the large to be similar, because the ID index sizes should be the same across all three tables, and because you're selecting all columns you still have to go to the heap to pull in the required values. I'm a little surprised that the medium takes longer than the large. He says that the database had to read a larger portion of the table, but because it's returning all columns, I would have expected it to have to read the TOASTed values as well to pull back the results. So, definitely an interesting finding.
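
To see for yourself where the bytes end up, a quick sketch like the following compares heap size versus TOAST size; the table names are made up to match the small/medium/large idea in the article:

    -- Compare heap size and TOAST size for three hypothetical tables
    SELECT c.relname,
           pg_size_pretty(pg_relation_size(c.oid))                        AS heap_size,
           pg_size_pretty(coalesce(pg_relation_size(c.reltoastrelid), 0)) AS toast_size
    FROM pg_class c
    WHERE c.relname IN ('toast_test_small', 'toast_test_medium', 'toast_test_large');
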
In terms of adjusting things to try to get better performance, he suggests tuning the toast_tuple_target to control how much gets stored in the TOAST table, and the next possibility is creating a separate table to store some of these values. This is a very interesting post covering TOAST, how it works, and some of the performance impacts you may run into depending on what values you're storing in Postgres, so it's definitely a blog post I suggest checking out.
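
As a rough illustration of that first suggestion, per-table TOAST settings can be changed like this; the table and column names are placeholders, and the values are only examples, not recommendations:

    -- Push values out of line sooner than the default (~2 KB)
    ALTER TABLE toast_test SET (toast_tuple_target = 256);

    -- Store this column out of line but uncompressed (the EXTERNAL strategy)
    ALTER TABLE toast_test ALTER COLUMN body SET STORAGE EXTERNAL;
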

[00:08:47] The next piece of content is "zheap: inspecting storage sizes". This is from cybertec-postgresql.com, and they're talking about how zheap handles storage differently with regard to sizing. In building zheap, which is an alternative storage mechanism for Postgres, they've made the tuple header much smaller and improved the alignment of how data is actually stored at the page level. So first they created a sample table as a temporary table that's about 500 megabytes. Then they created a table like that one using the standard heap, the default storage system, and inserted 10 million rows into it. Then they did the same thing for the zheap storage system and inserted 10 million rows as well. Now, what's interesting is how much slower the zheap insert was. This is still in beta and not production ready yet, so that may have some impact, but with the standard storage system it took 7.5 seconds to insert 10 million rows, and it took 28 seconds to insert into zheap. So that's quite a difference. What's also interesting, though, is that the size is about half: the standard heap was about 500 megabytes, while zheap stores it in 250 megabytes. They say this is due to the things mentioned before: the tuple headers are smaller, so more of the transaction information is stored at the page level as opposed to the tuple level, and the alignment and padding used to organize the data is more efficient. They also mention that zheap excels when you have tables with few columns and many rows; you get a lot more savings there compared to tables with many, many columns, because again, transaction information is stored at the page level rather than the tuple level. And then, of course, the hallmark of zheap, the reason people are potentially investigating it, is in-place updates. You can see that when an update is done incrementing each row, the size of the table is exactly identical to what it was before, so there's no doubling in size. If you did this with the standard heap storage system, it would double in size, because an update consists of inserting a new row version and later vacuuming the old one away. That isn't necessary with zheap, but the thing to keep in mind is that all of this information is kept around in a rollback area, called undo, and there's actually a directory called undo where it stores all of this information. So even though it's not stored in the table, it's stored elsewhere. As a consequence, rollbacks are handled differently. With the heap storage system all the old rows are right there, so an immediate rollback is essentially free because the rows still exist, whereas a rollback with zheap is more intense because it literally has to undo everything and put things back in place. For example, the rollback on zheap took 41 milliseconds versus about a tenth of a millisecond for the regular heap. So it's a trade-off to consider: if you move to zheap, you get in-place updates and less table bloat, but if you have a lot of rollbacks, that's going to zap your performance. So, a definitely interesting post about zheap that I suggest you check out.
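
For reference, zheap is exposed as a table access method, so trying it out looks roughly like the sketch below. This assumes a zheap build is installed and registers an access method named zheap, since it is not part of stock PostgreSQL; the table name and row count are just for illustration:

    -- Create a table using the zheap access method instead of the default heap
    CREATE TABLE t_zheap (id int, x numeric) USING zheap;

    -- Load some rows and check the on-disk size
    INSERT INTO t_zheap SELECT g, g FROM generate_series(1, 10000000) AS g;
    SELECT pg_size_pretty(pg_relation_size('t_zheap'));
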

[00:12:06] The next piece of content is "Postgres 13 performance with OpenStreetMap data". So this is another post about the performance of Postgres 13, and here he tested version 13 against versions 12 and 11, using OpenStreetMap data as a source for checking things out. Loading the data, there wasn't any appreciable difference. Then he ran it through PgOSM, and that finished 12% faster than version 12 and 18% faster than version 11. He also did some testing with pgbench and got a 16% faster result with his OpenStreetMap Postgres data calculations. He talks a little bit about the B-tree index deduplication that's present in 13, and he also checked index creation time: you can see that 11 is faster than both, but 13 was faster than 12, and I suspect that's because deduplication takes a little more time when creating the indexes, perhaps. But, as is the hallmark of version 13, for some of these indexes he's seen a dramatic 70% reduction in size. Now, in terms of performance he didn't see that big a difference at this data size, but he estimates that when you get into many millions of rows, you're really going to see a performance difference with the indexes as the size impact comes into play. So this is another post, from RustProof Labs, on the performance of Postgres 13.
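
If you want to see the PostgreSQL 13 deduplication effect on your own data, a sketch like this compares the same index built with and without it; the table, column and index names are made up, but deduplicate_items is a real B-tree storage parameter in 13:

    -- Build the same index with deduplication on (the default in 13) and off
    CREATE INDEX idx_tags_dedup    ON osm_roads (highway_type);
    CREATE INDEX idx_tags_no_dedup ON osm_roads (highway_type) WITH (deduplicate_items = off);

    -- Compare the resulting index sizes
    SELECT relname, pg_size_pretty(pg_relation_size(oid)) AS index_size
    FROM pg_class
    WHERE relname IN ('idx_tags_dedup', 'idx_tags_no_dedup');
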

[00:13:30] The next piece of content is "TPC-H performance since PostgreSQL 8.3". This is from 2ndquadrant.com, and it's the second in a series of blog posts. The first one looked at OLTP performance; this one looks more at data warehousing or data mart performance for Postgres. Again, they covered versions from 8.3 up to version 13 and used a consistent hardware setup to run the tests, so basically it gives you a trend of how performance has changed over time across the different versions. They used a data set that essentially fits in shared buffers, a data set that fits in memory, and a data set that has to spill onto disk, so three different variants, and they also enabled parallelism for certain activities. For the data loading, you can tell it has improved over time, and 13 is maybe not the best but still pretty good at loading data; with parallelism enabled, it looks like 13 is probably the winner across all the versions, which is good to see. Then they got into the queries, and at the 1 GB, 10 GB and 75 GB scales, both with and without parallelism, you can see the trend is for performance to increase over time, and 13 looks to be about the best of all of these. Then they looked at some parallel queries, and there was some variance between the different versions, but again the more recent ones overall seemed to be pretty much the best. So things are heading in the right direction and there are no big regressions in version 13. So, another post about Postgres 13 performance if you would like to check it out.

[00:15:08] The next piece of content is "pg_rewind changes in PostgreSQL 13". This highlights some changes that have been made to pg_rewind. The first is that it can now write the recovery configuration for you, the way a recovery.conf file worked in versions prior to 12; from version 12 on it instead adds a signal file. They do mention some caveats on how this is handled, and that it generally writes the connection information to the postgresql.auto.conf file, because that is essentially the last file checked when assigning configuration values to Postgres, so that's where it chooses to place those settings. The next option they talk about is --restore-target-wal. This tells the target you're rewinding to look in the WAL archive and replay any missing WAL from there in order to catch back up with the primary. They also talk about changes to better handle automatic crash recovery: starting the instance in single-user mode to ensure that crash recovery is performed before doing its sync against the primary. So if you're interested in learning about these changes to pg_rewind in Postgres 13, definitely check out this blog post from sqlinfo.de.
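
As a rough sketch of what the --restore-target-wal option relies on, the target cluster needs a restore_command in its configuration so pg_rewind can fetch archived WAL; the archive path and copy command below are placeholders:

    -- On the target (to-be-rewound) cluster, before shutting it down,
    -- point restore_command at the WAL archive
    ALTER SYSTEM SET restore_command = 'cp /path/to/wal_archive/%f "%p"';

pg_rewind reads this setting from the stopped target cluster's configuration when run with --restore-target-wal, and --write-recovery-conf can additionally write the recovery settings for you.
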

[00:16:23] The next piece of content is "Data consistency issues in logical replication". This is from elephanttamer.net. He's talking about the scenario where you have a publisher and a subscriber, and the data on the subscriber can still be inserted into, updated and deleted, which can cause issues with replication. So he covers some of these scenarios. Scenario one is data inserted on the subscriber side. Generally you're going to have a replica identity or a primary key in order to logically replicate data over, but the issue is that if you insert data into the subscriber table and then an insert comes in from the publisher with the same primary key, you're going to get a duplicate key violates unique constraint error. He says replication will stop and WAL segments will start to pile up on the publisher, so it's definitely an issue you need to watch out for and avoid. The way to get around it is to delete the conflicting data for that particular ID on the subscriber. Scenario two is data updated on the subscriber side. Much like inserting data, you can update data locally, so again it's an issue to be aware of. Similarly, you can delete data, which can be a problem if that data then needs to be updated; with the log level set to debug1 you'll see a message that logical replication did not find the row for an update in the replication target, so that can be a concern. He then covers a few different ways you can resynchronize data for logical replication. The first is: lock the table for writes on the publisher, dump the data and copy the dump to the subscriber, truncate the table on the subscriber, restore the data dump on the subscriber, and remove the lock on the publisher. The second is: exclude the table from the current publication, truncate the table on the subscriber, and create a new publication/subscription pair. He also describes a rather black-magic way of doing it, which I'll let you take a look at if you want to examine that. So, definitely some things to be aware of when you're using logical replication.
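
As a minimal sketch of the scenario-one fix, assuming a hypothetical orders table and a subscription named my_sub, the conflicting row is removed on the subscriber so the apply worker can proceed:

    -- On the subscriber: remove the locally inserted row that collides with
    -- the incoming change from the publisher
    DELETE FROM orders WHERE id = 42;

    -- If you disabled the subscription while investigating, re-enable it
    ALTER SUBSCRIPTION my_sub ENABLE;
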

[00:18:27] Now, related to that, there's an article called "Logical replication upgrade in Postgres". This is from crunchydata.com, and he goes through the process of using logical replication to do an upgrade in Postgres, covering the whole process to be able to do that. Now, they didn't mention sequences here; I'm not sure if those get brought over with the globals, I'll have to take a look. But it's definitely a process for getting logical replication up and then choosing to switch over to it if you want to do an upgrade. Similar to that is an article from severalnines.com on how to upgrade PostgreSQL 11 to PostgreSQL 12 with zero downtime, and in this case they're also using logical replication. So if you want to look through these two posts to see how you can potentially use logical replication to do an upgrade, definitely check them out.
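
On the sequence question, built-in logical replication does not carry sequence values over, so one common post-cutover step looks like the sketch below; the table, column and sequence names are made up for illustration:

    -- On the new primary after cutover: advance the sequence past the existing rows
    SELECT setval('orders_id_seq', (SELECT coalesce(max(id), 1) FROM orders));
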

[00:19:18] The next piece of content is "Using PostgreSQL to shape and prepare scientific data". So this is about data you want to analyze, ideally in Postgres, and it describes a process of importing the data into Postgres and using Postgres to manipulate and massage it into the exact format and data types you're going to use for your analysis. So he has a process for doing it all in Postgres as opposed to using spreadsheets and updating data values manually. If you want to see a more automated way to prepare your data for scientific analysis, you can check out this blog post from crunchydata.com.
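
As a small sketch of that load-then-clean pattern, assuming a made-up CSV file and column names, you might load everything as text first and then cast in place:

    -- Load the raw file with every column as text (or use \copy from psql)
    CREATE TABLE raw_measurements (station text, reading text, taken_on text);
    COPY raw_measurements FROM '/tmp/measurements.csv' WITH (FORMAT csv, HEADER true);

    -- Clean up placeholder values, then cast columns to their real types
    UPDATE raw_measurements SET reading = NULL WHERE reading IN ('', 'NA');
    ALTER TABLE raw_measurements
        ALTER COLUMN reading  TYPE numeric USING reading::numeric,
        ALTER COLUMN taken_on TYPE date    USING taken_on::date;
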

[00:19:57] The next piece of content is "The Way PG Stores Null Values in a Record". This post goes into some of the internals of how Postgres stores nulls, and some of the consequences for columns when you drop them: dropped columns are treated as nulls for all of those values, and that has a storage impact. So if you're interested in learning more about Postgres internals, you can check out this post from highgo.ca. The next piece of content, also from highgo.ca, is "Free Space Mapping File in Details". So if you want to learn more about the free space map file in Postgres, definitely check out this post.
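
To see that dropped-column behavior for yourself, a small sketch (table and column names are made up) shows the column lingering in pg_attribute, flagged as dropped, with its old values treated as nulls:

    -- Drop a column from a hypothetical table
    ALTER TABLE my_table DROP COLUMN old_notes;

    -- The column is still listed in pg_attribute, marked as dropped
    SELECT attname, attisdropped
    FROM pg_attribute
    WHERE attrelid = 'my_table'::regclass AND attnum > 0;
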

[00:20:38] And the last piece of content: the PostgreSQL Person of the Week is Damien Clochard. If you're interested in learning more about Damien and his contributions to PostgreSQL, definitely check out this blog post. That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.
