Explain Analyze, Sequence Counts, Best Books, Partition Migration | Scaling Postgres 112


In this episode of Scaling Postgres, we discuss how to use explain analyze, the problem with counting sequences, the best books to learn Postgres and how to migrate to declarative partitioning.



[00:00:00] In this episode of Scaling Postgres, we talk about EXPLAIN ANALYZE, sequence counts, best books, and partition migration. I'm Creston Jamison, and this is Scaling Postgres, episode 112.

[00:00:20] Alright, I hope you, your family, and coworkers are doing well given the situation we're still encountering. Our first piece of content is "Optimizing SQL: Step one, explain costs and plans in PostgreSQL, part two". This is from Highgo.ca, and as you can tell, this is part two of the post about EXPLAIN. Now, we covered the first part a number of weeks ago, explaining how EXPLAIN works, and it actually shows how you could build a query based upon an EXPLAIN plan. This one goes into ANALYZE and talks more in depth about how to interpret some of the output. In terms of their setup, they just have a big table and a small table, but the big table still only has 100 unique rows, and the small table is just a little bit smaller. The first thing they talk about is costs, and the fact that PostgreSQL uses a cost-based optimizer. It takes into account some of the configuration settings, like the sequential page cost and others, takes into account the statistics that exist for a given table, as well as indexes and everything else, and it comes up with a cost estimate for doing each operation. So, for example, it looks at the different operations here: how long would it take to do a hash, a sequential scan, a sort? And it comes up with a cost estimate for each one. You'll see that there are two different values listed for each node. The first part is the estimated startup cost: how long it takes to get to the point of doing this step, i.e., does anything need to happen beforehand? In this plan, one of the first things it can start doing is each sequential scan, so their startup cost is essentially zero. The second value is the estimated total cost for the action, shown along with the number of rows it anticipates and the width of the data it has to work with.
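As a sketch of the cost pairs being described, here is what a plain EXPLAIN might print; the table name and cost numbers are made up for illustration:

```sql
-- Each plan node reports cost=<startup>..<total>, plus estimated rows and row width.
EXPLAIN SELECT * FROM big_table ORDER BY val;
--  Sort  (cost=809.39..834.39 rows=10000 width=8)
--    Sort Key: val
--    ->  Seq Scan on big_table  (cost=0.00..145.00 rows=10000 width=8)
-- The Seq Scan can begin immediately (startup cost 0.00), while the Sort
-- can only start returning rows after its whole input is read (startup cost 809.39).
```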
So basically, you need to look and see where the greatest cost is coming from when you're doing an EXPLAIN or an EXPLAIN ANALYZE to see what should be the slowest part. In the next section, they talk about EXPLAIN ANALYZE, which actually runs the query. So when you add ANALYZE, it actually runs the query, whereas when you just use EXPLAIN, it doesn't run the query; it gives you an estimate. But I really like using ANALYZE, of course, because it actually runs the query and gives you actual results, so here you see the actual values, and it gives you the time, i.e., how many milliseconds it took to run a particular action. So this is the actual cost for doing this stage, and this is the cost for that stage, and again it tells you how many rows and how many loops it has to go through, because there are certain types of queries where it has to do more than one loop. For example, it has to do this materialize step 100 times based upon the query, the data, and how it's laid out. With this information, you can then look at your EXPLAIN ANALYZE plan, see what is taking the longest time, and that is the area you would need to optimize. So if you want to learn more about EXPLAIN ANALYZE, definitely check out this second post from Highgo.ca.
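To see the actual timings, row counts, and loop counts being discussed, the same kind of query can be run under EXPLAIN ANALYZE; the table names here are hypothetical, mirroring the post's big-table/small-table setup:

```sql
-- EXPLAIN ANALYZE executes the query and adds "actual time", actual row
-- counts, and "loops" to every node; loops > 1 means a node ran repeatedly,
-- as with the materialize step mentioned above.
EXPLAIN ANALYZE
SELECT b.id
FROM big_table b
JOIN small_table s ON b.id = s.id;
-- Optimize wherever actual time (multiplied by loops) is largest.
```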

[00:03:31] Now, a great complement to the previous post is "How to use EXPLAIN ANALYZE for planning and optimizing query performance in PostgreSQL". They take it from the perspective of a customer who contacted them and said that a particular query wasn't using an index; it was doing a sequential scan, and they wondered why. So they used EXPLAIN to figure out why it wasn't, found there was an existing index, and basically just needed to rework the query in order to start using the index scan again. Then it goes into how, of course, the planner uses a cost-based approach. What's interesting about this post is that it actually tells you how the cost is derived when just using EXPLAIN, and you can see the formula here: the number of blocks times the sequential page cost (again, a PostgreSQL setting you can change), plus the number of records times the CPU tuple cost (another configuration variable), plus the number of records times the CPU filter cost (another configuration variable). Using these, it calculates the exact cost that's represented here for this sequential scan. So for those of you interested in the details, it shows exactly how this cost-based optimizer comes up with its numbers. Then they talk a bit about statistics and how they are involved in the estimates as well. They also cover EXPLAIN ANALYZE and how it actually runs the query, so you get actual timing for each step of the process a query runs through. Then they have a summary table of all the different scan types and joins that are available. So again, if you want to learn more about EXPLAIN ANALYZE and how it works in PostgreSQL, these are definitely two blog posts to check out.
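That formula can be checked by hand against the planner's own inputs; a minimal sketch, assuming a table named big_table (cpu_operator_cost is the setting behind the per-row filter cost):

```sql
-- Planner's sequential-scan estimate:
--   relpages  * seq_page_cost
-- + reltuples * cpu_tuple_cost
-- + reltuples * cpu_operator_cost   (the last term applies when a WHERE filter runs)
SELECT relpages  * current_setting('seq_page_cost')::float8
     + reltuples * current_setting('cpu_tuple_cost')::float8
     + reltuples * current_setting('cpu_operator_cost')::float8 AS estimated_cost
FROM pg_class
WHERE relname = 'big_table';
```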

[00:05:25] The next piece of content is "Speeding up count(*): Why not use max(id) - min(id)?" This is from Cybertec, and they had done a previous post, "Speeding up count(*)", where someone in the comments asked, why don't you just do max(id) minus min(id)? The problem is that it will return the wrong answer. Now, I mentioned this on last week's episode of Scaling Postgres when talking about sequences and how sequences can have gaps. If you insert a row and hit an error, or insert a row and do a rollback, the next sequence value you get will be higher than what you expect; it will leave a gap. Sequence values are only guaranteed to be higher than the previous one; they are not guaranteed to be gap-free, so you're going to have gaps, and you can't use them to get an exact count. They go through an example where they tried to insert what would have been the row with ID 4 but rolled it back, and you can see the gap is present because of the rollback. Now, depending on how well you know your table, this could be used as a means to get an estimate, but it would be a very rough one, and it depends on how many errors you tend to have on that table. So I wouldn't necessarily advocate it, but it could give you a rough ballpark sense of how many rows are in the table if you want an estimate as opposed to counting every row. Then again, it may be easier to just use the stats tables to get a rough estimate of how many rows there are. So if you want to learn more about this, definitely check out this blog post from Cybertec.
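The gap behavior described above is easy to reproduce; a minimal sketch using a throwaway table:

```sql
CREATE TABLE items (id serial PRIMARY KEY, val text);
INSERT INTO items (val) VALUES ('a'), ('b'), ('c');   -- ids 1..3

BEGIN;
INSERT INTO items (val) VALUES ('d');                 -- consumes id 4
ROLLBACK;                                             -- row gone, id 4 never reused

INSERT INTO items (val) VALUES ('e');                 -- gets id 5

SELECT max(id) - min(id) + 1 AS naive_count,          -- 5: wrong
       count(*)            AS exact_count             -- 4: right
FROM items;

-- A cheap statistics-based estimate instead:
SELECT reltuples::bigint AS estimated_rows
FROM pg_class WHERE relname = 'items';
```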

[00:07:04] The next piece of content is actually a YouTube video called "My Favorite PostgreSQL Books". I believe this was a webinar done on the EnterpriseDB YouTube channel, and it goes over a set of books, each for a different type of person. If you're more of a developer, it suggests one book; if you're more of a beginner, it suggests another; or there's one if you're looking for a cookbook of sorts. I think it's about four or five books that they mention, and they give you a summary and their opinions on each one. So if you're looking for more educational material about PostgreSQL, maybe check out this webinar.

[00:07:42] The next post is "How to migrate from inheritance-based partitioning to declarative partitioning in PostgreSQL". PostgreSQL 10 introduced native, or declarative, partitioning; versions prior to that used what is called inheritance-based partitioning. This post walks you through the process of migrating from one to the other, and I'll just go over the overview here. First, they create a new table with the same columns as the parent table from the existing partitioning scheme. Then, for each child table in the inheritance scheme, they detach the child from the old parent using ALTER TABLE ... NO INHERIT and attach it to the new parent using ATTACH PARTITION. Then they create any needed new indexes on the new parent table, and they mention in this post that indexes that already exist on the children won't be redone, which is definitely convenient, so hopefully that would happen pretty quickly. You may also need to re-add views, functions, or other objects. Then you drop the original table used with inheritance partitioning, and lastly rename the new parent table to replace the old one. They also mention that when doing this, you would definitely want to back up your tables and do it within a transaction. So first they start with BEGIN, then they do all the steps, and then they conclude with a COMMIT, of course. So if you need to do this, definitely check out this post for a technique to migrate from inheritance-based to declarative partitioning.
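The steps above can be sketched in SQL; the table and partition-bound names here are hypothetical, not taken from the post:

```sql
BEGIN;  -- do the whole migration in one transaction, as the post advises

-- 1. New declaratively partitioned parent with the same columns
CREATE TABLE events_new (LIKE events) PARTITION BY RANGE (created_at);

-- 2. For each child of the old inheritance scheme: detach, then attach
ALTER TABLE events_2023 NO INHERIT events;
ALTER TABLE events_new
    ATTACH PARTITION events_2023
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');

-- 3. Recreate any needed indexes on the new parent; matching indexes
--    already present on the children are attached, not rebuilt
CREATE INDEX ON events_new (created_at);

-- 4. Drop the old parent and rename the new one into place
DROP TABLE events;
ALTER TABLE events_new RENAME TO events;

COMMIT;
```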

[00:09:18] The next post is "Using PostgreSQL for JSON storage". Now, this is a relatively short post, but it talks about using JSON in PostgreSQL. It gives an example of some JSON, say, that you have and are storing in a column that they happen to call json_content, ideally with a GIN index to make queries faster. They show how you can extract certain data from it: this is the syntax to get the user's last name, and then there's using the WHERE clause to pull out a particular piece of content or a particular record. So again, it's very brief, but if you want a quick intro to using JSON, check out this blog post.
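A minimal sketch of the pattern described, assuming a jsonb column named json_content and made-up sample data:

```sql
CREATE TABLE docs (id bigserial PRIMARY KEY, json_content jsonb);

-- GIN index to speed up containment (@>) queries
CREATE INDEX docs_json_idx ON docs USING gin (json_content);

INSERT INTO docs (json_content)
VALUES ('{"user": {"first_name": "Ada", "last_name": "Lovelace"}}');

-- Extract the user's last name as text
SELECT json_content -> 'user' ->> 'last_name' AS last_name FROM docs;

-- Use the WHERE clause to pull out a particular record
SELECT id FROM docs
WHERE json_content @> '{"user": {"last_name": "Lovelace"}}';
```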

[00:10:05] The next post, again very brief, is "How to set up application_name for the PostgreSQL command line utility", and this would actually apply to other applications as well. In this example, if you try to set application_name using a variable with psql, it doesn't work; it actually needs to be part of the connection string. So here you see application_name equals a specified name, and then when you SHOW application_name, it appears. They also say you can use the environment variable PGAPPNAME before making the connection. So if you have a particular application that you want identified by its application_name within PostgreSQL, which is convenient for logging, you could use one of these two methods. And this is from LFAsIO.

The next piece of content is "A tale of password authentication methods in PostgreSQL", and it walks through the evolution of passwords in Postgres. At first it started off with just a password in clear text. They said this isn't great, let's at least encrypt a little, so they used the crypt utility within Unix or Linux, but again that had disadvantages because of some clear-text issues. So they moved to MD5, which brought some benefits, but there were also negatives. The one they now suggest people use is SCRAM, and they talk about all its advantages and how it is superior to the previous methods. So if you want to learn more about the evolution of passwords in Postgres and the benefits of using SCRAM, definitely check out this blog post.
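A sketch of both settings discussed above; the application, role, and password names are made up for illustration:

```sql
-- Once connected, application_name can also be set and checked from SQL:
SET application_name = 'nightly_report';
SHOW application_name;

-- From the shell, the two approaches from the post are the connection
-- string and the PGAPPNAME environment variable:
--   psql "host=localhost dbname=mydb application_name=nightly_report"
--   PGAPPNAME=nightly_report psql mydb
-- With log_line_prefix containing %a, the name shows up in the server log.

-- For SCRAM, make sure newly set passwords are hashed with it:
SET password_encryption = 'scram-sha-256';
ALTER USER app_user PASSWORD 'change-me';
```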

[00:11:42] The next piece of content is "Oracle to PostgreSQL: ANSI outer join syntax in PostgreSQL". There is some specific syntax that Oracle sometimes uses for doing joins, and this post explains how you would do the same thing in PostgreSQL. For example, Oracle's version uses the (+) sign in parentheses, whereas in PostgreSQL you would just use the ANSI-standard RIGHT JOIN or RIGHT OUTER JOIN syntax. For a full join, you would use the FULL JOIN syntax in PostgreSQL, and it even shows how to do a cross join. So if you need to migrate from Oracle to PostgreSQL and want some migration techniques, check out this blog post.
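A side-by-side sketch of the two syntaxes with hypothetical tables; here the (+) sits on table b, which makes it an outer join that keeps all of a:

```sql
-- Oracle's proprietary notation:
--   SELECT * FROM a, b WHERE a.id = b.a_id (+);

-- ANSI equivalents, which PostgreSQL understands:
SELECT * FROM a LEFT  JOIN b ON a.id = b.a_id;  -- all of a, matching b
SELECT * FROM a FULL  JOIN b ON a.id = b.a_id;  -- all rows from both sides
SELECT * FROM a CROSS JOIN b;                   -- every combination
```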

[00:12:28] The next posts are "My favorite PostgreSQL extensions, Part one" and "My favorite PostgreSQL extensions, Part two". The first one they list is the Postgres foreign data wrapper, or postgres_fdw. This allows you to connect to external Postgres servers and query and get data from them. The second one they mention is pg_partman: if you want to partition tables, pg_partman makes the management of those partitions much easier. In the next post, the first one mentioned is pgAudit. If you want to set up auditing to track inserts, updates, deletes, etc. in your PostgreSQL instance, you could check out this extension. The next one they mention is pg_repack, which helps you reclaim space from tables while the database is online. Normally, the only way to reclaim space is with a VACUUM FULL, but that locks the whole table; pg_repack uses a technique that lets you resize tables while they are online and reclaim the bloat in them. The last one they recommend is HypoPG. This lets you set up hypothetical indexes to see how advantageous they could be if you were to add them, by doing an EXPLAIN with them in place; so it helps you get a sense of whether particular indexes would help with particular queries. If you want to learn more details about these extensions and how to use them, you can check out either of these two blog posts from Severalnines.
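As a quick sketch of trying one of these, HypoPG's hypothetical-index workflow looks roughly like this; the table and column names are made up:

```sql
CREATE EXTENSION IF NOT EXISTS hypopg;

-- Register a hypothetical index; nothing is actually built on disk
SELECT * FROM hypopg_create_index(
    'CREATE INDEX ON orders (customer_id)');

-- If the hypothetical index would help, it shows up in the plan
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- Clean up this session's hypothetical indexes
SELECT hypopg_reset();
```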

[00:14:05] The next piece of content is "How the Citus distributed query executor adapts to your Postgres workload". They're talking about Citus, which is an extension for Postgres that helps you shard across multiple Postgres servers, and specifically about the open source version, so this executor is included in it. Now, I'm not that familiar with Citus; I've never used it. But I like this blog post because it gives some insight into where Postgres may potentially be going with sharding, talking about some of the issues they were dealing with, some of the use cases, and how they adapted their solution to handle those use cases better. So if you want more insight into sharding and where the future may lead, this is a very interesting blog post to check out.

The next piece of content is that the PostgreSQL Person of the Week is Melanie Plageman. So if you're interested in learning more about Melanie and her contributions to Postgres, definitely check out this blog post.

[00:15:07] The next piece of content is "A deep dive into PostGIS nearest neighbor search", and it is exactly that: a deep dive into nearest neighbor search using PostGIS. So if you're interested in learning more about that, check out this blog post.

The next post is "Intersecting tracks of individuals with MobilityDB". This is from, and it follows up on a post about contact tracing, which has been discussed recently. This post talks about setting up data structures within PostgreSQL so that they are MobilityDB-enabled; MobilityDB is an extension for PostgreSQL, built on top of PostGIS, that makes some of this easier. They mention setting up trips based on initial mobile points and then discovering the intersections with an infected individual to retrieve possible contacts. So if you want to go more in depth on this, definitely check out this blog post.
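The PostGIS nearest-neighbor pattern referred to above can be sketched with the KNN distance operator and a GiST index; the table, column, and coordinates here are hypothetical:

```sql
CREATE INDEX places_geom_idx ON places USING gist (geom);

-- ORDER BY geom <-> <point> LIMIT k performs an index-assisted
-- k-nearest-neighbor search instead of sorting the whole table
SELECT name
FROM places
ORDER BY geom <-> ST_SetSRID(ST_MakePoint(-71.06, 42.35), 4326)
LIMIT 5;
```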

[00:16:07] The last piece of content is "How to use tree machine learning model with 2UDA: PostgreSQL and Orange, Part 4". So this is part four of the machine learning posts from 2ndQuadrant.

[00:16:21] That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to where you can sign up to receive weekly notifications of each episode, or you could subscribe via YouTube or iTunes. Thanks.