Subquery Performance, Avoid Update Locking, Column Changes, Outage Workshop | Scaling Postgres 274

Join Over 1,000 Engineers & Get New Episodes Weekly!

In this episode of Scaling Postgres, we discuss subquery performance, how to avoid excessive locking when doing updates, how to change a column's datatype without excessive locking, and lessons from an upgrade outage.

Content Discussed

YouTube Video

Podcast Audio


All right. I hope you, your friends, family, and coworkers continue to do well. Our first piece of content is "SUBQUERIES AND PERFORMANCE IN POSTGRESQL". This is from So the first thing they do is explain what a subquery is. Basically, it's any time you have a statement within parentheses inside another statement. And there are both correlated and uncorrelated subqueries. A correlated subquery references something from outside of it. So in their example, the subquery references the "a" table, which exists outside of the subquery. An uncorrelated subquery has no connection to the outside; it's basically a self-contained query.
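As a quick sketch of the difference (the a and b table names follow the post's examples, but the columns here are hypothetical):

```sql
-- Hypothetical tables a(id, val) and b(id, a_id).
-- Uncorrelated: the inner query runs once, independent of the outer row.
SELECT * FROM a
WHERE a.id IN (SELECT b.a_id FROM b WHERE b.id < 100);

-- Correlated: the inner query references a.id, so it may be
-- re-evaluated for every outer row (often as a nested loop).
SELECT a.id,
       (SELECT count(*) FROM b WHERE b.a_id = a.id) AS b_count
FROM a;
```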

And generally, those are best, and they say here quote "Uncorrelated subqueries are almost never a performance problem". Actually, other than IN subqueries, which I'll get to in a second, these are the only subqueries I ever write, the uncorrelated ones, unless I really need to do otherwise for some reason. And generally, that's because a correlated subquery can easily become a nested loop, or is sometimes forced to if you're doing what they call a scalar subquery here. Looking at a couple of these, the queries just rubbed me the wrong way. I usually never write things like this; I usually write them using joins. So I would join the a and the b tables where those values exist and bring those two columns up. So this was quite an odd way for me to see it.

Actually, a way to avoid correlated subqueries, which can be a problem, is to convert them into joins. So they show that in the next section on how to resolve it, basically doing that join exactly as I just described. You can even replicate the scalar subquery, and it's scalar because it just returns one value, in this case a count, by using a GROUP BY and HAVING clause to get the same answer. Now in addition to scalar subqueries, there are also tabular subqueries that return a whole block of data. And whereas scalars typically cause more problems, with tabular ones it depends. So there are different ways to write things, but they say CTEs are never correlated, so they are never problematic.
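A sketch of that rewrite, assuming hypothetical a and b tables where b.a_id references a.id:

```sql
-- The correlated scalar subquery from above, rewritten as a join
-- with GROUP BY; this returns each a.id with its count of b rows.
SELECT a.id, count(b.a_id) AS b_count
FROM a
LEFT JOIN b ON b.a_id = a.id
GROUP BY a.id
HAVING count(b.a_id) > 0;  -- filter on the aggregate if needed
```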

However, things like a lateral join can be correlated, so a way to rewrite that is using this join technique here and using DISTINCT ON. Then they talk about EXISTS, NOT EXISTS, IN, and NOT IN. They say Postgres can process EXISTS and NOT EXISTS relatively efficiently. It can process IN relatively efficiently, but not NOT IN. And generally, when I see a NOT IN, that's usually a performance problem. How I tend to deal with that is to do a left outer join to the table, looking for where that value is null. That's usually much more efficient. I haven't tried NOT EXISTS in that use case yet, though. And then they follow up with cases where maybe you actually do want to force a nested loop join, and for that, you can use correlated subqueries.
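A sketch of that anti-join rewrite, again with hypothetical tables:

```sql
-- Often slow: NOT IN with a subquery. (It also behaves surprisingly
-- if b.a_id can be NULL: no rows are returned at all in that case.)
SELECT * FROM a WHERE a.id NOT IN (SELECT b.a_id FROM b);

-- Usually faster anti-join: left outer join and keep unmatched rows.
SELECT a.*
FROM a
LEFT JOIN b ON b.a_id = a.id
WHERE b.a_id IS NULL;

-- NOT EXISTS is also processed efficiently and is NULL-safe.
SELECT * FROM a
WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.a_id = a.id);
```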

But I highly suggest checking out this blog post as well as this week's "Five minutes of Postgres" on the post "How to optimize correlated subqueries in Postgres" by So they cover this particular post as well.

Next piece of content- "Prevent Locking Issues For Updates On Counter". This is from And this was an interesting technique I hadn't seen before. I don't know how performant it would be. It feels like it gives you a little bit more headroom. But what it's doing is, say you have a scenario where you are counting page views to a website and you have the page views table. You update it by one every time there's a page view. Well, locks are going to really hinder your ability to keep this up to date. So you're going to have a bunch of lock queues if you have tons of people visiting this particular page. You're also probably going to run into vacuum issues in terms of Postgres. So this is no bueno, this is no good.

But their solution for it is to create a page view counts table with the URL whose views you're counting, but then add a fan-out column. And of course you add an index. Whenever a new view comes in, you INSERT or, ON CONFLICT, UPDATE. They show the MySQL version as well, but they have the ON CONFLICT update up here. What you put in the fan-out column is a random number from one to 100. So every count coming in gets distributed across 100 rows, as opposed to everyone updating the same row. So that's why I'm saying I haven't seen this technique before. It's very interesting and it would give you a little bit more headroom. But I don't know how far you could push it.
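A minimal sketch of the fan-out idea, with a hypothetical table and URL rather than the post's exact schema:

```sql
-- One row per (url, fanout) pair; fanout spreads contention.
CREATE TABLE page_view_counts (
    url    text   NOT NULL,
    fanout int    NOT NULL,
    count  bigint NOT NULL DEFAULT 0,
    PRIMARY KEY (url, fanout)
);

-- Each page view updates one of 100 rows chosen at random,
-- so concurrent updates rarely contend on the same row.
INSERT INTO page_view_counts (url, fanout, count)
VALUES ('/pricing', floor(random() * 100)::int, 1)
ON CONFLICT (url, fanout)
DO UPDATE SET count = page_view_counts.count + 1;

-- Reading the total means summing across the fan-out rows.
SELECT url, sum(count) AS views
FROM page_view_counts
WHERE url = '/pricing'
GROUP BY url;
```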

I mean, I'm sure you could try 1,000 rows, maybe even 10,000 rows. But how much better would that make it? Now he also talks about potentially using additional techniques like FOR UPDATE SKIP LOCKED when doing the update. But at the end, he has this disclaimer, quote "You don't have to use your main database for everything". This particular use case is tailor-made for something like Redis, which has an increment counter. So if you're really pushing things far, maybe Postgres isn't the best place to store this frequently updated data. But I found this technique pretty interesting, so feel free to check it out.

Next piece of content- "Online Data Type Change in PostgreSQL". This is from And they're talking about a situation where you need to change a column's type, in this case an INT to a BIGINT, but you don't want to incur downtime. Because if you try to change that data type directly, it's basically going to lock the table and rewrite every value. But there's a process you can go through to avoid that downtime: creating a new column, writing data values to both columns, backfilling the necessary data in the new column so it matches the old column, and then, in one transaction, switching the names so it's now correct. So the first step is to add the new column, which they're calling order_id_temp.

Now what's interesting is they use this operating system code here. Frankly, I would probably just set lock_timeout for the database session. We usually set a lock timeout of one second or, as they say, a hundred milliseconds. Then run the command, and if it runs into too many locks or starts to block you, it stops for you. So I'm not quite sure why they added this here. Next, create a function that basically writes the same thing you're writing from order_id to order_id_temp. You put that into a TRIGGER that gets fired on INSERT or UPDATE of the table. Then they created a separate table for all of the IDs needed in the backfill.
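A rough sketch of those first steps, assuming a hypothetical orders table (the post's exact code may differ; order_id_temp follows their naming):

```sql
-- Abort quickly instead of queueing behind long-lived locks.
SET lock_timeout = '1s';

ALTER TABLE orders ADD COLUMN order_id_temp bigint;

-- Keep the new column in sync on every write.
CREATE FUNCTION sync_order_id() RETURNS trigger AS $$
BEGIN
    NEW.order_id_temp := NEW.order_id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER sync_order_id_trigger
    BEFORE INSERT OR UPDATE ON orders
    FOR EACH ROW EXECUTE FUNCTION sync_order_id();
```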

And they wrote all the IDs to that temporary table, put an index on it, and then they have this update process here. For 5,000 rows at a time, it selects 5,000 rows from that temporary table, updates the new data column, and then deletes those rows from the temporary ID table. Then there's the transaction that does the switch. They locked the table in SHARE ROW EXCLUSIVE mode, dropped the TRIGGER, and then changed the column names. And that should give you a new BIGINT column without any downtime. Or, as they say here, with less than one second of locking. But check this blog post out if you want to learn more.
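The backfill batches and the final switch might look roughly like this, again with hypothetical table names rather than the post's exact code:

```sql
-- Backfill 5,000 rows at a time: consume a batch of ids from the
-- temporary table and copy the old value into the new column.
WITH batch AS (
    DELETE FROM backfill_ids
    WHERE id IN (SELECT id FROM backfill_ids LIMIT 5000)
    RETURNING id
)
UPDATE orders o
SET order_id_temp = o.order_id
FROM batch
WHERE o.id = batch.id;
-- (Repeat until backfill_ids is empty.)

-- The switch: one short transaction that renames the columns.
BEGIN;
LOCK TABLE orders IN SHARE ROW EXCLUSIVE MODE;
DROP TRIGGER sync_order_id_trigger ON orders;
ALTER TABLE orders RENAME COLUMN order_id TO order_id_old;
ALTER TABLE orders RENAME COLUMN order_id_temp TO order_id;
COMMIT;
```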

Next piece of content- "Let's Workshop an Unplanned Postgres Outage". This is from And this was an upgrade issue that happened at RevenueCat back in November of 2022. Actually, I don't think I covered this in Scaling Postgres, so I'm going to have to take some time to take a look at it. But they discussed what went right. So they appreciated all the planning they did for the transition. They were going from Aurora version 10 to Aurora version 14. They also appreciated the fact that they brought up a parallel cluster. So they had their existing cluster here on PG 10, or Aurora 10, and they created a logical replica of it on PG 14. So it was an entirely separate cluster, which is ideally how you would want to do this to avoid downtime, because you can do all sorts of tests and make sure that things are correct.

The other thing they appreciated is that they did some cache warming. They tried to warm up the cache in the new cluster before they switched over to it. But of course, they had some issues. So what went wrong? The first issue was that they missed analyzing the new cluster. So as soon as they started putting queries against it, the statistics weren't up to date and queries were taking forever. So that was not good. And next, they didn't copy the sequences over from version 10 to version 14. So their sequences, I guess, were stuck at one. So if you have a billion rows in the table, you would have to churn through all of those values to get to one billion and one before inserts are successful again. So that's definitely not good either.

And they talked about, in the case of ANALYZE, you would basically want to analyze ahead of time, or set up your autovacuum workers with a sufficient cost limit so everything in the new cluster is vacuumed and the statistics are up to date. In terms of copying the sequences, there's no really great way to do it, but they have a copy command here where they advise shutting down writes and then copying all the sequences over at that transition point, rather than trying to do that while writes are still happening in the current cluster. And the other suggestion they made is to avoid waiting multiple versions to do the upgrade. Maybe try to keep more up to date with each version, or maybe even every second version, but four versions is getting a little long. But if you want to learn more about this, check out this blog post.
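One hedged way to handle the sequence copy (not necessarily the post's exact command) is to generate setval() calls from the pg_sequences view on the old cluster, after writes have stopped:

```sql
-- On the old cluster: emit one setval() statement per sequence.
SELECT 'SELECT setval('
       || quote_literal(schemaname || '.' || sequencename)
       || ', ' || last_value || ');'
FROM pg_sequences
WHERE last_value IS NOT NULL;

-- Then run the generated statements on the new cluster so every
-- sequence resumes from its old position instead of from one.
```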

Next piece of content- "PGSQL Phriday #010: Log analysis". This is from And this is the PGSQL Phriday for this month and it's on pgBadger. But this person actually doesn't use pgBadger, but she took it as an opportunity to talk all about the log settings that she likes to set in her Postgres instances. And then she talked about all the log analysis that she actually does.

The next post, also related to PGSQL Phriday, is on pgBadger. This is from And he talks about some of the good aspects of pgBadger and some improvements that could be made to it. So if you want to learn more about that, you can check out this blog post.

Next piece of content- "pgvector 0.4.0 performance". This is from This is one of those performance comparisons you have to take with a grain of salt, because one organization has one perspective and another organization has another. So there was a comparison done by NirantK, in the post "Qdrant vs. pgvector - Results from the 1M AI Benchmark". And they say of pgvector in PostgreSQL, quote "There are good reasons why this option is strictly inferior to dedicated vector search engines, such as Qdrant". And they have a query speed demonstration where pgvector is much slower. Whereas Supabase went through the process and tried to replicate the benchmarks, and once they made some adjustments, their results showed pgvector being faster. So I will leave it up to you to check out these blog posts and do your own analysis, if you're working with AI, to determine what you would choose.
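For context, a basic pgvector setup and query look like this (hypothetical table and settings; 3 dimensions for brevity). Tuning parameters like these are exactly the kind of adjustments that can swing benchmark results:

```sql
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));

-- An ivfflat index trades recall for speed; results depend heavily
-- on the lists and probes settings chosen for the benchmark.
CREATE INDEX ON items
    USING ivfflat (embedding vector_l2_ops) WITH (lists = 100);

-- Nearest neighbors by Euclidean distance (the <-> operator).
SET ivfflat.probes = 10;
SELECT id FROM items ORDER BY embedding <-> '[1,2,3]' LIMIT 5;
```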

Next piece of content- "Compressing (Not So) Immutable Data: How We're Changing Time-Series Data Management". This is from This blog post talks about how traditionally, you would be ingesting time series data and keep recent data uncompressed for doing analysis. But then at a certain point, you would generally compress it and put it away for archival storage. And you could still query it, but generally, you would not change it, because of the expense of uncompressing it, changing it, and then re-compressing it.

But there's apparently a new capability where they have an innovation that is discussed here, quote "The compression is achieved in a buffer of uncompressed data and written latently to disk in a compressed block". So apparently, when an update needs to happen, they're taking the compressed data, uncompressing it, modifying it, and then writing it latently in a compressed format. So the write to the compressed block is not in real time. It's basically a background process and, they say, it's virtually invisible to the requester. So this is pretty interesting if you work with time series data, and if you want to learn more, check out this blog post.

Next piece of content- "Understanding RDS Cost". This is from And if you use RDS, this is a good blog post talking about all the different RDS costs and how you could potentially better manage them to reduce your spend.

Next piece of content- "PostgreSQL 14 Internals for print on demand". This is from And this is the PostgreSQL 14 Internals book. It's freely available as a PDF, but now you can order a hardcover or paperback version. So check that out if you want.

Next piece of content- "Ccache and PostgreSQL build directories". This is from If you want to learn more about using ccache when building Postgres with the new Meson build system, you can check out this blog post.

Next piece of content- "Announcing Crunchy Postgres for Kubernetes 5.4". This is from So if you want to check out their new operator for Kubernetes that covers ARM support, vector search via pgvector, support for huge pages, native support for Postgres table spaces, and documentation enhancements, you can check out this new version.

Next piece of content- "POSTGIS, ARCGIS ENTERPRISE AND THE TOUR DE FRANCE ROUTE". This is from And if you are interested in GIS content, you can definitely check out this blog post.

Next piece of content- "MapScaping Podcast: Pg_EventServ". This is from And he's talking about two podcasts he was on, covering whether or not to put rasters in your relational database, and also notifying clients in real time via LISTEN and NOTIFY. So if you're interested in that, you can check out that blog post.

And the last piece of content: there was another episode of Postgres FM last week. This one is on "Anniversary mailbag". So apparently the podcast got started a year ago, so congratulations. For their one-year episode, they covered a fair number of questions they received. So if you want to listen to the podcast, you can click the button here or check out their YouTube channel.