Indexing Deep Dive | Scaling Postgres 285
Join Over 1,000 Engineers & Get New Episodes Weekly!
In this episode of Scaling Postgres, we cover a deep dive into indexes from a presentation that includes a decision tree of sorts, how to convert to partitioned tables once you have hundreds of millions of rows and detail about the new pg_stat_io view.
- Real-Life Database Indexing
- Dynamic Table Partitioning in Postgres
- pg_stat_io and PostgreSQL 16 performance
- PGSQL Phriday #013
- Oracle Supports Postgres
- Five Great Features of the PostgreSQL Partition Manager
- Postgres schema changes are still a PITA
- Introducing pgroll: zero-downtime, reversible, schema migrations for Postgres
- Migrate from Sybase to PostgreSQL
- Huge Pages and Postgres in Containers
- Version History and Lifecycle Policies for Postgres Tables
- Scaling GraphQL with Postgres - Lessons learned from our database timeout issues
- How to write high-performance SQL for your Postgres database
- Vacuum in PostgreSQL
- Allowing DML Operations in Highly Compressed Time-Series Data in PostgreSQL
- Anatomy of a Postgres extension written in Rust: pgmq
- TIL - When was my Postgres cluster initialized ?
- Postgres I/O Basics, and how to efficiently pack table pages
- pgenv version 1.3.3 released
- CipherDoc: A Searchable, Encrypted JSON Document Service on Postgres
- PostgreSQL 16 Feature : Sub-Query Aliases and its usage in Code Conversion.
- Railroad Diagrams
- How to set up a PostgreSQL cluster using pg_cirrus on AWS
- Vector Databases Explained
- The Story Behind the Activity Book for Postgres
Do you have an index decision tree? Well, we check one out in this episode of Scaling Postgres. But before we get into that, I hope you, your friends, family and coworkers continue to do well. The first piece of content we're going to cover is "Look It Up: Real-Life Database Indexing". This is a presentation that was given by Christophe Pettus and this presentation is on thebuild.com. This is a great presentation, talking all about indexes. So going over all the different index types again, the B-Tree, Hash, GiST, GIN, SP-GiST, BRIN, Bloom, et cetera. He describes each of them and how they basically work. Then as well as covering things like, you know, you need to have good statistics for them to be accurate.
A little bit about index-only scans, velocity scans, covering indexes, and things of that nature. But what I thought was pretty cool here is the part where he's kind of doing this decision tree. So is the column a small scalar? He asks a series of questions and based upon that he gives an index suggestion. So for this case, it's pretty much B-Tree. But there are certain cases where a BRIN could work well in terms of text fields looking at B-Tree. But there are also certain conditions that might warrant an SP-GiST, or maybe even a GIN which he's talking about down here. Is it a range or a geometric type? Then it's GiST. If it's a JSONB column and only equality, use a Hash. Otherwise, use GIN if you're interested in the key values. So, a lot of great information in this presentation. So definitely a piece of content I suggest you check out
Next piece of content- "Dynamic Table Partitioning in Postgres". This is from supabase.com and as I was reading over this post, I got a little bit frustrated by it. I mean, it is my number two post, which means it has good content. The frustration was I would have written it basically upside down. The things that were at the top, I'd put it at the bottom and the things that are at the bottom, I'd put it at the top. So, looking at the example, here is Why Partition Data. They first started off with basically performance reasons. SELECTs get slower as the index grows, INSERTs get slower as an index grows, whereas I think the number one reason to do it is the bottom reason. So "Archiving older data is easily done in well defined partitions". So if you want to throw away data, partitions are a great use case for that.
"Large tables mean more data to move around", particularly with maintenance operations, so that's definitely a reason to do it. "Large tables prevent vacuum operations from completing". That can be the case, I've seen tables with trillions of rows that complete auto vacuum. It just depends on how frequently that data is changing and how that table is being used. "Large tables mix unrelated data". So partitioning could put them together. For example, maybe if we had all one account's data in a particular partition, that might help. But a lot of these other performance reasons, I mean, maybe you'll see a 10 or 20% performance increase. It kind of depends on the table and how it's structured in the indexes, but I've never seen it be a significant improvement. Therefore, that's why I frequently say performance isn't necessarily the reason to do partitioning because I've never seen it make a big huge difference, or at least I haven't yet. But in terms of partitioning, what this is describing is a way of moving two partitions once you have all of this data in a table or two.
This is a use case that they apparently helped someone address where they had a messaging app with hundreds of millions of rows and they wanted to convert to a partitioning scheme. Now again, they start off and this is an example of it, they have a chats table and then they have a chat messages table. This is pretty much the content that's in it. They inserted a bunch of data to simulate what it would look like. And then next they create the parent tables. So these are the tables that will be the parents of the partitions. So they went ahead and created those and this is where it gets a little bit upside down for me. So they started talking about creating the dynamic child tables and it felt a little bit about them introducing a brick at a time without telling you kind of what they're building. So I actually found it easier to go to the bottom here where they say, okay, we want to create a procedure that loads all the partitions for the chat table and then loads all the partitions for the chat messages table.
Okay, so they didn't show the messages table in the rest of the blog. That's on the full code example that they included, but they focused on loading the chat's partition table. So what does that look like? Here's that function, load_chat_partitions, and it basically gets the minimum date from the original table, the max date from the original table, and then does a generating series to create each partition, a partition per day. Okay, so what does that function do? That's right here, they actually create the partition itself. They copy the data in, then they apply the indexes and attach that partition to the parent tables. So I found it much easier to read it through going from bottom up. And then if you want to know what each of these functions is, you can go and look up here. So this is where they create the chat partitions and they're basically creating a table like the parent table.
Then they copy the data using INSERT into the partition table from the parent table. Then this procedure adds CONSTRAINTS, INDEXES, and then attaches the partition as well as the index and then drops the CONSTRAINT that was added. So it doesn't do a CONSTRAINT check when you attach the partition. So I thought the content was good in this post, but I really had a hard time following what was going on until I looked at it in an upside-down fashion. But there is one thing that confused me when I was looking at this because they're partitioned by created_at yet in the parent table, and of course all the partition tables, they're indexing on ID first and then created_at. I didn't think you could get partition exclusion if the partition key wasn't the first column in the index or the primary key. So I normally always put the partition key as the very first column, but I haven't really tried this out yet. If you know in the comments that oh, yeah, this works without a problem, let me know. But check this post out if you want to learn more.
Next piece of content- "PG_STAT,_IO AND POSTGRESQL 16 PERFORMANCE". This is from cybertec-postgresql.com. This is a post all about the new pg_stat_io system view. It describes each of the columns, what kind of values you can typically see in there, and what it does. Then he gives a couple of quick examples where he's looking at a few different back-end types like auto vacuum workers or client back ends. So if you want to learn more about it, you can check this out.
Next piece of content- "PGSQL Phriday #013" is coming up, and by Friday, October 6, you're supposed to have published a blog post about different use cases for Postgres and why you're using it. Like they say, did you do things like use it for full-text search or JSONB PostGIS, et cetera? And does using it allow you to remove other dependencies? I know I'll say I pretty much use it for everything; I use it for my queue, I use it for full-text searching, pretty much everything. But feel free to participate in this if you're interested.
Next piece of content- "Oracle Support Postgres". This is from momjian.us, and he's talking about Oracle basically supporting Postgres and I guess having a Postgres service as well that they're providing. So that's pretty crazy how the tides have turned essentially since I started using Postgres and migrated from Oracle so many years ago. But check out this blog post if you're interested.
Next piece of content- "Five Great Features of the PostgreSQL Partition Manager". This is from crunchydata.com, and I'll have to admit that I've never used pg_partman. I've used partition tables for over ten years in different applications. And I've always just had scripts that create and remove partitions because I can flexibly create whatever type of scheme I want. At the time I would do it with, say, functions and then a bash script to kick off those functions run by Cron. More recently, I may actually write it using my application framework language again kicked off by Cron and it would just send the required data to create partition tables or remove partition tables. So I've never really had a need to use pg_partman. Now he says these are five reasons to use it. One, he says retention. I don't know if I would call that a reason to use this. I think that's a reason to use partition tables. It's basically if you want to remove data periodically, you don't want to send deletes to a really huge table. That's the advantage of partitions. I don't know if it's a reason to use pg_partman.
The next one, he says, is a background worker in that you can set up a background worker to run pg_partman and it can run on an interval basis. But he says it is possible to drift. So that's why I would personally like to stick with a Cron job. And he says, but there are other ways you can schedule the job, of course. Talks about additional constraint exclusion, so I actually haven't heard of this, but this seems an interesting use case so that you can do additional exclusions other than just what's in the shard key. I don't know how this would work with the planner. Again, I haven't really used this feature, but it sounds interesting. Although personally, I might want to make that a separate extension. Next is epoch partitioning, but I think you can do this on your own if you're scripting something. And then the last feature is a template table, which he says is actually a feature I hope will eventually disappear. So apparently that's not a reason to use pg_partman. But I don't want to dis pg_partman too much. I know a lot of people use it and they find it great. I just know I haven't had a compelling enough reason to even investigate it. But if you're interested in it, you can check out this blog post.
Next piece of content- "Postgres schema changes are still a PITA". This is from xata.io and he's talking about the cautiousness you must exercise when you're doing schema changes. And some of the gotchas that can get you are locking issues. But of course, the number one way to avoid this is to set a lock timeout for the session as you're doing the schema changes. That way if anything gets blocked by a lock for whatever duration you set, it'll go ahead and cancel that migration and you can try it again later or find out why it is locked, resolve that issue and then run your migration again. But in my experience, this solves 90% of the problems. The other thing they mentioned is being able to rename and they say it takes six stages to do that. To which I would say, then don't rename your column. Or if you want to rename it, wait for an upgrade.
Like if you're upgrading to a point release, go ahead and throw the rename in there. If you can afford even the smidgen of downtime because renaming shouldn't happen very fast. Because the alternative they describe here is pretty onerous and it's coming from the PlanetScale docs, but it says to create a new column with a new name. Update and deploy the application to write data to both columns. Backfill missing data from the old column to the new column, add constraints like NOT NULL, update the application to use the new column, remove any references to the old column name, then lastly drop the old column. So that's pretty onerous and I would frankly just like to rename the column during some sort of downtime or just leave the column name until a time comes up that you can take that downtime. And then they talk about needing to have rollbacks, but this actually leads to them creating an extension called pgroll.
So this blog post "Introducing pgroll: zero-downtime, reversible schema migrations for Postgres". They talk about the reasons why schema migrations are painful, a lot of what was discussed in the previous post and they describe how it works. Basically, it looks like they're putting everything through views. So the transition happens using views. As they're migrating the schema, the current viewpoints to the physical table, changes are made there, then a new view is created, the application starts using it, and then the physical schema is resolved such that it only supports the new view.
So that's definitely a way to do it, to get a seamless migration. But personally, I'm not really a fan of having these additional views in my schema because it brings questions to me like are they always there? Are they there temporarily? Do I never really talk to my table directly from the application or at the time of migration, does it create a view for this purpose and then change the name of the table and then swap the names with the physical table? It just has a lot of questions. And frankly, I'm happy with my current applications framework for doing migrations, so I don't necessarily see a need for this for what I'm using. But if these are pain points you're experiencing, then maybe you want to check out this extension.
Next piece of content- "MIGRATE FROM SYBASE TO POSTGRESQL". This is from cybertech-postgresql.com and apparently, finally, as they say here, quote "Sybase ASE is dead". Apparently by 2025. So basically if people are still using Sybase, it's time to migrate off. This is crazy because I actually got my start doing database administration at an enterprise level with Microsoft SQL Server, which of course Sybase is the granddaddy of that. But this post talks about how you could migrate from Sybase to PostgreSQL. The first way they talk about doing it is by using the TDs Foreign Data Wrapper and TDs stands for, as they say here, Tabular Data Stream, which is basically the protocol between Microsoft SQL Servers and I guess Sybase as well. The second option is moving to Babelfish, which makes Postgres essentially wire-compatible with a Microsoft SQL Server, where apparently it can be compatible with Sybase as well. Third option they're saying hey, we're also doing a CYBERTEC Migrator, so we'll have to see what that looks like. But check out this blog post if you're interested.
Next piece of content- "Huge Pages and Postgres in Containers". This is from crunchydata.com. Apparently, they've made some changes, so now you can properly configure Huge Pages in Postgres containers or basically running Postgres and Kubernetes. And they had an issue that was recently resolved and they even had to go upstream to resolve the issue, actually resolving it in an OCI Runtime Specification. So with this recently released specification, you can now properly use Huge Pages in containerized Postgres. So if you're interested in that, you can check out this blog post.
Next piece of content- "Version history and Lifecycle Policies for Postgres Tables". This is from tembo.io. So in this example, they're talking about having an employee table where they're using an extension called temporary_tables that enables you to define what is current and what is historical. So they have an employees table that has the current data using a sys_period as a range type to define when this value was started and when it ends, essentially, in the future; infinity. And then an employee history table that shows the same range as when a particular salary was active. So they show how to set that up and how it works. And then they merged it with partition tables so that you could easily drop old history tables as you need. But check out this blog post if you're interested in that.
Next piece of content- "Scaling GraphQL with Postgres- Lessons learned from our database timeout issues". This is from cycle.app and these are more on the basic side of issues they resolved. The first problem they had was that they had constant connections from their clients because they were using WebSockets, but whenever they deployed their application, all those connections got reset and it actually had to reestablish all of the database connections as well. So they basically changed their application so that was no longer necessary and the database wouldn't get pummeled in that case. The second one, they had a big query problem that was basically resolved by adding some indexes. Then the third problem was dealing with a lot of big updates happening simultaneously. So basically they used a queuing system to resolve that. So if you want to learn more details about it, check out this blog post.
Next piece of content- "How to write high-performance SQL for your Postgres database". This is from the stackoverflow.blog and this is actually from their Stack Overflow Podcast. And Lukas from PgAnalyze was actually interviewed. So you can check out this blog post if you're interested in that.
Next piece of content- "Vacuum in PostgreSQL". This is from stormatics.tech and this is another post about vacuum, so feel free to peruse it for that reason. What I found most beneficial was the last part where they had different questions like how do you make vacuum more aggressive? And they showed different parameters that you can change. I will note- be careful of thinking that increasing max_workers will actually make it vacuum faster. It won't. It might actually slow it down. What makes it move faster is being able to do more work in a given unit of time. And that's what this autovacuum_vacuum_cost_limit does. So the higher you increase that, the more work can be done. And just be aware, all that work that can be done gets divided amongst the workers. So the more workers you have, the less work that can get done per unit of time.
Whereas the fewer workers, the more work that could get done on the particular tables that are being vacuumed at that time. But they also mentioned other things like how to monitor for transaction ID wraparound in the database and tables, and what to do if vacuums are not cleaning up dead rows. I'm talking about long-running transactions or using hot_standby_feedback, Abandon Prepare transactions, and things of that nature, and following up with some best practices. So check this blog post if you're interested in that.
And the last piece of content is "Allowing DML Operations in Highly Compressed Time-Series Data in PostgreSQL". This is from timescale.com they've been able to do this. It was announced a number of months ago. I believe that with their column store capabilities, they get really high compression up to 95% compression. You can still do INSERTS, UPDATES, and DELETES, whereas I think that's not always possible with other column stores. But if you're interested in that, you can check out this blog post.