
Move Off Of RDS? | Scaling Postgres 369
In this episode of Scaling Postgres, we discuss why and how you could move off of AWS RDS, Snowflake acquiring Crunchy Data, reducing the number of SQL queries you run, and a novel way to migrate to partitioned tables.
Content Discussed
- How to move off RDS
- Crunchy Data Joins Snowflake
- Snowflake Acquires Crunchy Data to Bring Enterprise Ready Postgres Offering to the AI Data Cloud
- Tip: Put your Rails app on a SQL Query diet
- Bridged Indexes in OrioleDB: architecture, internals & everyday use?
- Building on open table formats
- Approximate the p99 of a query with pg_stat_statements
- Postgres Partitioning Best Practices
- Yes, Postgres can do session vars - but should you use them?
- Benchmarking is hard, sometimes ...
- Ultimate Guide to POSETTE: An Event for Postgres, 2025 edition
- The Fun of Open Source: Roman Numerals in PostgreSQL
- Postgres Extensions: Use PG_MODULE_MAGIC_EXT
- Understanding Split-Brain Scenarios in Highly Available PostgreSQL Clusters
- Don't mock the database: Data fixtures are parallel safe, and plenty fast
- Dilan Tek
- Contributions for the week of 2025-05-19 (Week 21)
- PgPedia Week, 2025-06-01
- SCaLE 22x: Bringing the Open Source Community to Pasadena
- Postgres Extensions Day Montréal 2025
- Speed Without Sacrifice: 2500x Faster Distinct Queries, 10x Faster Upserts, Bloom Filters and More in TimescaleDB 2.20
- Postgres That Scales With You: Read Replica Sets and Enhanced Storage
- Identify Database Performance Issues with Prometheus Exporter and Timescale Cloud
- Boldly Migrate to PostgreSQL - Introducing credativ-pg-migrator
- pgstream v0.6.0: Template transformers, observability, and performance improvements
- How SMASHSEND Cut Costs by 95% by Moving from RDS to Neon
Transcript
A couple of years ago, David Heinemeier Hansson, who created Ruby on Rails, announced that his company Basecamp was going to leave the cloud. Basically, what he meant was that he was no longer going to pay for a lot of services on AWS; they were going to run their servers on bare metal as a way to reduce costs. Now, they didn't really leave the cloud, they were still on the Internet. They just ran their own servers at a different hosting provider. What I've done to try to minimize cloud costs is mostly just use the commodity services of AWS like EC2, EBS and S3, while avoiding a lot of the more specialized services, because I know they tend to carry a pretty high markup. And one of those services I don't use is RDS. Why? Well, I already have sufficient Postgres experience; I don't need my database managed for me. And what I hear more than anything else from the different clients I talk to is that if they aren't already using a hosted service, moving to one is their plan. Basically, they want to move onto a cloud managed database service. But we're going to be talking about a recent episode of Postgres FM where at least Nick thinks we may be approaching a tipping point where people start to migrate off of these managed services. Also, be sure to stay till the end of the show for my Consulting Corner segment, where I talk about what I think is a novel way to migrate to partitioned tables with minimal downtime. But I hope you, your friends, family and co-workers continue to do well.
And the first piece of content: there was another episode of Postgres FM last week, and it was “How to move off RDS”. That's definitely a loaded title for me because, like I mentioned, all of the clients I talk to are actually trying to get onto RDS or some other hosting platform, and I haven't heard of anyone looking to get off of it yet. And if there's one trend I've noticed with developers, it's that they don't want to manage their own servers, they don't want to manage their database, they just prefer to code new features. But in this episode they talked about why you would want to do this. The number one reason they mentioned is cost, because you can pay a substantial premium to have your database managed for you. The next is control. You have less control over things; you're essentially locked into a vendor. Now, there are ways to move off of it that they of course discuss in this episode. Another sign of less control is that you can't install any extension you want, only the approved ones. Now, as I was listening to this show, I was thinking, once the Kubernetes operators become really mature, I could see this happening more often, because once you can trust an operator enough to manage the whole thing for you with very minimal issues, that may entice someone to move off of their hosted service provider. Now, in terms of how to move off, basically you use logical replication. I actually used the exact same process they mention to move into RDS, but you can use the same process to move out of RDS or any other managed database provider. You set up a publisher and a subscriber to transfer the data, and then pick a cutover point to start using the new database. You do have to handle some other things, like analyzing the destination, transferring the sequences, and making sure DDL is either kept up to date or halted during the migration process, but it is doable. Now, the problem comes, of course, when you have a multi-terabyte database, but they give recommendations on how to handle that, and I do the exact same thing they mentioned. Basically, you can set up multiple publications and subscriptions to make sure it can handle the load, because sometimes that one apply worker on the destination server is insufficient for the traffic coming in, especially when trying to catch up from the point replication was set up. And Nick mentioned that the biggest problem is the initial data copy, because again, you may have, say, 10 terabytes to transfer over. There you can use the solution of creating a slot, opening a transaction and then exporting a snapshot. From that snapshot ID you can use, say, pg_dump and dump things in parallel using that snapshot, either using the --jobs option or using separate pg_dump jobs. Or you can even use COPY; I've actually used this process with COPY to copy a very large table in parallel, and they mentioned exactly that in the show as well. Now, this does get complicated. You don't want a foreign key constraint to cross over the different replication slots, because you may have things being inserted in a different order and get replication conflicts. And Michael actually recommended: gee, wouldn't it be better if you could just plan some downtime so you don't have to incur the complexity of trying to transfer this data?
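To make that snapshot technique concrete, here's a minimal sketch. The publication, slot and connection names are hypothetical, and note that CREATE_REPLICATION_SLOT ... EXPORT_SNAPSHOT has to be issued over a replication-protocol connection (for example, psql opened with replication=database in the connection string) that stays open while the dump runs:

```sql
-- On the source, in a normal session: publish all tables
CREATE PUBLICATION migration_pub FOR ALL TABLES;

-- In a replication-protocol session (keep it open until the dump finishes):
-- returns the slot's consistent point and an exported snapshot name
CREATE_REPLICATION_SLOT migration_slot LOGICAL pgoutput EXPORT_SNAPSHOT;

-- In a shell, dump from that exact snapshot in parallel
-- (directory format is required for --jobs):
--   pg_dump -Fd --jobs=8 --snapshot=<snapshot name from above> -f /dump mydb

-- On the destination, after restoring the dump: reuse the existing slot
-- and skip the initial copy, since the dump already contains that data
CREATE SUBSCRIPTION migration_sub
    CONNECTION 'host=source-host dbname=mydb user=replication_user'
    PUBLICATION migration_pub
    WITH (create_slot = false, slot_name = 'migration_slot', copy_data = false);
```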
They also mentioned there are some commercial tools that might help, like Fivetran, or even an open source tool like pgcopydb. And a final recommendation they mention is to have PgBouncer or some other pooler running for your application, because then you can pause traffic, redirect it to the new database instance, and then resume traffic, minimizing downtime as much as possible at the point you cut over to the new solution. But if you want to learn more, I definitely encourage you to listen to the episode or watch the YouTube video down here.
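As a sketch of that cutover, assuming a pool named mydb in pgbouncer.ini, the PgBouncer admin console (reached with something like psql -p 6432 -U pgbouncer pgbouncer) supports pausing and resuming:

```sql
PAUSE mydb;    -- wait for in-flight queries to finish, then hold new ones

-- Edit pgbouncer.ini so mydb now points at the new database host, then:
RELOAD;        -- re-read the configuration
RESUME mydb;   -- queued clients proceed against the new database
```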
Next piece of content: “Crunchy Data Joins Snowflake”. This is from crunchydata.com, and I did not have this on my bingo card this year, but given Neon was just acquired, maybe this is a trend for the industry right now. I think why Snowflake did it is because, to my understanding (I've never used them), they're all about AI and data warehousing, and they're looking at this acquisition as a way to get a Postgres operational data store hosting provider. So they're strong in the data warehousing and analytics field, and now this gives them a transactional hosting solution. You can definitely read this post to learn more. There's also an announcement on the Snowflake blog, which I'll include in the show notes, talking about the reasons why they did it. But if you want to learn more, definitely check out those two pieces of content.
Next piece of content: “Tip: Put your Rails app on a SQL Query diet”. This is from andyatkinson.com, and I think another title for this post could be “the best SQL is the SQL you don't have to run”. It's basically talking about your application running too many unnecessary queries. One way to guard against that is to actually count the number of queries that happen when a given endpoint in your application is hit. He said he used to recommend a Ruby gem called QueryCount, but as of Rails 7.2 this is built into Rails. It tells you how many queries were run for a given endpoint, known as an action in Rails. You can use these counts to get a sense of what's going on, and you can also take a look at the development log; if you see multiple queries being called, you know you should refactor to reduce the number of queries against your database. Even though this post is about Ruby on Rails, it's applicable to every other application framework out there. You really need to be aware of how many queries you're running against your database system. On the database side, you can get a similar signal from pg_stat_statements, as shown below.
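As a complement to application-side counting, a quick way to spot chatty query patterns from the database side is to sort pg_stat_statements by call count. A minimal sketch, using the stock extension's columns:

```sql
-- Queries executed most often; very high call counts with tiny mean times
-- often indicate N+1 patterns worth batching in the application
SELECT calls,
       round(mean_exec_time::numeric, 2) AS mean_ms,
       left(query, 80) AS query_start
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 10;
```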
Next piece of content: “Bridged Indexes in OrioleDB: architecture, internals & everyday use?”. This is from orioledb.com, and again, this is an extension that revamps how Postgres works, moving from storing version history in the heap itself to an independent undo log. And in the process of this redesign, the data is not stored in a heap; it's actually stored in an index format, ordered by the primary key. That makes primary key lookups very fast, because the index order is the same as the table order, or at least that's a layman's explanation of it. And you can see that the undo log is separate here in this graph on the left. But because of this design, this extension historically only supported B-tree indexes, not GIN, GiST, BRIN or the other index types. Now, one solution would be to rewrite all of them to use this new data format. To get around that, they're instead using a bridge index: they implicitly add columns to the tables and build a bridge index on them, so that GIN, GiST, BRIN and all the other index types can be built against it to access the data. So, for example, they show on the left here a GiST index. It points to an entry in the bridge index, because that more closely mimics, I guess, the heap structure, and from there it does another hop to actually look up the data, I guess via the primary key. So they've tried to make this as simple as possible to work with; it implicitly adds the columns it needs to your tables. Now, of course, the disadvantage is that it incurs an extra hop when doing a lookup. So with GiST and GIN indexes you're going to get somewhat worse performance, and updates pay twice: they update the index access method as well as the bridge index. But I'm guessing they're hoping the overall performance improvements of OrioleDB over stock Postgres will make this a negligible or acceptable performance hit. And I think this is truly intended to just be a bridge; eventually, if this storage format takes off, these other index types could be adapted to run directly against its data store. But if you want to learn more, check out this blog post.
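From the application's point of view, the bridging is meant to be transparent. Here's a rough sketch of what everyday use might look like, assuming the orioledb extension and table access method described in the post; treat the exact syntax as illustrative:

```sql
CREATE EXTENSION IF NOT EXISTS orioledb;

-- A table stored in OrioleDB's index-organized format instead of the heap
CREATE TABLE places (
    id  bigint PRIMARY KEY,
    loc point NOT NULL
) USING orioledb;

-- A GiST index on an OrioleDB table; per the post, this is served
-- through an implicit bridge index rather than heap pointers
CREATE INDEX places_loc_idx ON places USING gist (loc);
```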
Next piece of content: “Building on open table formats”. This is from supabase.com, and it's similar to a post from Supabase last week talking about open standards. It's really interesting how they're approaching this, basically taking open table formats like Apache Iceberg and incorporating them into S3 storage. As you can see, they're talking about S3 and Iceberg here as a way of accessing all sorts of collected data in this environment, even asking whether you need ETL at all. So it's interesting, but I wonder where Supabase is going with this, what they have planned. But if you want to learn more, you can check it out.
Next piece of content: “Approximate the p99 of a query with pg_stat_statements”. This is from pgmustard.com, and they're talking about how hard it is to get the p99 just by looking at pg_stat_statements. All it gives you is a mean, but it does also give you a standard deviation, which can help, and Michael actually came up with a query to estimate the p99. Now, he does say it's probably better to get p99 or p95 metrics from your application performance monitoring tool or your RUM (real user monitoring) tools, or, on the database side, there's the pg_stat_monitor extension, which may be a better way to capture this. But if you want to learn more, check out this blog post.
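The gist of the approximation: if you assume the latency distribution is roughly normal (it rarely is; real tails are longer, so treat this as a rough floor), the 99th percentile sits about 2.33 standard deviations above the mean. A minimal sketch against the stock pg_stat_statements columns; the 2.33 z-score is the standard normal value, not necessarily what Michael's exact query uses:

```sql
-- Rough p99 estimate per statement: mean + 2.33 standard deviations
SELECT queryid,
       calls,
       round(mean_exec_time::numeric, 2)   AS mean_ms,
       round(stddev_exec_time::numeric, 2) AS stddev_ms,
       round((mean_exec_time + 2.33 * stddev_exec_time)::numeric, 2) AS approx_p99_ms
FROM pg_stat_statements
WHERE calls > 100            -- a stddev over a handful of calls is meaningless
ORDER BY approx_p99_ms DESC
LIMIT 10;
```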
Next piece of content: “Postgres Partitioning Best Practices”. This is from karenjex.blogspot.com, and it's converted from a PyCon Italia talk given just about a week ago under the same name. She talked all about partitioning: the advantages, which are mostly easier data maintenance, especially if you're deleting old data (it can help performance, but that's usually not the reason to do it); when to do it, which is basically when your tables become large; how to choose partition keys; using pg_partman; designing for partitioning; as well as how to convert existing tables to partitioned ones. This was a really long blog post and I don't have time to cover everything here, but if you're interested you can definitely check it out. A bare-bones example of the core pattern is sketched below.
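As an illustration of the data maintenance advantage she describes, here's a minimal range partitioning sketch with hypothetical table names:

```sql
-- Parent table partitioned by a timestamp
CREATE TABLE events (
    created_at timestamptz NOT NULL,
    payload    jsonb
) PARTITION BY RANGE (created_at);

-- One partition per month
CREATE TABLE events_2025_06 PARTITION OF events
    FOR VALUES FROM ('2025-06-01') TO ('2025-07-01');

-- Removing a month of old data becomes instant DDL instead of a huge DELETE
DROP TABLE events_2025_05;  -- assumes this older partition already exists
```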
Next piece of content: “Yes, Postgres can do session vars - but should you use them?”. This is from kmoppel.github.io. If you want to store temporary state, he says a lot of people use temporary tables, but that can be a problem if you're using them heavily, constantly creating and destroying them, because you can cause catalog bloat, which can lead to other performance problems. He says that instead of doing that, maybe think about using unlogged tables as a temporary store, because temporary tables only exist for a session, whereas unlogged tables persist across sessions but still don't write to the WAL, giving you a lot of great performance. But if you actually want an in-memory storage technique, you can use session-level vars, and he shows different ways you can set and retrieve them. Now, I don't think I'll necessarily be using this, but he says some of the advantages are: no I/O is performed, there's no catalog churn, it works even when you don't have the ability to create temp tables, you can store crazy long strings for free (although there are definitely memory limits), and it also works on replicas. But if you're interested in that, definitely check out this blog post.
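As a quick sketch of the session var mechanics (the myapp prefix is a hypothetical custom namespace; any two-part name works):

```sql
-- Set a session-scoped variable (the final 'false' means session, not transaction)
SELECT set_config('myapp.current_tenant', '42', false);

-- The SET syntax is equivalent
SET myapp.current_tenant = '42';

-- Read it back; the second argument 'true' returns NULL instead of
-- erroring if the setting was never defined in this session
SELECT current_setting('myapp.current_tenant', true);
```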
Next piece of content: “Benchmarking is hard, sometimes ...”. This is from vondra.me, and he's showcasing a performance graph where, as he scaled up the number of clients, performance suddenly dropped to the floor and stayed there, and then, after about 100 clients, started ratcheting back up. You would imagine this should be a graceful curve that eventually reaches a peak, but he has no idea what's going on between 25 and 100 clients. He's been trying to diagnose it and has a whole list of things that are not causing the problem: locking, CPU, it's not this, it's not that. He then lists some potential candidates for what might be happening, but he's still kind of stuck and not quite sure what's going on. So if you want to help him out, definitely check out this blog post to see if you have an idea of what might be going on.
And the last piece of content: “Ultimate Guide to POSETTE: An Event for Postgres, 2025 edition”. This is from citusdata.com. POSETTE is happening this week, running June 10th through the 12th. It's an online conference, I think it's free, and they have all these talks available. So if you're interested, definitely check out the conference.
And now it's time for my Consulting Corner. Last week I talked about some techniques you could use to convert a normal large table to a partitioned table, hopefully with very little downtime. One of the ways I mentioned was to convert the existing table into the default partition of a new partitioned table. That can work with some very fast locks, but the problem is that now you have all of this data in the default partition, and you want to move it into the historical partitions. How can you do that? Well, the way to do it is to create one of the historical partition tables, and then, in a transaction, lock the partitioned table, delete the rows from the default partition, insert them into that historical partition, attach it to the partitioned table, and commit the transaction. That moves data from the default partition into the new partition. The problem is, if you're using range partitioning and you have, say, a quarter of data, you have to move a whole quarter's worth of data from the default partition into the new partition in one go. But then I had an idea: why don't we just move the data piecemeal? Because even though you're using quarterly or monthly partitions, you can actually define whatever partition bounds you want. So even if all of your partitions are monthly, you can add a new partition that covers one day or one hour. What you can do is create that table, lock the partitioned table, move one day or one hour of data from the default partition into the new partition, attach it to the partitioned table, and commit the transaction. Now reads for that one day or that one hour hit the table you created instead of the default partition. Then, if you need to add more data to that particular table, you lock the partitioned table, detach that partition, delete the next slice of data from the default partition, add the deleted data to this partition, and re-attach it with wider bounds, covering another hour or another day, however much data you moved. Now all of that data is in the historical partition and has been removed from the default partition. So essentially, you rinse and repeat, constantly building up that historical partition until it has all the data it needs. If you're doing monthly partitions, it has a month's worth of data; if you're doing quarterly, a quarter's worth. Then you can go on to the next partition. Now, I haven't fully tried this out in a production setting, but this is something we may explore to try to do a minimal-downtime backfill when converting to a partitioned table; see the sketch below for one round of the process. So hopefully you found that of interest.
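A minimal sketch of one round of that piecemeal move, with hypothetical table and column names (and without the lock timeouts and retry logic you'd want in production):

```sql
BEGIN;
-- Block reads and writes while rows move between partitions
LOCK TABLE events IN ACCESS EXCLUSIVE MODE;

-- A standalone table with the same shape as the partitioned parent
CREATE TABLE events_2024_01_05 (LIKE events INCLUDING ALL);

-- Move one day of rows out of the default partition
WITH moved AS (
    DELETE FROM events_default
    WHERE created_at >= '2024-01-05' AND created_at < '2024-01-06'
    RETURNING *
)
INSERT INTO events_2024_01_05 SELECT * FROM moved;

-- Attach it so reads for that day now hit the new partition
ALTER TABLE events ATTACH PARTITION events_2024_01_05
    FOR VALUES FROM ('2024-01-05') TO ('2024-01-06');
COMMIT;

-- To widen it later: detach, move the next slice the same way,
-- then re-attach with broader bounds, e.g.
--   ALTER TABLE events DETACH PARTITION events_2024_01_05;
--   ... DELETE the next day from events_default, INSERT it here ...
--   ALTER TABLE events ATTACH PARTITION events_2024_01_05
--       FOR VALUES FROM ('2024-01-05') TO ('2024-01-07');
```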