
Explain Analyze Buffers, Healthier Postgres, 2021 Review, High Availability Aspirations | Scaling Postgres 197


In this episode of Scaling Postgres, we discuss including buffers with explain analyze, having a healthier Postgres DB, the events of 2021 and aspiring for greater high availability solutions.


Transcript

Today's episode represents a review of all the content since the last episode, so I hope you enjoy it. Our first piece of content is "EXPLAIN (ANALYZE) needs BUFFERS to improve the Postgres query optimization process". This is from postgres.ai, and he's talking about the importance of adding BUFFERS when you're analyzing queries. So normally you would run EXPLAIN ANALYZE to actually run the query and give you query timings and the output to understand how that query was executed. Well, he's strongly advocating for adding BUFFERS as well to give you an indication of the IO that's involved in processing that query. Now, he does say this comes with a small impact on timing, because taking the buffer measurement itself adds a bit of overhead.

So it does have a very minimal impact, but he says the impact is still worth it. He advocates not just looking at the timing and how fast you can get the query, but also focusing on the buffers because it's also a good way to compare between different environments. So if you're working in a staging environment or a restore environment, even if you have the same data set, buffers seem to be a much more consistent measure between different environments compared to just using timing. So the first thing he's covering is basically, you can see the IO work that's being done, or how many buffers have to be touched while doing a particular query when you're adding buffers to it. And a lot of the speed differences you see anyway are actually a result of doing less IO work. So this is also a point to advocate for using buffers.
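As a minimal sketch of what he's advocating (the table and query here are hypothetical, and you need a running Postgres to try it), adding BUFFERS to EXPLAIN looks like this:

```sql
-- Run the query for real and report buffer usage alongside timing.
-- (Table and index names are hypothetical.)
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 42;

-- The plan output then includes lines such as:
--   Index Scan using orders_customer_id_idx on orders ...
--     Buffers: shared hit=5 read=2
-- "hit" is pages found in shared buffers; "read" is pages pulled from disk,
-- which is the IO work that tends to stay consistent across environments.
```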

In many ways, using buffers allows you to see why something got faster. When, say, his example started using an index-only scan, or after other changes like clustering a table, you can see the IO reduction through less buffer access. So it gives you more information to understand why something got faster. Then in the latter part of the post, he talks about the benefits of thin cloning, which is one of the products they work on: making very quick copies of the database using copy-on-write techniques, such as through ZFS, to create quick clones for performance testing. But this was an interesting blog post that provided a lot of useful information, and if you want to learn more, definitely check it out.

The next piece of content- "Five Tips For a Healthier Postgres Database in the New Year". This is from blog.crunchydata.com, and the first tip they mention is to set a statement timeout. So basically set it so that statements cannot exceed a certain duration when being run. Now, I would advocate not doing this at the database level or at the Postgres cluster level, but instead doing it for, say, a user role. You wouldn't necessarily want to limit statement_timeout for the postgres user, or even for a full database, because what if you're going to create an index or run something else that takes longer than 60 seconds?

I would actually advocate doing it at the role level, so that your application users, who presumably belong to that role, would be constrained by the statement timeout. Also, as he mentions here, it's important to set an idle_in_transaction_session_timeout as well. The second tip is to ensure you have query tracking. So basically this is using pg_stat_statements. I highly encourage you to set this up if it's not already done for your database so that you can track all the statements running through the system, because this is the main tool or feature I use to find slow queries when I'm doing performance improvements for a client. The third tip is to log slow-running queries, which can be done by editing your Postgres configuration and setting log_min_duration_statement low. In more recent versions, Postgres even allows you to sample these statements as opposed to logging every single one.
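Putting those three tips together, a sketch might look like the following (the role name app_user is hypothetical, and the pg_stat_statements column shown is the Postgres 13+ name; older versions call it total_time):

```sql
-- Tip 1: constrain only the application role, not superusers
-- or the whole database.
ALTER ROLE app_user SET statement_timeout = '60s';
ALTER ROLE app_user SET idle_in_transaction_session_timeout = '60s';

-- Tip 2: enable query tracking (pg_stat_statements must also be in
-- shared_preload_libraries, which requires a restart):
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Then find the statements consuming the most total time:
SELECT query, calls, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- Tip 3: log any statement slower than one second:
ALTER SYSTEM SET log_min_duration_statement = '1s';
SELECT pg_reload_conf();
```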

The next tip is to improve your connection management. Now, he does mention that Postgres 14 had some great improvements in this area. But if you're running an earlier version (and there are probably still benefits even on 14), you can use a pooler like PgBouncer to support many more connections on the application side without the database having to hold the same number of connections. So basically you could have, say, 1,000 connections on the application side, while the database only sees, say, 100 connections. The next tip is to find your goldilocks range for indexes. What I think this means is finding the right balance of indexes: not too few, and not too many, because too many will impact your transaction throughput. So definitely a great set of tips. If you're interested in learning more about this, you can check out this blog post.
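As a rough sketch of that 1,000-to-100 funnel, a minimal pgbouncer.ini could look something like this (database name, host, and file paths are all hypothetical):

```ini
; Applications connect to PgBouncer on port 6432; PgBouncer multiplexes
; up to 1000 client connections onto at most 100 server connections.
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 100
```

Transaction pooling gives the best multiplexing, but note it's incompatible with session-level features such as prepared statements held across transactions.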

The next piece of content- "A short summary of the Year 2021 in the PostgreSQL ecosystem". This is from migops.com and it's a review of all sorts of things that happened this year, including PostgreSQL getting the DBMS of the year for 2021. Postgres 14 and its new features, the minor versions released, and some extensions. So if you're interested in a review post, you can check this one out.

Another one also related in terms of a review post is "Postgres in 2021: An Observer's Year In Review". This comes from pganalyze.com. This is someone who actually participates in the Postgres project and mentions a few things he's observed. One thing is a 30x improvement when doing certain IN clauses that came with Postgres 14, and he mentions some other performance improvements as well. He also mentioned that there are some patches looking at extending the 32-bit transaction ID to 64-bit, which would potentially avoid the problems of transaction ID wraparound on highly active systems.

This could really improve the situation. But of course, this would be a big change, and some of the proposals here only get us a little bit of the way there. Some people are saying, well, we really need to start taking this seriously. So he doesn't believe it will land in Postgres 15; it may be Postgres 16 or later before this type of change gets in. He also talked a little bit about some changes with regard to EXPLAIN, as well as other general improvements in Postgres over the year. So if you're interested in another review post, you can check this out.

Next piece of content- "PG Phriday: Defining High Availability in a Postgres World". This is from enterprisedb.com and this is an aspirational discussion of high availability and what it should look like for Postgres. This is a great post to read through because it talks about where Postgres has been and essentially still is now: it was built as a single-node relational database system. These other nodes have come about through replication, relying on the write-ahead log to create the additional nodes. Other features have been added, like making these standby nodes synchronous or allowing streaming feedback from a replica to the primary, but they're still, for the most part, independent nodes.

He's taking a first-principles look at what changes would make Postgres an actual high availability solution. Because, as he mentions here, Postgres isn't a cluster. It's not like some of the clustered databases; later we'll look at a YouTube video that covers truly distributed databases that are PostgreSQL compatible. So basically, the post goes through different iterations of what a high availability system could look like and where it might be good to go. You could have, essentially, a listener that all the applications connect to, which is responsible for identifying who the primary is, who the replica or the standby is, et cetera. It could also have some sort of consensus system to manage the instances. A very interesting post. If you want to learn more, you can check it out.

The next piece of content- "POSTGRES DATA DICTIONARY FOR EVERYONE". This is from rustprooflabs.com, and he's talking about the new extension PgDD. That is a data dictionary that many different people who use the database could use. I wasn't necessarily interested in this at first, because you can easily use psql's slash commands to get insight into the data dictionary, or you could go directly to the system catalogs. But this has an interesting use case of being much easier to use. It just has a columns view, a functions view, a schemas view, a tables view, and a views view. And I can see this appealing because it's potentially easier to use, very consistent, and you can simply assign permissions to whomever you want. Maybe it's a data scientist or an analyst like they mention here, or someone in a business unit, who needs to get access to the data and understand how it's being stored. So it's definitely an interesting use case, and if you want to learn more, you can check out this blog post.
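As a sketch of the idea (I believe PgDD exposes its views in a dd schema, but the exact view and column names here are illustrative, so check the extension's documentation):

```sql
-- Install the extension (name as packaged by the project):
CREATE EXTENSION IF NOT EXISTS pgdd;

-- Browse tables without touching pg_catalog directly:
SELECT s_name, t_name
FROM dd.tables;

-- Or inspect columns for a given schema:
SELECT t_name, column_name, data_type
FROM dd.columns
WHERE s_name = 'public';
```

The appeal is that you can GRANT SELECT on these views to an analyst role without exposing the raw system catalogs.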

The next piece of content is actually a YouTube video, and it was introduced to me by a comment on one of the previous episodes of Scaling Postgres where I was talking about how Aurora is not Postgres. He was saying, well, actually there are some pretty big similarities between Aurora and Postgres when you're not talking about the storage layer. So a lot of it is similar, but where it differs a lot is how it stores its data: Aurora uses a more distributed storage system for its database. This video is from the Yugabyte YouTube channel, and it analyzes the PostgreSQL compatibility of Amazon Aurora, Spanner, YugabyteDB, and CockroachDB. So if you're interested in distributed databases and how these differ from PostgreSQL, I definitely encourage you to check out this YouTube video.

The next piece of content- "Fixing a PostgreSQL cluster that has no superuser". This is from endpointdev.com, and basically the way you do this is to run Postgres in single-user mode. So if you happen to run into the problem that you can't administer the cluster because the main Postgres user has lost its superuser permissions, you can start Postgres in single-user mode and assign those permissions again. Here he did an ALTER ROLE postgres SUPERUSER, which restores the proper permissions, and then you can start up the server as normal. So if you run into that issue, you can keep a hold of this blog post.
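A sketch of that recovery (the data directory path is hypothetical; adjust it to your installation):

```shell
# Stop the regular server first, then start single-user mode against
# your data directory, connecting to the "postgres" database:
postgres --single -D /var/lib/postgresql/data postgres

# At the resulting "backend>" prompt, restore the privileges:
#   ALTER ROLE postgres SUPERUSER;
# Then exit single-user mode with Ctrl-D and start the server normally.
```

Single-user mode bypasses normal authentication and connection handling, which is exactly why it works when the usual superuser is broken.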

The next piece of content- "Ecommerce customer names with interesting Unicode characters". This is also from endpointdev.com. He's talking about how a particular ecommerce customer started seeing emojis as early as 2015 in names that people had entered. The names shown are fake, but these are the types of emojis that were being placed into the name fields, and all sorts of different emojis show up. So a lot of these emojis are winding up in our databases, and he includes at the bottom a way to search through Unicode ranges to actually identify some of them. So if you're interested in that, you can check out this blog post.
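One way to do that kind of Unicode-range search might look like this (table and column names are hypothetical; \U regex escapes need a reasonably recent Postgres and a UTF-8 encoded database, so treat this as a sketch rather than the post's exact query):

```sql
-- Find names containing characters in the main emoji blocks
-- (U+1F300 through U+1FAFF covers most pictographs and symbols):
SELECT id, name
FROM customers
WHERE name ~ '[\U0001F300-\U0001FAFF]';
```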

The next piece of content- "Kubernetes + Postgres Cluster From Scratch on Rocky 8". This is from blog.crunchydata.com, and they're talking about the new version of Kubernetes, release 1.22, which has better support for cgroup v2 as well as Linux swap. They took this version of Kubernetes and got a Postgres cluster running on it using their Crunchy Data Operator v5, doing it on Rocky 8, which is a Red Hat Enterprise Linux compatible distribution. So if you want to learn more about how to set that up, definitely check out this blog post.

The next piece of content is actually a two-post series that talks about data layouts, ways to implement them, and keeping data up to date using triggers and stored procedures. So if you're interested in that, you can check out these blog posts. This is from dev.to in the Yugabyte section.

Also from the same area, dev.to in the Yugabyte section, is another series that talks about creating a rate-limiting API and avoiding race conditions by using the serializable isolation level. The first post is "SQL to avoid data corruption in race conditions with SERIALIZABLE". This is basically using the transaction isolation settings to run as fast as you can without running into race conditions. If a transaction runs into a conflict, it's essentially canceled and the work has to be redone. This is a technique for writing software where you don't have to deal as much with explicit locks, so it could potentially allow you to go a little faster, as long as you're willing to deal with some failures. So if you're interested in that, you can check out this whole series of blog posts.
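A minimal sketch of the pattern (the table, user ID, and rate limit are hypothetical; the retry loop lives in the application):

```sql
-- Under SERIALIZABLE, two concurrent transactions that would both pass
-- the check-then-insert below cannot both commit; one fails with
-- SQLSTATE 40001 and must be retried by the application.
BEGIN ISOLATION LEVEL SERIALIZABLE;

-- Check current usage within the rate window:
SELECT count(*)
FROM api_calls
WHERE user_id = 42
  AND called_at > now() - interval '1 minute';

-- Application logic: only insert if the count is under the limit.
INSERT INTO api_calls (user_id, called_at) VALUES (42, now());

COMMIT;  -- may fail with "could not serialize access"; retry on 40001
```

The trade-off is exactly what the episode describes: no explicit locking, at the cost of handling serialization failures in application code.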

Next piece of content. The PostgreSQL person of the week is Vibhor Kumar. So if you're interested in learning about Vibhor and his contributions to Postgres, definitely check out this blog post.

The last two pieces of content are episodes of The Rubber Duck Dev Show that we presented over the break. The first one is when should you mock or stub during your tests? The second episode is actually a talk about Ruby and Rails with Andrew Mason, who helps run the Remote Ruby podcast as well as the Ruby Radar Newsletter. So if you're interested in this long-form, developer-based content, definitely check out our shows.
