Performance Issue, Survive Without Superuser, Reserved Connections, Partition Management | Scaling Postgres 250

In this episode of Scaling Postgres, we discuss resolving a performance issue, how PG16 allows you to survive without a superuser, reserving connections and handling partition management.

Transcript

All right. I hope you, your friends, family, and coworkers continue to do well. Well, this is episode 250. And since I do approximately 50 episodes a year, that means I've been doing Scaling Postgres for about five years. So thank you for joining, listening, or watching these episodes. I greatly appreciate it and we'll see what the next few years bring. But our first piece of content is "Puzzling Postgres: a story of solving an unreproducible performance issue". This is from William Duclot on medium.com. This post tells an in-depth story of how he identified a problem in production and how they were ultimately able to resolve it. In the summary at the top, the first surprise he discovered was that some Postgres drivers, pgx in Go in this case, though I'm sure others do this as well, use prepared statements, sometimes without asking the user.

So, for example, they were using a library as part of their infrastructure, and the people who wrote the library said 'No, we're not using prepared statements', but it was actually the Postgres driver that was using prepared statements, seemingly without some of the developers knowing what was going on. The second surprise is that queries executed via prepared statements can have a different query plan than those executed directly. These are the differences between custom and generic plans. He found this because he was trying to reproduce the performance problem he was seeing, but every time he ran EXPLAIN ANALYZE, which gets a custom plan, he never saw the issue. It was only in production, where prepared statements were being used, that he saw it.
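If you want to see this effect yourself, here's a minimal sketch you can run in psql (the table and values are hypothetical, not from the post):

    CREATE TABLE orders (id int, customer_id int);

    -- prepared statements start out with custom plans built for each parameter
    PREPARE q(int) AS SELECT * FROM orders WHERE customer_id = $1;
    EXPLAIN EXECUTE q(42);

    -- after five executions, the planner may switch to a cached generic plan
    -- (shown with $1 instead of a literal), which can differ from the custom one
    EXECUTE q(42); EXECUTE q(42); EXECUTE q(42); EXECUTE q(42);
    EXPLAIN EXECUTE q(42);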

And the third surprise is that Postgres' query planner has a bug in its cost estimation. It was choosing a very strange plan, and the Postgres team is investigating how to resolve the issue. How he ultimately resolved this particular problem was by forcing the Postgres configuration to only use custom plans and never the generic plan. Now, that's a little heavy-handed, but in this case it definitely solved his problem and didn't cause any other issues. The reason I placed this post first is that he goes into a lot of depth on the exploratory process: identifying what's causing the problem, gathering clues as to what it might be, and ruling out certain areas. So it presents really well the path you would need to go down if you discover an issue like this. I highly encourage you to check out this particular post.
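Based on his description, the knob for this is Postgres' plan_cache_mode setting; a minimal sketch, assuming that's what they changed (the role name is hypothetical):

    -- force custom plans for prepared statements globally...
    ALTER SYSTEM SET plan_cache_mode = 'force_custom_plan';
    SELECT pg_reload_conf();

    -- ...or scope it to a single application role instead:
    ALTER ROLE app_user SET plan_cache_mode = 'force_custom_plan';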

Next piece of content- "Surviving Without A Superuser- Coming to v16". This is from Robert Hass at rhass.blogspot.com. And he's talking about an enhancement coming in 16 with regard to CREATEROLE. Now what this post is addressing is a lot of database hosting providers don't want users to have access to the operating system level. They just want to access the database but not the operating system. The issue with CREATEROLE in Postgres right now is that it can grant you access to OS-level activities such as pg_ececute_server_programs. It can also allow you to increase your access to other areas, I think you can grant CREATEDB role to another user to be able to create databases. But then on the other hand, there are some things that it can't do easily with regard to going in and accessing objects that it has granted roles for and assigning users to those roles.

So the changes they've made in 16 minimize the operating-system exposure that CREATEROLE gives, while also granting it easier privilege access to other database objects based on the roles it created and who has been assigned to those roles. His assessment at this point is that it works pretty well, but there are still some things you need a superuser for, namely CREATE SUBSCRIPTION and CREATE EVENT TRIGGER. So this is, say, phase one of the implementation, and they'll have to address those in additional patches moving forward. But definitely check out this blog post if you want to learn more about this.
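As a rough illustration of the v16 behavior he describes (all role names here are hypothetical):

    -- run once as a superuser: an admin role with no OS-level powers
    CREATE ROLE tenant_admin LOGIN CREATEROLE;

    -- tenant_admin can now manage the roles it creates without superuser help
    SET ROLE tenant_admin;
    CREATE ROLE alice LOGIN;
    CREATE ROLE app_readers;
    -- works in v16 because the creator automatically gets ADMIN OPTION
    -- on the roles it creates:
    GRANT app_readers TO alice;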

Now, I did see an interesting comment at the bottom where the commenter talks about the complexity of the users and roles system and how they would love for it to be easier to work with. The author agrees the stuff is complicated, and I can definitely get on board with that as well. Relatedly, there was another post released this week called "PostgreSQL Basics: Roles and Privileges". This is from red-gate.com. So if you want an overview of the roles and privileges system in Postgres, and all about authorization and authentication, I definitely encourage you to check out this post to get a really good grounding in these concepts.

Next piece of content- "PostgreSQL 16: reserved_connections". This is from dbi-services.com. And they're talking about reserve connections which are connections that are reserved for superusers. This assures that if you're using all the connections to the database, there are always some that are on standby to log in as a superuser to do other activities. Maybe you need to drop some of those connections. It just assures you're always able to connect to the database as a superuser. Well, they've actually added, in Postgres 16, a new role called pg_use_reserved_connections. This actually allows you to assign this role to additional users so that they can log in, even if there are no more regular connections available. And of course this dovetails with the previous post so that these particular users have more control over certain database areas. This now allows those users who have been assigned this role to be able to log in even though other connections are used up. So you can check out this blog post if you want to learn more about this enhancement.

Now, related to that, this week's "Five minutes of Postgres" is "Postgres 16: Surviving without a superuser & reserved_connections" from pganalyze.com. Lukas talks about both of these posts and goes into more depth than I have here. So I definitely encourage you to check out his piece of content as well.

Next piece of content- "How TimescaleDB Solves Common PostgreSQL Problems in Database Operations With Data Retention Management". This is from timescale.com. Basically, they're talking about partition tables and how frequently people use partition tables to be able to cull data as it ages out. So basically, you can create a partition every month or create a partition every year and when you're done with that partition at the end of some period you can just delete the whole table without having to go in and delete individual rows. But the issue that we're discussing with that is that it requires a lock on the partition and on the parent to know that this partition is being deleted and detached from the parent. Now Postgres has the concurrent option. So that enables you to do it without having to require as many heavy-handed locks.

But they're saying you can still run into issues if the jobs that create new partitions for incoming data, or delete the old ones, fail to run, which may leave you in a disk-full situation. They talk about addressing this in Postgres with custom scripts or with pg_partman, and then focus on how TimescaleDB solves it; they do it a little differently because their partitioning scheme uses what they call hypertables. But I've never had too much of a problem with adding partitions and then dropping them. Maybe you need to reduce the workload or the number of jobs hitting the database when these scripts run. But if you want to learn a little more about this, you can definitely check out this blog post.

Next piece of content- "WAL Compression in PostgreSQL and Recent Improvements in Version 15". This is from percona.com. And they're talking about the enhancements that were added in Postgres 15 to be able to do different types of compression for your WAL. Now generally, this only affects full-page image rights that hit the WAL. They don't do everything by hitting the WAL. But it can dramatically lower the amount of disc being used and this is an example here of looking at a number of WAL bites being generated when there's no compression. And you can see it's approximately 8x as much using no compression compared to using the built-in compression scheme. Now the default compression scheme is pglz, so it's a customized LZ version that Postgres uses internally. But they say that the issue with that is that it results in high CPU utilization.

So the reason they implemented the others is that they can give you compression at a much lower CPU usage rate. Now, it's unfortunate they don't show the CPU utilization in this graph or in a separate graph, but what they say they found is that the one with the lowest CPU utilization was lz4. As you can see, its output is a hair larger than the normal pglz compression, but the CPU utilization was much better. Whereas if you want the smallest possible WAL, zstd is the one to use, though it used a bit more CPU than lz4 did. So based on your database's needs, if you want lower CPU utilization when compressing, choose lz4; if you want the most compression, it looks like you should go with zstd. But if you want to learn more, definitely check out this blog post.
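In configuration terms, the choice they're benchmarking is just this (assuming your Postgres 15 build was compiled with lz4 and zstd support):

    -- compress full-page images in WAL; pick one of: pglz, lz4, zstd, off
    ALTER SYSTEM SET wal_compression = 'lz4';
    SELECT pg_reload_conf();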

Next piece of content- "A UNIQUE CONSTRAINT WHERE NULL CONFLICTS WITH EVERYTHING". This is from cybertec-postgresql.com. And he's talking about a real head-scratcher where there's a lot of constraints they want to apply with just a two-integer table. And I'll let you as the reader check out this blog post. Basically with a particular constraint that they wanted to apply, they couldn't think of any permutation where they could use a unique constraint or a unique index on the table to get what the desired outcome was. And it was mainly because of this null. But they actually came up with an innovative way of getting around it by using an exclusion constraint. So setting up range types in this case and an exclusion constraint with this particular GIST index gave them exactly what they needed. So if you're interested in exploring the solution, definitely check out this blog post.

Next piece of content- "How to Solve Advent Code 2022 Using Postgres- Day 14". This is from crunchydata.com. You can check out this post if you want to learn what his solution is to this problem.

Next piece of content- "PGSQL Phriday #005" is coming up. This is from pgsqlphriday.com. And this one is talking about storing non-relational data in your database. Some of the questions that we'll be posed for this next blog challenge are "What non-relational data do you store in Postgres and how do you use it? Have you attempted non-relational uses of Postgres that did not work well? What are the biggest challenges with your data, whatever its structure? How do you define non-relational data?". And the person who is sponsoring this post is Ryan Lambert Rustproof Labs and you have his post on this upcoming blog challenge. So definitely encourage you to blog about this if it's of interest to you. And we'll have the results once people have published them on Scaling Postgres.

Next piece of content is actually a YouTube channel covering PGConf NYC 2022. These presentations were actually posted in December, but you can use this link to find all the different talks that took place there.

The next piece of content is actually a YouTube video: "Walk-through of implementing a simple Postgres patch. From sources to CI". This is from the Bite Relay YouTube channel, so if that's of interest to you, you can check out this piece of content.

Next blog post is "Migration of Synonyms from Oracle to PostgreSQL". This is from migops.com. And SYNONYMS in Oracle are essentially aliases and they discuss if you're migrating from Oracle to Postgres, how do you handle those. Generally, it's basically using the search path in Postgres to address this. So if you're interested in learning more about that, you can check out this blog post.

Another blog post related to that is "Online rebuild of Indexes- Oracle vs. PostgreSQL". This is also from migops.com, and the Postgres equivalent of Oracle's online index rebuild is basically concurrent reindexing. So if you want to learn more about that, you can check out this blog post.
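On the Postgres side, that looks like this (the index and table names are hypothetical):

    -- rebuild without blocking reads or writes (PostgreSQL 12+)
    REINDEX INDEX CONCURRENTLY idx_orders_created_at;
    -- or every index on a table:
    REINDEX TABLE CONCURRENTLY orders;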

Next piece of content- "PostgreSQL command line colors!". This is from fluca1978.github.io. And pgsql actually reads pg_color and pg_colors variables to be able to give you colored output. So if you're interested in learning how to do that check out this blog post.

Next piece of content: there was another episode of Postgres FM this week, this one on "Infra cost optimization". It's not strictly Postgres related, but it covers managing your whole infrastructure and the costs involved, including cloud costs vs. running your own servers. I found this discussion particularly interesting, but you can definitely check out the audio here or watch the YouTube version.

Next piece of content, the PostgreSQL person of the week is Jim Chanco Jr. If you're interested in learning more about Jim and his contributions to Postgres, definitely check out this blog post.

The last piece of content: we did have another episode of The Rubber Duck Dev Show this past Thursday afternoon. This one was on "Short Ruby News With Lucian Ghinda". So if you want to learn more about his new newsletter and why he chose to start it, we definitely welcome you to check out our show.
