
Reduce Replication Lag, Explain Parameterized Query, PostGIS Day, Multiple PgBouncers | Scaling Postgres 242


In this episode of Scaling Postgres, we discuss how PG15 helps reduce replication lag, how to get the parameters used in prepared statements, PostGIS Day and how to use multiple pgbouncers.


Transcript

All right. I hope you, your friends, family, and coworkers continue to do well. Our first piece of content is "5mins of Postgres E44: Reducing replication lag with maintenance_io_concurrency in Postgres 15". This is from pganalyze.com. If this topic sounds familiar, it's because he's covering the post I covered last week on Scaling Postgres, which I said was probably the most important post of the bunch. It discusses the changes made in Postgres 15 that allow replicas to more easily keep up with the primary at the stage where the WAL is being applied to the database files on the replica. Thanks to I/O enhancements added in Postgres 15, you can now raise maintenance_io_concurrency to improve the apply performance of the replica. I felt this post was interesting enough that it would be good to reiterate it and hear Lukas's perspective on it as well. So definitely check this piece of content out if you want to learn more about that.
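As a rough sketch of the setting being discussed, assuming you have superuser access on the replica (the value 32 below is just an illustrative starting point, not a recommendation from the post):

```sql
-- Check the current value on the replica (the default is 10)
SHOW maintenance_io_concurrency;

-- Raise it cluster-wide; this takes effect on reload, no restart needed
ALTER SYSTEM SET maintenance_io_concurrency = 32;
SELECT pg_reload_conf();
```

The right value depends on your storage; faster SSD or cloud block storage can generally sustain higher concurrency than spinning disks.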

The next piece of content- "EXPLAIN THAT PARAMETERIZED STATEMENT IN POSTGRESQL". This is from cybertec-postgresql.com, and they're talking about parameterized queries, or what you more typically hear called prepared statements. These work in two stages: first a PREPARE statement where you define the query you're going to use, with placeholders for the parameters, generally in the WHERE clause or the ORDER BY clause; then an EXECUTE where you supply the actual values. That helps avoid SQL injection attacks, but it also helps with performance because Postgres can reuse plans for subsequent executions. So you prepare once and then execute the statement multiple times. Postgres has custom plans and generic plans; I believe it starts with a custom plan but may switch to a generic one.
But that's based upon heuristic measurements it makes, as well as how many times a particular query is executed. The post then examines this scenario: you want to understand why a query was slow, but if you look in pg_stat_statements, all it shows you is the parameterized query, without the actual values. So how do you run EXPLAIN to find out what the query plan is? This post shows a way to get a generic plan for any parameterized query. You can't just EXPLAIN the statement with a placeholder in it, because Postgres won't understand it. But what you can do is prepare the statement with the placeholder as part of it.
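A minimal sketch of the two-stage flow described above (the statement name and table are hypothetical, not from the post):

```sql
-- Stage 1: define the statement once, with $1 as a placeholder
PREPARE find_user (int) AS
  SELECT * FROM users WHERE id = $1;

-- Stage 2: execute it repeatedly with different values;
-- Postgres can reuse the plan across executions
EXECUTE find_user(1);
EXECUTE find_user(42);
```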

Then you can set plan_cache_mode to force a generic plan, pass NULL as the parameter, and it will give you the generic query plan. You can also use unknown for the parameter types, and that will also let you see the generic query plan. Now, we had covered a previous post that showed this technique as well, but what they've done here is package it into an extension they're calling generic_plan. If you're interested, you can create the generic_plan extension and then run the generic_plan function, giving it the statement you want the explain plan for, and it does all of these steps for you. So if you want to learn how to get a generic plan for parameterized queries when you're trying to understand why a particular query is slow, definitely check out this blog post.
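The manual version of the technique looks roughly like this (again with a hypothetical statement and table; the NULL stands in for a parameter value we don't know):

```sql
-- Define the parameterized statement you saw in pg_stat_statements
PREPARE find_user (int) AS
  SELECT * FROM users WHERE id = $1;

-- Force the generic plan instead of a custom one
SET plan_cache_mode = force_generic_plan;

-- Passing NULL for the parameter yields the generic query plan
EXPLAIN EXECUTE find_user(NULL);
```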

Next piece of content- "ROUTE THE INTERESTING THINGS (NOT JUST ROADS) WITH OPENSTREETMAP". This is from rustprooflabs.com, and they're talking about a presentation that Ryan Lambert gave at PostGIS Day 2022, which took place a couple of days ago on November 17. So this post has information about his presentation, with downloads and links, and he also links to Crunchy Data's website, since they're the ones hosting the PostGIS Day registration. Now, the event has already passed, but I'm assuming recordings will be posted at some point. If you're interested in that, definitely check out this blog post.

Next piece of content- "Postgres at Scale: Running Multiple PgBouncers". This is from crunchydata.com. They're talking about a situation where you're running PgBouncer for your system and it's just not keeping up. Now, the main reason I've seen for that is the CPU usage of the core PgBouncer runs on hitting 100%. PgBouncer is single-threaded, not multithreaded, so it can only use one CPU core at a time. So if you put it on a dual-core or quad-core system, you're going to have a lot of resources sitting unused. Now, one technique to get around that is to create separate PgBouncer services and run them on different ports.

That way you can send traffic to individual PgBouncer processes, each of which can then talk to the database. But there's also another technique where you use a common port that gets distributed across multiple PgBouncer backends. Now, they don't explore how to do that here, because there is already an existing write-up from a few years ago by Peter Eisentraut on how to set up this process. So if you're realizing that your PgBouncer is not keeping up, or that the CPU core it's running on is the bottleneck for your database system, you'll probably need to add additional PgBouncers, and reviewing this post, along with the write-up on how to do it using a single port, will probably be very beneficial.
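As a hedged sketch of the single-port approach: on Linux, PgBouncer (1.12 and later) has an so_reuseport setting that lets several independently started PgBouncer processes bind the same port, with the kernel distributing incoming connections among them. The specific addresses and database entries below are placeholders:

```ini
; pgbouncer.ini -- the same config file used by each PgBouncer process
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
; Allow multiple pgbouncer processes to bind the same port (Linux, 1.12+)
so_reuseport = 1
; Each process still needs its own pidfile and unix socket directory
```

You would then start one PgBouncer process per core you want to use, each pointing at a copy of this config that differs only in its pidfile and socket directory.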

Next piece of content- "Postgres Query Boost: Using ANY instead of IN". This is from crunchydata.com, and he's talking about the IN operator, which typically works with a list of values, whereas the ANY keyword works with arrays. This has some ramifications depending on your interface to Postgres. He has an example using Java where he has an array of numbers he's looking for, and he actually cannot use IN, but it works just fine using ANY. And in fact, if you use IN in your statement and then look at the explain plan, most of the time I've seen the plan itself use ANY rather than IN. So this is another option if you run into issues trying to run certain statements, at least through a programming language. But check this blog post out if you want to learn more about that.
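The difference can be sketched in plain SQL (the table here is hypothetical):

```sql
-- IN takes a list of values; you cannot bind a whole list to one placeholder
SELECT * FROM users WHERE id IN (1, 2, 3);

-- ANY takes an array, which maps cleanly to a single array bind parameter
SELECT * FROM users WHERE id = ANY(ARRAY[1, 2, 3]);

-- From a driver, the ANY form becomes: WHERE id = ANY($1),
-- with the array passed as one parameter
```

This is why the array-based form tends to be friendlier to language drivers like JDBC: the whole array travels as one parameter rather than a variable-length list of placeholders.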

Next piece of content- "Emacs(client) as editor in psql". This is from fluca1978.github.io. He's talking about psql, where you have the ability to open an editor with \e: it opens your defined editor loaded with the last executed SQL statement, so you can edit it, save it, close it, and run it again. Apparently, he likes to use Emacs for this, but it takes a long time to load. However, you can run Emacs as a daemon and start the emacsclient instead, which talks to that daemon and starts up much faster. So he walks through the process of doing that. He also had some issues with his ZSH setup as opposed to Bash, but just using psql and \e is a great way to edit queries you're working on. And if you want to learn more, definitely check out this blog post.
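A minimal sketch of the setup described, treating it as environment configuration (exact flags and startup details are in his post; psql falls back through PSQL_EDITOR, EDITOR, and VISUAL when picking an editor):

```shell
# Start the Emacs daemon once (e.g. at login)
emacs --daemon

# Tell psql to use the fast-starting client as its editor
export PSQL_EDITOR="emacsclient"

# Then, inside psql, \e opens the last query in emacsclient
```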

Next piece of content- "Improved aarch64 (arm64) support is available in the PostgreSQL RPM repository". This is from people.planetpostgresql.org, and basically there have been a number of updates to the RPM packages for distributions like Red Hat, CentOS, and Rocky Linux. So if you're using those on arm64, definitely seek out these new packages for Postgres.

Next piece of content- "PostgreSQL 16: part 2 or CommitFest 2022-09". This is from postgrespro.com, and they're reviewing all the different code changes worked on in the last commitfest for Postgres 16, which is due at the end of next year. So definitely check out this blog post if you want to learn more about those.

Next piece of content- "Multi-cloud Strategies with Crunchy Postgres for Kubernetes". This is from crunchydata.com, and they're talking about their Postgres operator and how they've set it up so it can communicate and do replication between two different cloud providers, or even on-premises. Basically, they work through a process with Amazon EKS as the primary, streaming to a standby on a Google Kubernetes Engine cluster. So if you want to learn how to do that with their operator, definitely check out this blog post.

Next piece of content is "Dynamic spare process management in Pgpool-II". This is from pgsqlpgpool.blogspot.com. They're talking about an enhancement to Pgpool-II that dynamically determines how many spare processes to keep around for incoming connections, which in some cases resulted in as much as a five-fold speedup. So definitely check out this blog post if you're interested in learning more about that.

Next piece of content- "How to use an external database for pgAdmin user settings". This is from enterprisedb.com. Out of the box, pgAdmin uses an SQLite database to store user accounts, auto-discovered servers, preferences, etc. Well, you can actually use Postgres to store these instead: basically, you define a CONFIG_DATABASE_URI and pgAdmin will start using Postgres. So check this post out if you want to learn more about that.
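A minimal sketch of what that override might look like in pgAdmin's config_local.py, assuming a Postgres connection URI; the user, password, host, and database name below are placeholders, not values from the post:

```python
# config_local.py -- pgAdmin reads this file to override its defaults.
# The connection details below are placeholders, not a real server.
CONFIG_DATABASE_URI = "postgresql://pgadmin_user:secret@db.example.com:5432/pgadmin"
```

On startup, pgAdmin will then store its user settings in that Postgres database instead of the bundled SQLite file.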

Next piece of content. There was another episode of Postgres FM this week. This one was on "HOT updates" or basically Heap-Only Tuple updates and they talk about what they are, the benefits, and how you can optimize for them. So if you want to learn more about that, definitely listen to this episode. Or you can check it out on YouTube using the link here.

The next piece of content, the Postgres person of the week is Önder Kalaci. If you're interested in learning more about Önder and his contributions to Postgres, definitely check out this blog post.

And for our last piece of content, we did have another episode of The Rubber Duck Dev Show this past Wednesday evening. This one was on "Top Down or Bottom Up Testing", where we talked through different considerations when testing your application: whether you prefer to work from the UI side down, or to start with unit tests and build your application up from there. So if you're interested in that type of content, we welcome you to check out our show.
