
Postgres Releases, Performance Secrets, Don't Do This, Filter vs. Case | Scaling Postgres 252


In this episode of Scaling Postgres, we discuss new Postgres releases, performance secrets, things not to do and filter vs. case.

Transcript

All right. I hope you, your friends, family, and coworkers continue to do well. Our first piece of content is "PostgreSQL 15.2, 14.7, 13.10, 12.14, and 11.19 Released!". This is from postgresql.org. The reason these were released is that there is one security issue, a client memory disclosure when connecting with Kerberos to a modified server. So it looks like it's basically a man-in-the-middle attack involving Kerberos; if you don't use Kerberos, you're presumably unaffected. In addition, there are 60 or so bugs that were addressed in these releases as well, predominantly for version 15, but other versions may have received some of these fixes too. But if you want to learn more, definitely check out this piece of content.

Next piece of content- "Exposing Postgres Performance Secrets". This is from crunchydata.com. They're basically talking about what steps you need to take to be prepared to analyze the performance of your database. The number one thing listed is to use pg_stat_statements to record queries. And this is also the number one tool I use if a client comes to me saying their database is slow and they're not really sure what queries are causing the slowness. Go straight to pg_stat_statements and you can determine what those are and you can then look to individually optimize those queries. Number two is log slow queries, about a one second interval, and that's definitely what I do as well for the databases that I set up. Next is to log the explained plans for slow queries. Now I don't immediately jump into doing this because it does have an extra load on the database. So I only do this if I see variation between running the queries individually versus what actually runs in production.

But definitely a useful tool to get some of that insight. Number four is to kill long-running queries. Basically, he's suggesting putting a statement timeout on the database, and I usually don't like doing this because I always end up forgetting that there's some statement timeout when I'm running a long DBA-related statement as the postgres user, and it sometimes times out. So generally, I just like to set the statement timeouts at the role or user level. So for example, the application user can't run something longer than 30 seconds, but if I need to run some sort of DBA-based task, it's not going to hit the statement timeout. Or maybe they have different statement timeouts; maybe the postgres user has a statement timeout of an hour or two, or something similar. But definitely great things to do to address your performance.
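As a rough sketch of what those first steps can look like (app_user and the thresholds here are just placeholder choices, not anything from the article):

```sql
-- postgresql.conf (pg_stat_statements needs a restart to load):
--   shared_preload_libraries = 'pg_stat_statements'
--   log_min_duration_statement = 1000   -- log queries slower than one second

-- Make the pg_stat_statements view available in this database
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Find the queries consuming the most total execution time
SELECT query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- Set statement timeouts per role rather than database-wide,
-- so application queries are capped but DBA sessions aren't
ALTER ROLE app_user SET statement_timeout = '30s';
ALTER ROLE postgres SET statement_timeout = '2h';
```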

Next piece of content- "FOSDEM 2023- Don't Do This". This is from vyruss.org. And he's talking about a presentation that was done at FOSDEM 2023 in Brussels titled Don't Do This. And this is from Jimmy Angelakos. I really love his presentations and I definitely like this one as well. So if you want a great presentation of things you should try to avoid, I definitely encourage you to check out his YouTube video of this presentation here. Now we'll say there are two things I still do that he says not to do. One of them is using timestamps without time zones and using serial data types as opposed to the generated column capabilities of newer versions of Postgres. The only reason I do that is because my application framework still uses those. But definitely a lot of great suggestions in this presentation.

Next piece of content- "The Performance Impact of SQL's FILTER Clause". This is from blog.jooq.org. And this is probably in reference to a blog post that happened last week that we had on Scaling Postgres where they were looking at comparing FILTER to CASE. But they didn't talk about performance whereas he actually analyzed the performance, particularly in Postgres, and noticed about an 8% variance where FILTER was, as he says, 8% faster than the case statement. And he said he was able to add an auxiliary predicate to it to get the timing more similar, but you shouldn't really have to do that. So this may be the case where the Postgres planner needs some improvement. But definitely check out this blog post if you want to learn more.

Next piece of content- "Adventures in embedding Postgres schemas in GPT". This is from canvasapp.com. And they have an application that allows end-users to do data analysis without having to know SQL. And they said 'gee, wouldn't it be great to use natural language that then the language model of GPT could then translate into SQL?'. So they go through the process of trying to get that working. Ultimately, they decided it was not quite ready for prime time. Sometimes it just gave vastly incorrect results and the price they would have to pay for all the contexts that would need to be sent with a prompt was too extreme. But it definitely was an interesting insight and I foresee this becoming a reality in not too many years. So definitely encourage you to check out this piece of content.

Next piece of content- "SQL Tricks for More Effective CRUD". This is from crunchydata.com. And by CRUD they mean doing a create, read, update, or delete, typically through some sort of application framework. Now he says SQL tricks, but these aren't tricks to me, but they may be for most programmers because so many of the ORMs operate on a single record at a time. At least in terms of modifications, clearly you can pull more records, but a lot of the modifications happen one at a time. But SQL can very easily insert multiple rows as a part of a single insert statement which is great for performance reasons. You have fewer commits that are happening and more rows you can do at one time.

You can also return data from one query and use it in another, all as part of one statement. He talks about reading and joining multiple tables. He also talks about how, when you're doing an update, you can actually look across multiple tables and change multiple rows at one time, and the same thing for deletes as well. So again, they're not really tricks; that's just how SQL works. But the application frameworks, when they are working with CRUD, tend to work with one object or one record at a time, whereas SQL is designed to work with many. So if you're a programmer and you didn't know you could do these sorts of things, definitely check out this blog post.
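A few hedged sketches of the kinds of statements being described, using made-up table names:

```sql
-- Insert multiple rows in a single statement
INSERT INTO items (name, price)
VALUES ('widget', 10), ('gadget', 20), ('gizmo', 30);

-- Return data from one statement and use it in another via a CTE
WITH new_order AS (
    INSERT INTO orders (customer_id) VALUES (42)
    RETURNING id
)
INSERT INTO order_items (order_id, item_id)
SELECT id, 1 FROM new_order;

-- Update one table using values from another
UPDATE items i
SET price = p.new_price
FROM price_changes p
WHERE p.item_id = i.id;

-- Delete rows based on a join to another table
DELETE FROM order_items oi
USING orders o
WHERE oi.order_id = o.id
  AND o.status = 'cancelled';
```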

Next piece of content- "Quick Logical Replication Checklist". This is from gorthx.wordpress.com. And this is exactly what it says, it's a checklist of things to check when you set up logical replication to make sure it's working and there are no errors. So I'll just leave it at that.

Next piece of content- "When to use VACUUM FULL". This is from depesz.com. And basically, he's saying never. He says if you think you need to use it, it probably means you need to optimize your autovacuum to run more regularly. Or you can always manually kick off regular vacuums. But don't do VACUUM FULL because it locks the whole table. So you can't write to it, you can't even read from it, so selects stop working as well. So you definitely don't want that. An alternative is to use pg_repack to do it. So this is an extension that rewrites the table but keeps the existing one active. Of course, the disadvantage of this, even with VACUUM FULL, is you need enough disc space to do it. Now he does advocate here that you can use pg_repack to redo indexes as well. But I'd rather just rely on re-indexing concurrently that is built into the recent versions of Postgres now. But if you're using an older version, perhaps you would like to use that feature as well.

Now related to that, this is the episode that Lukas covered on "Five minutes of Postgres" and it's "Never run VACUUM FULL: How to run pg_repack on Amazon RDS and Aurora". This is from pganalyze.com. He basically covers this post as well as gives some additional insights on working with RDS and Aurora. So check this post out if you're interested in that.

Next piece of content- "Find Text in ANy Column of a PostgreSQL Table''. This is from endpointdev.com. And he's saying I know there are some texts that exist, in this case, he says kilroy, somewhere. How do I find it? And he says the first solution was doing a pg_dump of the whole database. And then grepping through the dump file to try and find that record. Another alternative is using a COPY or \copy to copy out that table to a file and then grepping through the file to find the record of interest. But he says you can also use row types. So you can use a query here using a row type of text to try and find this instance. This enables you to search through a table to find text in any one of the text columns as it were. Then he has this additional command using \gexec that basically formulates the same query for every table in the database. So you can search for text in every table of the database for this instance. So definitely a great reference to have if you ever need to do this and check out this blog post if you want to learn how.

Next piece of content- "How to get a row, and all of its dependencies". This is from the depesz.com. And you have a row in a table and you may have foreign key relations to other tables. Well, how do you get that row and then all the rows that are related to it? And he actually came up with a basic schema here, three tables and inserted some data and said okay let's try doing this. And basically, he did a whole set of functions to be able to do this. Basically what he did is presented the results in JSON format. So you can see here, you have the records and then the associated records nested in it. So if you're interested in doing that, definitely check out his blog post.

Next piece of content- "Now you can \d table not only in psql". This is running depesz.com as well. He created a function called psql-backslash that is a set of pl/PgSQL functions that replicates the \d command in PSQL. So this is something you could run if you're using an SQL interface to Postgres but it's not PSQL. Frankly, I'd probably just use PSQL to do this but if for whatever reason you have a different tool, you could use this to get the same type of output as \d.

Next piece of content- "Waiting for PostgreSQL 16- Allow underscores in integer and numeric constants". This is from depesz.com. And to me this is awesome. If you have a really long number, now you can put underscores in that won't essentially impact the number but it helps for readability. And to make sure what you're typing in is exactly what you want. So definitely great to see this.

Next piece of content- "RESERVE CONNECTIONS IN POSTGRESQL 16". This is from cybertec-postgresql.com. We discussed this enhancement last week in Scaling Postgres where you can reserve connections not just for regular super users, but also for other types of users can still connect to the database if you need to. The main thought of this is that as users are granted other rights that are administration-like, maybe doing backups or some sort of other operations like that, you could give them reserved connections as well.

Next piece of content- "Relational or Non-relational Postgres?". This is from softwareandbooze.com. And this is his submission for pgsql phriday. This one he talks about going from ETL which is Extract, Transform, Load to a lot of times people are doing transformations in Postgres. So it's more of an Extract, Load, Transform, and this is talking about relational and non-relational data given pgsql phriday's topic.

And then related to that we have the "PGSQL PHRIDAY #005 RECAP". It covers all the different posts and this was posted on blog.rustprooflabs.com.

Next piece of content- "Invoking (your own) Perl from PL/Perl". This is from fluca1978.github.io. And this is basically how we can use Perl within Postgres to execute external Perl programs. So he covers how to do that in this blog post. Check it out if you want to learn more.

Next piece of content- "What's new in Citus 11.2 for Postgres plus Patrone HA support for Citus". This is from citusdata.com. And as a good tell, 11.2 of Citus is released, they made a lot of improvements to distributed tables in that they can handle more SQL and DDL features than they could in previous versions. And they list the new capabilities down below. But also what is of interest is that now Patrone 3.0 supports working with Citus. So now you can have your scale-out database that also handles high availability through Patrone. So if you're interested in that, definitely check out this blog post.

Next piece of content- "Precompiled headers in PostgreSQL". This is from peter.eisentraut.org. And he's talking about with Postgres in version 16, they're moving to the Meson build system which has support for pre-compiled headers and what the ramifications of that are. And basically not so much for anything non-Windows but it definitely improves build times with windows. So if you're interested in learning more, check out this blog post.

Next piece of content- "Postgres GitOps with Argo and Kubernetes". So if you're looking to run Postgres on Kubernetes and do it with a GitOps workflow with Argo CD, definitely check out this blog post from crunchydata.com.

Next piece of content. There was another episode of Postgres FM this week. This one was on "Benchmarking". So if you want to listen to their episode or watch the YouTube channel, definitely check out this piece of content.

Next piece of content, the PostgreSQL person of the week is Bertrand Drouvot. If you're interested in learning more about Bertrand and his contributions to Postgres, definitely check out this blog post.

And the last piece of content, we did have another episode of The Rubber Duck Dev Show this Thursday afternoon. This one was on "Hobby Programming" with Nick Schwaderer. So if you want to join us for a discussion about programming in your off time, we definitely welcome you to check out our show.
