
Postgres Scheduling Opportunities | Scaling Postgres 289


In this episode of Scaling Postgres, we discuss new options for Postgres scheduling tools, proper ways to kill processes, and EXPLAIN usage.


Transcript

Do you know how there are all of these scheduling tools you can use with Postgres, like pg_cron or pg_timetable? I think there are a few others. I've pretty much always ignored them because my perspective is that I have Cron. What else do I need? But the first article today makes me wonder. I might want to check them out a bit more. But I hope you, your friends, family, and coworkers continue to do well. Before we get started, I do want to thank everyone who met with me about my Postgres course, and I will have more details about that in the next few weeks.

But the first piece of content is "Making PostgreSQL tick: New features in pg_cron". This is from citusdata.com, and this is a scheduling tool that runs as an extension in Postgres. Now as I mentioned, I've always just used Cron. Or you could also use systemd timers to run any scheduled jobs. And even if you have a hosted database, you could still designate a particular application server to be your Cron server or your scheduling server and have that just connect to Postgres and do certain scheduled tasks. So we've never really seen a need to use some of these scheduling tools, but I believe pg_cron is available in almost all of the hosted Postgres services. But what I found interesting about this is that in version 1.5, you can schedule a job every few seconds. So as opposed to using the standard Cron syntax, where the shortest resolution you can do is a minute, this actually enables you to run jobs every 1 to 59 seconds.

So it's more of an interval time period as opposed to running a job at an exact time. And he said that this was one of the most popular requests. But the reasons it had not been done before are 1) Cron syntax can't express this; its resolution is only a minute. 2) If something went wrong with pg_cron, a job misfiring every few seconds could cause a lot of problems. However, given the maturity of pg_cron, they decided to go ahead and implement this feature. Now, what I find interesting is I could imagine you firing off some different things as they mention here, like updating aggregations for real-time insights. Maybe you're checking something and you trigger an update of a summary table or of a particular counter value. Other use cases they list are detecting anomalies, polling external sources, or implementing more sophisticated scheduling workflows. I'm not so sure I would have my database poll external sources, though; I see that as more of an application function.

So this is an extension, and this post doesn't explain how to install it, but basically you install it as an extension, and it should be available if you're using hosted Postgres on many platforms. To actually use it, you run the function cron.schedule, give it a job name, the interval you want it to run at, say 10 seconds, and then the command you want to run (there's a small sketch of what that looks like after this paragraph). It does log every run, so you want to make sure that log table doesn't get too large if you're running something every second, for example. Now, he did show an example of how this could be used, where he hand-built a job queue executor. So this is not in pg_cron; this is just something he built using standard functions and procedures in PostgreSQL. First, he created a job queue table, then he created a job errors table to track any job failures. He created a schedule function so you can submit one command to be scheduled; this basically just inserts that command into the job queue. He created a procedure that runs the jobs, where it gets a job from the queue, runs the command that was submitted as part of the queue, does some error handling, and also supports doing multiple attempts.
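To give a rough idea of the scheduling call, here's a minimal sketch assuming pg_cron 1.5 or later; the job names, materialized view, and retention interval are illustrative, not taken from the article:

```sql
-- Assumes the pg_cron extension is installed and preloaded (shared_preload_libraries).
CREATE EXTENSION IF NOT EXISTS pg_cron;

-- Classic cron syntax: run nightly at 3:30 AM.
SELECT cron.schedule('nightly-vacuum', '30 3 * * *', 'VACUUM ANALYZE');

-- New in pg_cron 1.5: an interval of 1-59 seconds instead of cron syntax.
-- Here we refresh a hypothetical summary view every 10 seconds.
SELECT cron.schedule('refresh-summary', '10 seconds',
                     'REFRESH MATERIALIZED VIEW CONCURRENTLY order_totals');

-- Every run is logged to cron.job_run_details, so prune that table
-- periodically if you schedule something every few seconds.
SELECT cron.schedule('prune-cron-log', '0 * * * *',
  $$DELETE FROM cron.job_run_details WHERE end_time < now() - interval '1 day'$$);

-- Remove a job by name when you no longer need it.
SELECT cron.unschedule('refresh-summary');
```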

It can track the number of attempts for a job, so it can try multiple times. Then to start one of these executors, you just schedule the procedure described above; he calls them job runners one, two, three, and four. Because you can run jobs in parallel with pg_cron, you're basically just scheduling that procedure four different times and running it, say, every 5 seconds. So you have your own job queue here, and with this setup you can schedule basically one-time jobs to do certain activities. Here's an example where they populated a table with some random values. A rough sketch of the overall pattern is below. So this is super interesting. Now, whenever I've had the need to run something on a per-second basis like that, I've normally set up an application process or an application daemon that's constantly running in a loop, with however many seconds of delay I want, to check for more work that's available.
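To make the pattern more concrete, here is a heavily simplified sketch of that kind of hand-rolled queue. The table, function, and procedure names are my own stand-ins, not the author's actual code, and it omits his separate job errors table:

```sql
-- A minimal job queue: one row per command to run.
CREATE TABLE job_queue (
    job_id   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    command  text NOT NULL,
    attempts int  NOT NULL DEFAULT 0,
    done     boolean NOT NULL DEFAULT false
);

-- Schedule a one-time job by inserting it into the queue.
CREATE FUNCTION schedule_job(p_command text) RETURNS void
LANGUAGE sql AS $$ INSERT INTO job_queue (command) VALUES (p_command) $$;

-- A runner: claim one pending job, execute it, record success or failure.
CREATE PROCEDURE run_jobs()
LANGUAGE plpgsql AS $proc$
DECLARE
    job job_queue%ROWTYPE;
BEGIN
    SELECT * INTO job
    FROM job_queue
    WHERE NOT done AND attempts < 3
    ORDER BY job_id
    LIMIT 1
    FOR UPDATE SKIP LOCKED;           -- lets several runners work in parallel

    IF NOT FOUND THEN
        RETURN;                       -- nothing to do right now
    END IF;

    BEGIN
        EXECUTE job.command;
        UPDATE job_queue SET done = true WHERE job_id = job.job_id;
    EXCEPTION WHEN OTHERS THEN
        -- failed attempt: leave the row for a retry
        UPDATE job_queue SET attempts = attempts + 1 WHERE job_id = job.job_id;
    END;
END
$proc$;

-- Start a couple of parallel "job runners" that poll every 5 seconds.
SELECT cron.schedule('job-runner-1', '5 seconds', 'CALL run_jobs()');
SELECT cron.schedule('job-runner-2', '5 seconds', 'CALL run_jobs()');

-- Queue a one-time piece of work (some_table is hypothetical).
SELECT schedule_job('INSERT INTO some_table SELECT random() FROM generate_series(1, 1000)');
```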

Now, I'm probably not going to move away from that for the majority of the short-term job processing that I'm doing, because there's a lot to be gained from doing this within the application context, with the language I use, Ruby, for example, or it could be Elixir. But I could see this being really interesting from the perspective of updating totals all within the database, so not having to have an application server involved with keeping certain aggregations up to date. I thought this was really interesting, and it's starting to make me think a little bit differently about how I might be able to use this for different tasks. So let me know in the comments what you think of this new feature. Do you use any of these scheduling tools that are extensions in Postgres? And if you do, how do you use them?

The next piece of content is "ERROR: CURRENT TRANSACTION IS ABORTED IN POSTGRESQL". This is from cybertec-postgresql.com. They're talking about that message, along with the rest of it, "commands ignored until end of transaction block", which gives you a clue about what's going on. So the first thing this blog post talks about is basically transactions, in that they are atomic: they either happen or they don't happen, all or none, essentially. And whenever you run a command, like at a psql prompt, that's going to run within its own transaction, unless of course you start with BEGIN. Now you're starting a transaction, and every subsequent statement is within that transaction until you either COMMIT it or you do a ROLLBACK statement. But how can you get this error? In this example, he starts a transaction with BEGIN, does a simple SELECT, and then runs a statement that causes an error: he divides an integer by zero, which gives the error "division by zero".

Now when he continues to run additional statements, he gets "ERROR: current transaction is aborted, commands ignored until end of transaction block". So basically, until this transaction is concluded, every additional command is going to output this error message on the screen, in this case, and also in the PostgreSQL logs. And you can no longer COMMIT this transaction; you can see here where he tries to COMMIT, it actually does a ROLLBACK instead. So basically that's what this error message is. And he says if you really need to save part of your work within a transaction, then you need to look into savepoints, which are a way to save your work part way through; under the hood they're subtransactions. They have a post on that, linked here, if you want to learn more about it. But personally, I like the atomic nature of transactions and I've never really had to use savepoints like this, but that feature is available if you're interested.
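Here's what that sequence looks like at a psql prompt; this is just a reconstruction of the pattern the post describes, and the table t in the savepoint example is an assumed placeholder:

```sql
BEGIN;
SELECT 1;        -- works fine
SELECT 1 / 0;    -- ERROR:  division by zero
SELECT 1;        -- ERROR:  current transaction is aborted,
                 --         commands ignored until end of transaction block
COMMIT;          -- reported as ROLLBACK: nothing in the block is kept

-- With savepoints you can keep the work done before the failure:
BEGIN;
INSERT INTO t VALUES (1);          -- assumes some table t exists
SAVEPOINT before_risky;
SELECT 1 / 0;                      -- ERROR: division by zero
ROLLBACK TO SAVEPOINT before_risky;
COMMIT;                            -- the INSERT above is kept
```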

Next piece of content, also from cybertec-postgresql.com, is "KILL -9 EXPLAINED FOR POSTGRESQL". Now, before we get into this, I would almost never kill any PostgreSQL process using OS tools. I would use the functions pg_cancel_backend, to cancel a running query, and pg_terminate_backend, to actually terminate that backend, running them from, say, a psql prompt (there's a quick example below). However, if you need to use OS commands for some reason, here are some things to take into account. So you can run the kill command for a given process. He's showing here the main Postgres process and then all of its subprocesses, like the logger, the checkpointer, background writer, WAL writer, autovacuum launcher, and then the logical replication launcher. Now in this example, he ran the kill command against the process ID for the autovacuum launcher. And that's really not a problem; it simply stops that process and starts a new one.
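For reference, this is roughly how you'd do that from psql rather than the OS; the WHERE clause and the pid are just illustrative:

```sql
-- Find the backend you're interested in.
SELECT pid, state, query
FROM pg_stat_activity
WHERE state = 'active' AND pid <> pg_backend_pid();

-- Politely cancel the running query (the connection stays open)...
SELECT pg_cancel_backend(12345);      -- 12345 is a placeholder pid

-- ...or terminate the whole backend/connection.
SELECT pg_terminate_backend(12345);
```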

So you can see here this process ID is gone; the one ending in 80 is no longer present. But you have this new process ID here, and it just restarted the autovacuum launcher. There's no problem. Now, the reason this is no problem is because what's actually happening is you're sending a SIGTERM signal that the process can capture, so it can close itself gracefully. And that allows everything to remain consistent, particularly when you're talking about shared memory. But if you use kill -9, and this is the danger of it, it immediately kills the process and there's no opportunity to shut down gracefully. Well, now you have no idea of the state of the shared memory, because maybe this process had made some changes or alterations. So basically, the system has to do a restart, essentially, and go through crash recovery.

So in this example, he shows all the different process IDs and then does a kill -9 of a Postgres backend process, where a user is logged in to a backend that is currently idle. As a result, Postgres is able to keep the main process and the logger running. But every other process, the checkpointer, background writer, WAL writer, autovacuum launcher, logical replication launcher, in addition to any other users that were connected to a Postgres backend, all get restarted. So they all get new process IDs, as you can see here. So that's why you want to be very careful, especially with kill -9, because essentially you're killing a process in a way that can leave the shared memory in an inconsistent state, and the system has no choice but to restart and do crash recovery at that point. But if you want to learn more, definitely check out this blog post.

Next piece of content- "Understand Explain Plans in PostgreSQL". This is from stormatics.tech. This is a blog post that covers how to read explain plans. Now, I personally found some of the examples here a little bit difficult to read; I don't know if some of the formatting was off, but he does talk about EXPLAIN, ANALYZE, and BUFFERS. In this example, I found the explain plan really hard to read because, of course, this is a continuation of a line here, so it just makes it a little bit difficult. It goes through all of these different queries and different ways to look at the explain plan. But I also found it interesting where he lists where to look for potential bottlenecks: that could be high costs, or bad estimated row counts, meaning that the number of rows the planner expects is different from the number of rows actually returned.

It could also be high buffer operations, where a large number of pages were hit or read; filters that remove a lot of rows; sequential scans on large tables; external disk merge operations; or a lot of temp reads and writes. But I know personally, when I look at an explain plan, I basically prefer to run EXPLAIN ANALYZE, of course, and just identify what node is taking the longest and figure out, 'Okay, how can I make that node faster? Do I need an index? Do I need to change how the query is being run?', et cetera. And these are some of the areas I'm going to be focusing on in my course as well. But if you're interested in that, you can definitely check out this blog post.
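As a quick illustration of the kind of output and red flags being discussed, here's a made-up example; the table, query, and plan numbers are hypothetical:

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 42;

-- Hypothetical output showing several of the warning signs at once:
-- a bad row estimate (5 vs 4823), a sequential scan on a large table,
-- a filter removing most rows, and lots of buffer reads.
--
--  Seq Scan on orders  (cost=0.00..18334.00 rows=5 width=97) (actual time=0.020..95.310 rows=4823 loops=1)
--    Filter: (customer_id = 42)
--    Rows Removed by Filter: 995177
--    Buffers: shared hit=2048 read=6286
--  Planning Time: 0.110 ms
--  Execution Time: 97.845 ms
```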

Next piece of content: there was another episode of Postgres FM last week. This one was on "Under-indexing"; the previous episode was on "Over-indexing". Under-indexing is arguably the bigger problem, because most people with performance problems have an insufficient amount of indexing, or inappropriate indexes on their tables, for the performance they're looking for. So that's what they're covering this week. You can listen to the episode here or watch the YouTube video down here.

Next piece of content is actually a YouTube video: "How We're Building AI Search Engines using LLM Embeddings". This is from the ThinkNimble YouTube channel, and they're talking about building AI embeddings and doing searches on your own data. So not relying on ChatGPT or an API, basically generating your own embeddings and querying your own data for that purpose. In this example, he is using PostgreSQL with the pgvector extension, along with Django. I would say a lot of the content is Python-focused, so there wasn't too much that was Postgres-specific in here. But if you're interested in AI, I definitely encourage you to check out this video.
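If you haven't seen pgvector before, the basic shape of it looks something like this; the table name, dimension, and embedding values are placeholders, since in practice the vectors come from an embedding model:

```sql
CREATE EXTENSION IF NOT EXISTS vector;

-- Store documents alongside their embeddings (dimension depends on the model).
CREATE TABLE documents (
    id        bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    content   text,
    embedding vector(3)          -- tiny dimension just for illustration
);

INSERT INTO documents (content, embedding) VALUES
    ('first doc',  '[0.1, 0.2, 0.3]'),
    ('second doc', '[0.9, 0.8, 0.7]');

-- Find the documents nearest to a query embedding (L2 distance).
SELECT content
FROM documents
ORDER BY embedding <-> '[0.1, 0.25, 0.3]'
LIMIT 5;
```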

Next piece of content- "PostgreSQL IO Visibility: #wehack PostgreSQL Internals and pg_stat_io". This is from andyatkinson.com, and he was talking about a week-long hackathon, for lack of a better term, on PostgreSQL internals. There is a link to the thread announcement; I believe it's already concluded. So it was just a very short week of hacking on Postgres internals, and what Andrew chose to work on was pg_stat_io. He talks a little bit about how data is stored in Postgres in terms of 8-kilobyte pages and then goes into pg_stat_io, which is the new system view that lets you get I/O stats out of Postgres. And he discussed who created it: Melanie Plageman was the lead on it, but Lukas Fittl from pganalyze assisted as well. It talks a little bit about how Postgres writes data and how there are different backend types, so there are client backends, there's the background writer, and there's the checkpointer for flushing memory contents to disk. So if you want to learn more about the pg_stat_io view, you can definitely check out his article on it here.
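pg_stat_io is available starting in Postgres 16, and a simple query against it looks something like this; the filtering and ordering are just one way to slice it:

```sql
-- I/O statistics broken down by backend type, object, and context (Postgres 16+).
SELECT backend_type, object, context, reads, writes, extends, hits, evictions
FROM pg_stat_io
WHERE reads > 0 OR writes > 0
ORDER BY reads DESC;
```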

Next piece of content- "SQL identity columns". This is from peter.eisentraut.org, and this is a very short post, but it's talking about autoincrementing columns, which are identity columns, or in Postgres, serial or bigserial columns. The syntax for these autoincrementing columns was defined in the SQL:2003 standard, so 20 years ago, and the implementations that follow it use pretty much the same syntax. This was implemented in Postgres 10, which was about six years ago. But what I found interesting is that some database systems still haven't implemented this part of the SQL standard; I'm super surprised that things like MySQL, MariaDB, and Microsoft SQL Server haven't implemented it yet, so I find that pretty crazy. But if you want to learn more, definitely check out this blog post.
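For comparison, here's the older serial syntax next to the standard identity column syntax that Postgres has supported since version 10; the table names are just examples:

```sql
-- Pre-standard Postgres syntax: serial / bigserial.
CREATE TABLE old_style (
    id   bigserial PRIMARY KEY,
    name text
);

-- SQL-standard identity column syntax (Postgres 10+).
CREATE TABLE new_style (
    id   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name text
);

INSERT INTO new_style (name) VALUES ('example');  -- id is generated automatically
```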

Next piece of content- "Unlock Your Arsenals: GDB Debugging Essentials with PostgreSQL". This is from techbuddies.io and this post uses the GNU debugger to learn more about PostgreSQL internals. So if you want to learn about how Postgres works by tracing the code, definitely check out this blog post.

Next piece of content- "Performance implications of medium sized values and TOAST in Postgres and how to mitigate them". This is from pganalyze.com. This is this week's "Five minutes of Postgres", but he's actually covering a post that's a couple of years old by Haki Benita, who saw something odd: small text, say a length of two to five words, has little impact when querying other columns in the table. Large text, like a full blog post or a full document, also had minimal impact when querying other columns in the table.

But medium-sized text, like a description or a summary, that's under the threshold at which values get placed in the TOAST table, actually resulted in worse performance when trying to query other columns. The reason is that all of that medium-sized text was stored within the main heap table itself; it wasn't TOASTed, essentially, so that made the table relatively large compared to a table with only small text, or with text that was all placed in the TOAST table. They show a lot of examples of this in the blog post and in Lukas's video, as well as some ways to mitigate it, like lowering the toast_tuple_target storage parameter or even creating separate tables for the text that you're working with. But check out this blog post if you want to learn more.
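The mitigations mentioned are roughly along these lines; the table names, column names, and threshold value are illustrative:

```sql
-- Lower the threshold at which row data is considered for TOASTing
-- (the default is roughly 2KB), so medium-sized text gets moved out
-- of the main heap. Applies to newly stored rows.
ALTER TABLE articles SET (toast_tuple_target = 256);

-- Or keep the medium-sized text in a separate table entirely and join
-- to it only when you actually need that column.
CREATE TABLE article_summaries (
    article_id bigint PRIMARY KEY REFERENCES articles (id),
    summary    text
);
```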
