
Per-Operation Explain, PG14 Contributions, Foreign Parallel Execution, Incremental View Maintenance | Scaling Postgres 172


In this episode of Scaling Postgres, we discuss tracking per-operation explain analyze, Postgres 14 contribution stats, parallel execution on foreign servers and incremental view maintenance.

Content Discussed

Transcript

All right. I hope you, your friends, family, and coworkers continue to do well. Our first piece of content is "CALCULATING PER-OPERATION TIMES IN EXPLAIN ANALYZE". This is from pgmustard.com, and they're talking about how to calculate, at an operation level, how long each operation takes, because sometimes it's relatively easy, but sometimes it becomes a little more difficult. They start off with a very simple example where you're just doing a sequential scan and a sort, using EXPLAIN ANALYZE to get the actual times. Now, the total execution time of this whole query is 0.077 milliseconds. The time to do the sorting is 0.054 milliseconds, while the sequential scan on the table underneath it has an actual time of 0.018 milliseconds. So you can find out how long the sort itself took by subtracting the operation nested within it: 0.054 minus 0.018 gives you 0.036 milliseconds.
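The subtraction described above can be sketched in a few lines, using the numbers from the article's sort-over-sequential-scan plan:

```python
# Sketch of the per-operation timing math: a node's own time is its
# reported actual (total) time minus its children's reported totals.

def self_time_ms(node_total_ms, child_totals_ms):
    """Time spent in the node itself, excluding its child operations."""
    return node_total_ms - sum(child_totals_ms)

sort_total = 0.054      # actual time on the Sort node (ms)
seq_scan_total = 0.018  # actual time on the Seq Scan node below it (ms)

sort_self = self_time_ms(sort_total, [seq_scan_total])
print(f"Sort itself took {sort_self:.3f} ms")  # 0.036 ms
```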

So that's how long the sorting took, ignoring the sequential scan part. The next thing they talk about is loops. You want to look at each operation to see if it's looping at all. In the example up here there were no loops; you can see it says loops one. But in this one, which is a nested loop, the nested loop itself runs once, while one of its child operations loops 9 times. What that means is you need to take that child's actual time and multiply it by 9 to get an estimate of how long the entire operation took: 0.028 times 9 gives you 0.252 milliseconds. Then, to get the nested loop's own time, you take its total time and subtract both that newly calculated 0.252 and the other child's 0.019, which gives 0.064 milliseconds for the nested loop itself.
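The loop-adjusted math can be sketched as follows. Note that the nested loop's own total (0.335 ms) is not quoted in the transcript; it is the value implied by the article's numbers, since 0.064 + 0.252 + 0.019 = 0.335:

```python
# Per-loop actual times must be multiplied by `loops` before subtracting.

def node_total_ms(per_loop_ms, loops):
    """EXPLAIN ANALYZE reports per-loop times; scale by the loop count."""
    return per_loop_ms * loops

inner_total = node_total_ms(0.028, loops=9)  # ~0.252 ms across all loops
outer_total = node_total_ms(0.019, loops=1)  # 0.019 ms

nested_loop_total = 0.335  # assumed: the total implied by the article
nested_loop_self = nested_loop_total - inner_total - outer_total
print(round(nested_loop_self, 3))  # 0.064
```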

Things get more complicated when you're talking about parallel execution. Now, they say that if you want to get the individual worker timings for each parallel piece of work, you have to use the VERBOSE parameter, so they used EXPLAIN ANALYZE VERBOSE. That shows you the actual time for each worker in parallel operations. Their example is a parallel sequential scan that runs with three loops: the leader process plus two parallel workers that collect the data of interest. VERBOSE reports the workers' times, but it doesn't report the leader's. What you can do is take the total for this node, which again is three loops, so three times the per-loop time here, which comes to about 7 seconds.

Then you subtract each worker's time to get a sense of how much the leader did. Things get even more complicated when you're talking about CTEs. They have a CTE example, and they say that subtracting these timings breaks down a little bit because timings get duplicated in the output: the same work is reported twice, in both the Append node and the Scan and Limit nodes. Now, they also mention a couple of tools that can help you diagnose this, including explain.depesz, explain.dalibo, Flame Explain, and I think pgMustard does some of this as well. So if you're looking for a post to help you better interpret your EXPLAIN ANALYZE output, definitely check out this one.
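The parallel-worker subtraction described above can be sketched like this. The worker and per-loop times below are hypothetical, made up for illustration (the transcript only says the total is about 7 seconds), but the shape of the calculation is the one the article describes:

```python
# VERBOSE shows each worker's time but not the leader's, so the leader's
# share is the node total (per-loop time x loops) minus the workers.

def leader_time_ms(per_loop_ms, loops, worker_times_ms):
    total = per_loop_ms * loops
    return total - sum(worker_times_ms)

# Hypothetical values for illustration (not from the article):
per_loop = 2400.0           # ms reported on the Parallel Seq Scan node
workers = [2300.0, 2350.0]  # ms reported for Worker 0 and Worker 1

print(leader_time_ms(per_loop, 3, workers))  # 2550.0 ms for the leader
```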

The next piece of content- "Some Interesting statistics about PG-14 contributions". This is from highgo.ca, and they went through a number of sources, including the commit logs and release notes, to collect statistics on the upcoming Postgres 14 release. The first thing they did was produce a chart of the number of features per category area. You can see there are ten features that cover vacuuming and four features related to the optimizer, for example. He also lists the top seven features of Postgres 14 he's most looking forward to. The first is asynchronous execution of the postgres_fdw Append node. This is where you can push work out to foreign servers and have them do the work in parallel. The second is the improved connection scalability from the GetSnapshotData work. This would hopefully minimize having to use PgBouncer in some use cases because you can ramp up the connections further.

The third is overhauling UPDATE and DELETE processing, which is particularly relevant for partitioned tables. The next one is logical streaming of large in-progress transactions. So if you have a long-running transaction, logical replication can start streaming its changes before the transaction is complete. Then there's support for multirange data types, which we've discussed in previous episodes of Scaling Postgres, and allowing B-tree index additions to remove expired index entries to prevent page splits. And lastly, postgres_fdw batching. Before, when you were doing a bunch of inserts, you would do one insert into a foreign server at a time. Now you can specify a batch size for how many inserts you want to do at a time, which could be 100, 1,000, et cetera. Then he gives a few more charts showing contributions by author, by company, and by country. I thought these were some pretty interesting statistics, and I encourage you to check it out as well.
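A quick back-of-the-envelope illustration of why that last feature matters: batching cuts the number of round trips to the foreign server from one per row to one per batch. This is just arithmetic, not the postgres_fdw implementation itself:

```python
import math

def round_trips(rows, batch_size):
    """Round trips needed to insert `rows` rows in batches of `batch_size`."""
    return math.ceil(rows / batch_size)

print(round_trips(10_000, 1))    # 10000 trips: one insert at a time
print(round_trips(10_000, 100))  # 100 trips with a batch size of 100
```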

The next piece of content- "Parallel execution of postgres_fdw scan's in PG-14 (Important step forward for horizontal scaling)". This is also from highgo.ca, and they're talking about being able to eventually scale out Postgres. He links to three earlier posts covering the work that needs to be done to do horizontal scale-out of Postgres. A new feature coming in 14 is the ability to push scans to foreign servers, have them work in parallel, and then send the results back to the local server, or primary node, to return that data to the end user. So it's basically a parallel append across foreign servers.

So he talks a lot about the contributors who developed this, and then he goes into some example code for setting it up. Now, he sets this up on just a single server, so it's not the multi-server setup you would actually use for scale-out. But even running locally, they were able to improve performance from about 136 milliseconds to 81 milliseconds using these new features. There's still a long way to go before the community edition of PostgreSQL can do horizontal scaling of databases, but this is the next feature working toward that ultimate goal. If you're interested in learning more, check out this blog post.

The next piece of content- "Implementing Incremental View Maintenance for PostgreSQL (Part III)". This is from yugonagata-pgsql.blogspot.com. This post is talking about a future feature, targeted at maybe Postgres 15: being able to create a materialized view that you do not have to refresh. In other words, it's kept up to date automatically, and this post discusses how that works. In the example they give, you have two separate tables, and the materialized view is basically a natural join between these two tables with the following output. They say, suppose you update this column here from "one" to "first". How do you go about updating that materialized view?

They take it in three steps. First, they extract the table change that has been made. Then they calculate the view delta, that is, what needs to change in the view to be consistent with what the table now stores. Then they apply those deltas to the view. Predominantly, the way they capture table changes is with triggers on inserts, deletes, and updates on any table that's a source for a materialized view you want to keep incrementally updated. They go through the process of discussing how they apply the updates to keep the view consistent. There is a fair amount of overhead for doing this, of course, but it's definitely something they're working on, with the goal of getting it implemented for Postgres 15. So if you're interested in learning more, check out this blog post.
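Those three steps can be sketched with a toy in-memory example. This is not the patch's actual algorithm, just an illustration of capture, delta, and apply for a view defined as a natural join of two tables on a key column "k":

```python
def natural_join(r, s):
    """Toy natural join of two lists of row-dicts on the shared column 'k'."""
    return [{**a, **b} for a in r for b in s if a["k"] == b["k"]]

r = [{"k": 1, "v": "one"}, {"k": 2, "v": "two"}]
s = [{"k": 1, "w": "x"}, {"k": 2, "w": "y"}]
view = natural_join(r, s)  # the "materialized" view

# Step 1: trigger-style capture of an UPDATE on r ("one" -> "first").
old_row, new_row = {"k": 1, "v": "one"}, {"k": 1, "v": "first"}
r[r.index(old_row)] = new_row

# Step 2: the view delta: the old row's join partners go out,
# the new row's join partners come in.
delta_del = natural_join([old_row], s)
delta_ins = natural_join([new_row], s)

# Step 3: apply the delta to the view instead of recomputing the join.
view = [row for row in view if row not in delta_del] + delta_ins

key = lambda d: d["k"]
assert sorted(view, key=key) == sorted(natural_join(r, s), key=key)
```

The final assertion checks that applying the delta leaves the view identical (up to row order) to a full recomputation of the join.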

The next piece of content- "Updates of PostgreSQL Observability diagram". This is from lesovsky.medium.com, and there's a chart that's been produced, with a link to pgstats.dev, that diagrams all the different processes and some objects in Postgres and how you can observe what is going on with them. So, for example, if you want to learn about, say, the client backends, you could look at pg_stat_activity, or at pg_backend_memory_contexts, which is new in Postgres 14. If you're interested in the write-ahead log, there's pg_stat_wal and pg_ls_waldir(). It color codes everything and gives you an example of how you can get insight or observability into Postgres. So if you're interested in learning more, you can check out this post and the link included.

The next piece of content- "Ora2PG now supports oracle_fdw to increase the data migration speed". This is from migops.com, and if you're looking to migrate from Oracle to Postgres, one utility you could use is Ora2Pg. They've done some updates to it recently; it looks like it's at version 22, and apparently it has been worked on for 20 years. With the new implementation using oracle_fdw, they got a lot of speed improvements. This post walks through what configuration you would need on the Postgres side and how you would set this up to do the migration. The post goes into a lot of depth on that, which I won't cover, but the end result from their testing is that with this new version, using oracle_fdw, they were able to more than double their migration speed. That's a huge performance improvement. So if you want to learn more, check out this blog post.

The next piece of content- "PostGIS 3.1.3 is Released". If you want to learn more, you can check out postgis.net.

Related to that, the next piece of content is "POSTGIS SETUP WITH UBUNTU 20.04.2". This is from cybertec-postgresql.com. They talk through the process of installing PostGIS on Ubuntu, particularly one of the recent 3.1 versions. So if you're interested in that, you can check out this blog post.

The next piece of content is "How to Deploy pgAdmin in Kubernetes". This is from enterprisedb.com. So if you want to install pgAdmin on Kubernetes, definitely check out this blog post.

Next piece of content, the PostgreSQL person of the week is Stefan Keller. So if you're interested in learning more about Stefan and his contributions to Postgres, definitely check out this blog post.

The last piece of content I'll be following up with today is that we published a new episode of The Rubber Duck Dev Show, talking about project management, specifically software development and technical project management. Again, this is more long-form content, about an hour in length, so I welcome you to check that out.
