
Stats Collector Gone, Handling Latency, PG In The Browser, Future High Availability | Scaling Postgres 230


In this episode of Scaling Postgres, we discuss how the stats collector disappears in PG15, steps to mitigate high latency connections, how to run Postgres in the browser and the future of high availability.

Transcript

All right. I hope you, your friends, family, and coworkers continue to do well. Our first piece of content is "PostgreSQL 15: Stats Collector Gone? What's New?" This is from percona.com, and they're talking about how, when you look at the processes of a running Postgres 14 instance, you will see a stats collector. However, when you're running Postgres 15, you do not see the stats collector process. Why? Because it's gone. Now, to be clear, the stats collector doesn't collect the statistics on the data itself; that's what ANALYZE does, tracking a sampling of the data in your database so the planner can make decisions. The stats collector instead collects all the different activity statistics in the database, counts of things such as how many commits have happened, how many rollbacks have happened, how many times particular statements have been run, how many times autovacuum has run, and how many times analyze has been run.

So basically, these are the statistics located in the pg_stat_* views, which they refer to here. Those statistical views are what the stats collector populates. Now, they describe how it used to operate: each of the backend processes would send communication over UDP to the stats collector process, and that process would then write the information to files. But apparently, people had some problems with this approach. At times you could get stale statistics, at times the stats collector wasn't running, or autovacuum may not have been working or starting. I haven't heard about these types of problems necessarily. But another big problem they mention here is the I/O: it can cause tens of megabytes to be written per second. And so then the question is, what replaces it now that they've gotten rid of it?
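To give a sense of what these cumulative statistics look like, here's a quick sketch of querying a couple of the standard pg_stat views; the views and columns are the usual catalog ones, but treat the queries themselves as illustrative.

    -- Commits and rollbacks per database, as tracked by the statistics system
    SELECT datname, xact_commit, xact_rollback
    FROM pg_stat_database;

    -- Vacuum and analyze activity per table
    SELECT relname, autovacuum_count, autoanalyze_count, last_autovacuum
    FROM pg_stat_user_tables
    ORDER BY autovacuum_count DESC;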

Well now, as they say here, instead of using files and the file system, statistics now use dynamic shared memory. There's a commit here by Andres that describes a little bit of the detail with regard to it. So basically, there's one less process running and there should be less file writing. I mean, I'm assuming at some point it's going to save the contents of the shared memory on a periodic basis. But with the move to shared memory, there's actually a new parameter called stats_fetch_consistency. You could set it to none, which is the most efficient, but you're not going to have any read consistency for queries that would expect to get the same value back.

You can set it to cache so that repeatable reads work, and this will be the default. Or you could use snapshot, which is slower but presumably gives you a more consistent picture because, of course, nothing would be cached. And what happens during a restart? Well, as part of the shutdown procedure it does save the stats to disk, but they will be discarded if there's a crash, which is the usual behavior for the stats. They also introduced a few different wait events, because with shared memory being used now, there's the possibility that accessing the stats could cause some waits, so they're adding some new wait events here. This is a pretty interesting addition, and there are a ton of people that did work on this feature. So thank you so much for doing that. If you want to learn more, go ahead and check out this blog post.
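As a quick illustration of that parameter, here's a minimal sketch of switching stats_fetch_consistency per session; the parameter and its values are from Postgres 15, while the example queries are just illustrative.

    -- 'cache' is the default in PG15: rereading the same stats object
    -- within a transaction returns the cached value
    SET stats_fetch_consistency = 'cache';

    -- 'snapshot' builds a consistent snapshot of all statistics for the
    -- transaction, which is slower but consistent across views
    SET stats_fetch_consistency = 'snapshot';
    BEGIN;
    SELECT xact_commit   FROM pg_stat_database WHERE datname = current_database();
    SELECT xact_rollback FROM pg_stat_database WHERE datname = current_database();
    COMMIT;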

Next piece of content- "5mins of Postgres E32: Benchmarking and improving query network latency using tc, COPY and pipeline mode". This is from pganalyze.com, and they're talking about a post written by Lawrence Jones that described them making a transition at his former company GoCardless from IBM SoftLayer hosting to Google Cloud hosting. In the process of moving all their servers over, at one point, the database was going to be in a separate cloud provider, a separate data center essentially, than their application servers. So this could introduce significant latency. And that latency may have varied when they were co-locating in the same location at about 0.5 milliseconds up to as long as 10 milliseconds, going to an entirely separate data center. So in preparation for this move, they wanted to address things ahead of time.

And what they literally did is introduce latency into their running production application to test how far they could push it. So they would increase latency using the tc (traffic control) command, which is part of the iproute2 toolset. With it you can actually introduce latency, and they increased it by one millisecond to measure the impact and resolve any issues that came up prior to doing their move. Now, as the episode continues, it talks about how you can track timing. You can use EXPLAIN ANALYZE, which gives you the execution time on the server. But in the psql client, there's also the \timing switch, and that will give you the timing essentially from the client's perspective. So if you're on the server running psql, this should pretty much match what you see in EXPLAIN ANALYZE. But if you're running psql in another location, reaching across the Internet or across a network with a lot of latency, you're going to see this timing increase because it includes pulling the data down and sending it to the client.
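To make that client-side measurement concrete, here's a minimal psql sketch; the table name and the timing figure are made up for illustration.

    -- Turn on client-side timing in psql
    \timing on

    -- Server-side execution time only
    EXPLAIN ANALYZE SELECT count(*) FROM orders;

    -- The "Time:" line psql prints after this includes the network round trip
    SELECT count(*) FROM orders;
    -- Time: 12.345 ms   (illustrative; this number grows with latency)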

He also talked about one thing that could hinder performance if you have high latency: sending single inserts or single statements one at a time. One way to speed things up over a high-latency network is to use multi-row inserts, so you're sending more than just one row of data per INSERT; you can put many rows in there. And of course, COPY is a much faster way to transfer data if you need to do it over a high-latency network. And then lastly, he also talked about the new feature that became available, I think in Postgres 14, which is pipeline mode. This essentially lets statements be sent to the server asynchronously, so you can send multiple statements without having to wait for each response; the results get returned later. So these are a couple of different methods you could use to mitigate high-latency connections. And if you're interested in learning more, I encourage you to check out this episode.
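As a rough sketch of the first two options, assuming a hypothetical measurements table and CSV file:

    -- One round trip per row: every INSERT pays the full network latency
    INSERT INTO measurements (device_id, reading) VALUES (1, 20.1);
    INSERT INTO measurements (device_id, reading) VALUES (2, 19.7);

    -- Multi-row INSERT: many rows for a single round trip
    INSERT INTO measurements (device_id, reading) VALUES
        (1, 20.1),
        (2, 19.7),
        (3, 21.4);

    -- psql's client-side \copy streams the whole file over the connection in one command
    \copy measurements (device_id, reading) FROM '/tmp/measurements.csv' WITH (FORMAT csv)

Pipeline mode, by contrast, lives in the client library (libpq and drivers built on it), so it's something you enable in application code rather than in SQL.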

Next piece of content- "Crazy Idea to Postgres in the Browser". This is from crunchydata.com and this is a follow-up post that we discussed last week about the Postgres playground. And this goes into detail about how they set that up. Basically, they had seen someone had taken SQLite and compiled it to WebAssembly to essentially run in the browser. They said 'Hey, can we do this with Postgres?'. And unfortunately, due to some networking issues, that wasn't really possible. But what they did do was create a whole virtual machine in the browser. So there's this particular emulator that they used called v86. So basically it compiles to run in WebAssembly, but it's its own virtual machine.

So basically, the first thing they did was set up a VM using QEMU with a very minimal Alpine Linux image, keeping the image as small as possible. They installed Postgres on that minimal image and made sure they could connect to it. They then shut down the virtual machine, took the Alpine Linux image they had created, and used it to set up an emulation in v86. With that, they essentially got Postgres running in the browser, on WebAssembly, within a virtual machine using v86. So if you want to learn more about the details of this, definitely check out this blog post.

Next piece of content- "PG Phriday: Defining High Availability in a Postgres World". This is from enterprisedb.com and this is a post that Shaun has talked about multiple times on a better path forward for high availability in Postgres. He talks about the current situation, and what a lot of people are using now- some Petrones, pg_auto_failover, but his ultimate desire is to have something that looks like this. So basically, right now the Postgres cluster is considered a cluster of databases on a single machine. Ideally, they'd like to take that concept of a cluster to mean this whole thing inclusive of tools that perhaps don't even exist yet but have a common listener that all applications can connect to.

This listener would be aware of what databases exist and which one is the primary, so it can send write connections there and route read-only connections elsewhere, and there would be some consensus layer to coordinate all the different Postgres nodes that exist, as well as handle the listener services. So he says if Postgres could build out something like this, we would no longer need tools such as etcd, consul.io, or HAProxy or anything like that; ideally, Postgres would handle it all. Now, of course, this would take a lot of work, and he says there's nothing like this currently planned. But it seems like this is the path he would be very interested in following. And if you want to learn more, you can check out this blog post.

Next piece of content- "Ansible Benchmark Framework for PostgreSQL". This is from enterprisedb.com, and he's talking about them setting up a benchmark solution for testing Postgres or their EDB Advanced Server, and how they want to run many tests in parallel and to make sure builds work successfully. And they set up a way to do this using Ansible, setting up their entire infrastructure and being able to run multiple tests in parallel. As a benefit, they released this as open source, which is right here on GitHub. You can download and start using this framework for doing any Postgres testing. Maybe you're developing an extension and you want to do various tests with regard to it. Well, you could perhaps use this framework to assist with that, but if you want to learn more, definitely check out this blog post.

Next piece of content- "What Does a PostgreSQL Commitfest Manager Do and Should You Become One?". This is from timescale.com, and last July, it looks like Jacob here became a Commitgest Manager. He talks about what the job entails, as well as asks the question, should you become one? So if you're interested in that, you can definitely check out this blog post.

Next piece of content- "Upgrading postgis_sfcgal to 3.1 or higher". This is from postgis.net and this post talks about the release of PostGIS 3.1, there's a support library. The sfcgal support library is no longer part of the PostGIS core, so basically you are going to have to use this new library if you want that functionality. So definitely something to keep in mind if you use this particular library in PostGIS.

Next piece of content- "MIGRATE SCHEDULED JOBS TO PG_TIMETABLE FROM PGAGENT". This is from cybertec-postgresql.com, and if you're using pgAgent and want to migrate to pg_timetable, this blog post describes how to do it.

Next piece of content. They had another episode of Postgres FM this week. This one was on "How to become a DBA". So if you're interested in this 30-minute episode, definitely check out this piece of content. And they do have a link to the YouTube video. It's actually this little symbol down here that takes you to their playlist. I don't think it brings you to the exact video, but at least you get the playlist and you can find this episode within that playlist.

Next piece of content. The PostgreSQL person of the week is Antonín Houska. If you're interested in learning more about Antonín and his contributions to Postgres, definitely check out this blog post.

The last piece of content: we did have another episode of The Rubber Duck Dev Show this past Wednesday evening. This one was on "Live Streaming Laravel with Aaron Francis", but most of it was about how to get started with live streaming code development. So if you're interested in that kind of long-form developer discussion, we welcome you to check out our show.
