
Connection Scalability, Benchmarking Performance, Community Impact, zheap Progress | Scaling Postgres 135


In this episode of Scaling Postgres, we discuss connection scalability, benchmarking Postgres performance, the community impact of an acquisition and the progress of zheap.

Content Discussed

YouTube Video

Podcast Audio

Transcript



[00:00:00] In this episode of Scaling Postgres, we talk about connection scalability, benchmarking performance, community impact, and zheap progress. I'm Creston Jamison, and this is Scaling Postgres episode 135.

[00:00:22] All right, I hope you, your friends, family and coworkers continue to do well. Our first piece of content is "Analyzing the limits of connection scalability in Postgres". This is from Citusdata.com, and they're talking about the limit on the number of connections you can have with Postgres. It does pretty well in the hundreds range, but once you get into the thousands of connections you usually have to reach for a pooler such as PgBouncer or Pgpool to handle the higher connection counts. Of course, that comes with downsides. For example, you have to use transaction pooling, and as a consequence you can't really use prepared statements or temporary tables, and you can't set things within a session, so it precludes using a fair number of features. It looks like Citus Data, now that it is part of Microsoft, is working to find out which areas need improvement to support a much larger number of connections, maybe in the 10,000 or even 100,000 range. Some of the issues they're considering: is it memory usage, with connections using up too much memory? Is it snapshot scalability, that is, taking the snapshots that determine what a particular connection can see with regard to MVCC? Or does the connection model need to change fundamentally to support higher numbers of connections? He went through several scenarios looking at memory usage. There is some constant per-connection overhead, but when he looked at this in more depth, a connection generally uses below two megabytes based on the analysis he has done. He said it's really hard to measure and easy to overstate, and there's actually a second post I'll mention, "Measuring the memory overhead of a Postgres connection", by the same author on his "Postgres from below" blog. There he goes through the process of analyzing how much memory connections actually take, and while the details are in that post, the conclusion is that the per-connection overhead is relatively small, so that, he said, is pretty manageable. The next area is cache bloat. When you have one connection touching many different objects, that connection's caches grow, and he demonstrates some of the size increases that cause this. Still, he doesn't consider cache bloat a major issue at the moment, and, quote, "a common solution for the cache bloat issue is to drop old connections from the application / connection pooler after a certain age. Many connection pooler libraries / web frameworks support that." So that basically addresses the cache bloat issue. The third area is query memory usage, which concerns work_mem. But when you want a lot of connections, say for OLTP workloads, you're probably not going to set a very high work_mem, so he didn't attribute a significant cost to that. Then he looked at snapshot scalability and the fact that just having a large number of connections, even if they're not being utilized, has a measurable impact on performance. He shows this where you have one active connection with different levels of idle connections: performance drops by more than half going from one active connection with no idle connections to one active connection with 10,000 idle connections. The same thing happens with 48 active connections and 10,000 idle ones; again it's nearly a twofold drop in transactions per second. So even though they're idle, those connections still cost something. He used a CPU profile to find out what was responsible, and it looked like it was the GetSnapshotData() function, which does the bulk of the work necessary to provide readers with transaction isolation. He says these snapshots are built very frequently, at least once per transaction and commonly more often. So he identified this as the significant limit to work on moving forward.
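As a quick way to see how this applies to your own system, you can check how many of your connections are actually doing work at any moment. This is a minimal sketch using the standard pg_stat_activity view; what counts as "too many" idle connections is, of course, workload-dependent:

```sql
-- Count client connections by state to see how many are sitting idle
SELECT state, count(*) AS connections
FROM pg_stat_activity
WHERE backend_type = 'client backend'
GROUP BY state
ORDER BY connections DESC;
```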

[00:04:25] The next thing he took into account is the connection model, particularly context switches and how that model is fundamentally built. He said there are areas here that are ripe for improvement, but it would be a huge amount of work. So moving forward he's focusing on the snapshot scalability portion to get some early wins. The next link shows him working on this as part of a patch that looks like it was started all the way back in March, so it has been actively worked on to try to get Postgres to support larger numbers of connections without necessarily having to rely on something like PgBouncer or Pgpool. Overall this was a very interesting blog post and I definitely suggest you check it out, as well as the companion piece mentioned earlier, "Measuring the memory overhead of a Postgres connection", if you're interested.

[00:05:19] The next piece of content is "OLTP performance since PostgreSQL 8.3". This is from 2ndquadrant.com, and it assesses many different Postgres versions and their performance on a single piece of hardware, going back to 8.3. The first example here is an NVMe drive with a read-only workload on a relatively small database, 1.6 GB. It shows each version in clusters of data points, each cluster with an increasing number of connections, I believe. You can see that something significant happened in 9.2; from there performance has fluctuated across versions, but with no major regressions. When you bump the scale up by ten to a 16 GB database, which still fits in memory but not in shared buffers, you can see performance improvements at 9.2 and again at 9.5. Similarly, when you go up to a 160 GB database, and this system had a consistent 64 GB of RAM, so the data doesn't fit in memory, you mostly just see the big improvement that happened at 9.5. But notice, looking at some of these graphs, that the performance of 13 has dropped relative to 12. I have noticed this in a number of the graphs, so it definitely depends on the workload: even though there have been performance improvements for some workloads, it's not always a win going from, say, 12 to 13. Next is the NVMe drive with a read-write workload, and again this is pretty consistent across recent versions at a small, medium and even a large database size, but there is a slight regression going from 12 to 13. Maybe there will be further optimizations in later point releases, but at least that's what the data indicates for this pgbench test he carried out. He also covers a SATA drive to look at the performance there. He has other posts looking at the future of performance as well, one on improved scalability with many connections, which refers back to the post we were just talking about, and one on support for non-volatile WAL buffers. That one is particularly interesting: using non-volatile memory in the server to buffer the WAL for better performance, a patch that's being developed and worked on. So if you're interested in performance, particularly across versions, this is definitely an interesting blog post to check out.

[00:08:02] The next piece of content is "PostgreSQL TPCC benchmarks: PostgreSQL 12 vs. PostgreSQL 13 performance". Here they compared 12 and 13 on the same hardware, both tuned and untuned, and they used HammerDB to do the comparisons, not pgbench, ramping up the number of users. For untuned performance the two versions were pretty much identical, just a little bit higher on 13. But there's a dramatic difference when you actually tune the server, comparing untuned to tuned on both 12 and 13, and you can see the advantage 13 gives, particularly at the sweet spot of around 250 users for this workload. So here they're showing 13 a little bit ahead of 12 in terms of performance. Again, it seems to depend on your workload what difference you can expect with Postgres 13 versus Postgres 12. Another interesting benchmarking post.
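The episode doesn't list the exact tuning used in the benchmark, but "tuned" in this context generally means moving a handful of memory and checkpoint settings off the conservative defaults. A rough sketch, with purely illustrative values that would need to be sized for your own hardware:

```sql
-- Illustrative values only; size these for your own RAM and workload
ALTER SYSTEM SET shared_buffers = '8GB';              -- requires a restart to take effect
ALTER SYSTEM SET effective_cache_size = '24GB';
ALTER SYSTEM SET work_mem = '32MB';
ALTER SYSTEM SET max_wal_size = '16GB';
ALTER SYSTEM SET checkpoint_completion_target = 0.9;
SELECT pg_reload_conf();                              -- applies the reloadable settings
```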

[00:09:03] The next piece of content is "Community impact of 2ndQuadrant purchase". This goes back to something I mentioned before: last week it was announced that EDB acquired 2ndQuadrant. That has some benefits, because now there's a larger organization to advocate for Postgres, but the disadvantage is that there's now less diversity in the community, and this very short post speaks to just that. He mentions that EDB's representation on the core team is now 60%, so more than half of the team is from EDB, and he says the core team is working on a solution for this, so it's good that they're acknowledging it. He also talks about how that diversity is a strength of the Postgres community, and he even mentions something interesting: the risk that an even larger company wanting to hurt Postgres could acquire EDB, taking it in a direction that is neutral or negative for the Postgres community, which wouldn't be too great. On the other side of this, it is interesting that the work discussed in the first post is actually happening at Microsoft, so that's another large organization furthering Postgres. I do hope the diversity of companies supporting Postgres increases. Related to this is another post, "EDB to community: acquisition statement", from Enterprisedb.com, and it speaks to the exact same thing, making a commitment to the open source nature of Postgres. They state, quote, "fortunately, internal PostgreSQL community rules prevent any one company from taking control of the project." So again, it's something they are aware of and are paying attention to, but it's something to keep in mind. Here are a couple of blog posts speaking to this potential issue for the Postgres community, so check those out if you're interested.

[00:10:51] The next piece of content is "zheap: reinvented PostgreSQL storage". This is from Cybertec-postgresql.com, and they're referring to the new storage system that was actually started by EDB a number of years ago. I hadn't heard anything about it recently; as a matter of fact, I was looking up some information on it yesterday because I have a client with a very heavy update workload, and I had noticed it's been kind of quiet for a while. Then this post popped up and gives a status update. They say that, quote, "to make zheap ready for production", they're proud to announce that their partners at Heroic Labs have committed to fund the further development of zheap and release all code to the community, and Cybertec has decided to double that amount of money and put up additional expertise and manpower to move zheap forward. This is great news in my opinion: this new storage system is continuing to be developed and will hopefully be released at some point. To remind you, zheap is a reimagining of the storage system as a way to potentially avoid vacuum. The way Postgres works now, when you have an update or a delete, the old row version gets marked for future removal; it's hidden from new transactions but still remains there for others, and later, once those old rows are no longer needed for visibility purposes, they are removed by vacuum. The core of what zheap does is keep this information in a separate area. So when an update happens, it updates the row in place, and the old information is stored in an undo log. If the transaction needs to roll back, it consults that separate storage area to undo the change, and if an older transaction needs visibility into what the row contained, it also consults the undo log. As they say here, the key design goals are: perform updates in place so you don't have to vacuum, have smaller tables because the fundamental storage format has been re-engineered, reduce writes as much as possible, and reuse space more quickly. They go over the basic design of zheap and how most of the transaction information that the current storage system stores with each tuple is instead stored at the page level in something they call slots, and the post discusses this at a technical level. I don't have time to cover it all here, but feel free to review the post to understand how it works. They also talk about the undo log and how inserts, updates and deletes are each handled differently. Basically, the goal is to not have to vacuum up all of these dead tuples when an update or delete happens: you store things in the undo log, and when they're no longer needed they get removed from that storage area. So this could be very beneficial for heavy update workloads, as opposed to, say, heavy insert workloads. If you're interested in learning more about zheap, definitely check out this blog post.
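For context, the zheap prototypes are built on the table access method interface added in Postgres 12, so switching a table over is expected to look something like the following. This is a hypothetical sketch based on how the prototype is described, not a released feature:

```sql
-- Hypothetical usage: create a table with the zheap access method instead of
-- the default heap (only works on builds that actually include zheap)
CREATE TABLE heavy_update_table (
    id      bigint PRIMARY KEY,
    counter bigint NOT NULL
) USING zheap;
```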

[00:13:38] The next piece of content is "How to fix PostgreSQL performance issues with PG Extras". This is from Paweł Urbanek, and he's talking about a tool called PG Extras that's available for a number of different languages, from Ruby to Elixir to Node.js to Python. It's essentially a utility that, from those languages, consults the system views to pull out information related to performance. You do have to enable the pg_stat_statements extension to get some of the relevant information, but it does things like help you check your cache hit ratios with a very simple call such as cache_hit, as opposed to writing a query against the system tables yourself. It also helps you identify unused indexes to remove, add missing indexes, look for too many sequential scans, identify deadlocks and what locks are occurring, get rid of unnecessary bloat by giving you a bloat calculation, and remove unneeded objects by checking the size of database tables. So if you're interested in a quick tool to get some of this information, you can check out this blog post.
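Under the hood these helpers run queries against the statistics views. For example, a cache hit check similar to PG Extras' cache_hit command can be approximated directly in SQL like this (the exact query the tool runs may differ):

```sql
-- Rough table and index cache hit ratios from the statistics views
SELECT 'table hit ratio' AS metric,
       round(sum(heap_blks_hit)::numeric
             / nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0), 4) AS ratio
FROM pg_statio_user_tables
UNION ALL
SELECT 'index hit ratio',
       round(sum(idx_blks_hit)::numeric
             / nullif(sum(idx_blks_hit) + sum(idx_blks_read), 0), 4)
FROM pg_statio_user_indexes;
```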

[00:14:46] The next piece of content is "Logical decoding of large in-progress transactions in PostgreSQL". This is a post from Enterprisedb.com about a new feature they're working toward. Typically a subscriber waits for a transaction to be committed before it applies all of that transaction's changes to the destination database, but with this feature the data can start streaming to the subscriber before the final commit, and once that commit happens, it is committed on the subscriber as well. So this could be advantageous for very long-running or large transactions. It looks like this is scheduled to go into Postgres 14, so if you're interested, check out this blog post.
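In the Postgres 14 implementation this surfaces as a subscription option, so on the subscriber it is expected to look roughly like this (connection details and object names are placeholders):

```sql
-- Hypothetical subscriber-side setup that opts in to streaming
-- of large in-progress transactions (Postgres 14+)
CREATE SUBSCRIPTION big_tx_sub
    CONNECTION 'host=primary.example.com dbname=appdb user=replicator'
    PUBLICATION big_tx_pub
    WITH (streaming = on);
```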

[00:15:31] The next piece of content is "Webinar: Highway to zero downtime PostgreSQL upgrades - follow up". This is a webinar from 2ndquadrant.com, and you can click the link to get access to it. They talk about the different ways to do an upgrade and how to get down to zero downtime. They cover using pg_dump, then pg_upgrade, and then using logical replication, which is the best way to get zero downtime. But there are certain things you need to take into account when doing that: sequences don't get copied over, and you may have DDL changes you need to be aware of. Their tool, pglogical, helps alleviate some of these problems and makes things easier. If you're interested, you can check out this webinar.
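The general shape of a logical-replication upgrade, whether you use built-in logical replication or pglogical, is a publication on the old cluster and a subscription on the new one, plus manual handling of the things that don't replicate, such as sequences. A minimal sketch with hypothetical object names:

```sql
-- On the old cluster (publisher):
CREATE PUBLICATION upgrade_pub FOR ALL TABLES;

-- On the new cluster (subscriber), after restoring the schema:
CREATE SUBSCRIPTION upgrade_sub
    CONNECTION 'host=old-primary.example.com dbname=appdb user=replicator'
    PUBLICATION upgrade_pub;

-- Sequences are not replicated, so set them manually just before cutover, for example:
SELECT setval('orders_id_seq', (SELECT max(id) FROM orders));
```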

[00:16:16] The next piece of content is "Using CTEs to do a binary search of large tables with non-indexed correlated data". This is from Endpoint.com, and it was an interesting use case. They had a database they didn't really have permission to add indexes to, but they needed to query it. It had an ID column backed by an incrementing sequence and a created_at date. The created_at column was not indexed, whereas the ID column was, and filtering and sorting by created_at gave poor performance. So they used the ID column to determine which rows to pull. To identify which ID corresponded to a particular date, they developed a WITH RECURSIVE query that essentially does a binary search to find the ID range, and then ran the real query by ID so it could use the index. So they recursively probed different dates to find the IDs, and then queried using just the ID, as shown in the sketch below. It's quite an interesting use case, and if you want to find out more, check out this blog post.
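The blog post has the full query; a simplified sketch of the idea looks like the following. Table and column names are hypothetical, and it assumes created_at increases along with id and that the target date falls inside the table's range. Each step of the recursion probes a single row by the indexed id to decide which half of the id window to keep:

```sql
-- Binary-search the indexed id range for the first id created on or after a target date
WITH RECURSIVE search(lo, hi, depth) AS (
    SELECT min(id), max(id), 0
    FROM events
  UNION ALL
    SELECT CASE WHEN probe.created_at < DATE '2020-01-01'
                THEN (lo + hi) / 2 + 1 ELSE lo END,
           CASE WHEN probe.created_at < DATE '2020-01-01'
                THEN hi ELSE (lo + hi) / 2 END,
           depth + 1
    FROM search
    CROSS JOIN LATERAL (
        SELECT created_at
        FROM events
        WHERE id >= (search.lo + search.hi) / 2
        ORDER BY id
        LIMIT 1
    ) AS probe
    WHERE lo < hi
)
SELECT hi AS first_id_on_or_after   -- boundary id for the target date
FROM search
ORDER BY depth DESC
LIMIT 1;
```

The boundary id it returns can then be used in an ordinary WHERE id >= ... query, which the primary key index handles efficiently.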

[00:17:25] The next piece of content is "PostgreSQL monitoring for application developers: the vitals". He's talking about monitoring the vital statistics of CPU, memory, disk and network utilization, why these are the core metrics he likes to use, and how they can help you monitor your Postgres. If you're interested, check out this blog post. The next piece of content is "Heap file and page in details". This is from Highgo.ca, and they're talking about the low-level structure of the heap and the page format being used. It references a lot of C code and gives examples of how things work in the storage system. So if you're interested in that level of detail, check out this blog post.
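If you want to poke at that on-disk page structure yourself, the pageinspect extension that ships with Postgres exposes it from SQL. A small example (superuser access is required, and the table name is a placeholder):

```sql
CREATE EXTENSION IF NOT EXISTS pageinspect;

-- Line pointers and tuple headers from the first page of a table's heap
SELECT lp, lp_off, lp_len, t_xmin, t_xmax, t_ctid
FROM heap_page_items(get_raw_page('my_table', 0));
```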

[00:18:06] The next piece of content is "Configuring Pgpool-II watchdog: it's going to be a lot easier". They're talking about configuring Pgpool-II and its watchdog, which determines whether an instance is down or not. Apparently the configuration used to require a different configuration file for each node, and they're moving to a unified configuration file that applies to all nodes, which is definitely easier. If you're interested in learning more, you can check out this blog post.

[00:18:35] The next piece of content is "How to set up PostgreSQL monitoring in Kubernetes". This is from Crunchydata.com, and I believe it uses their PostgreSQL Operator tool to set this up. So if you're interested in that, you can check out this blog post.

[00:18:50] The next piece of content is "Waiting for PostgreSQL 14: support for OUT parameters in procedures". Apparently with Postgres 14 you can now use OUT parameters to return something from a procedure that's run, so you can check this out for more details.
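As a quick illustration of the feature, a procedure with an OUT parameter might look like this in Postgres 14. This is a trivial made-up example; when calling from plain SQL you pass a placeholder such as NULL for the OUT argument:

```sql
CREATE PROCEDURE add_numbers(a integer, b integer, OUT total integer)
LANGUAGE plpgsql
AS $$
BEGIN
    total := a + b;
END;
$$;

-- Returns a result row with total = 5
CALL add_numbers(2, 3, NULL);
```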

[00:19:06] And the last piece of content: the PostgreSQL person of the week is Andreas Kretschmer. If you're interested in learning more about Andreas and his contributions to Postgres, definitely check out this blog post. That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.
