
Researching Performance, Postgres Survey, pgAdmin Survey, Long Running Queries | Scaling Postgres 227


In this episode of Scaling Postgres, we discuss research into a performance puzzle, results from a Postgres and pgAdmin survey as well as the impacts of long running queries on Aurora and Postgres.


Transcript

All right. I hope you, your friends, family, and coworkers continue to do well. Our first piece of content is "Researching the Performance Puzzle". This is from ardentperf.com, and he's referencing a blog post from, I believe, last week or the week before, where he posed a performance puzzle: he had a table with two different bigint columns, with the same kind of index, the same data type, and the same set of values contained within. But a query pulling back about half the values within a range was super fast against one column, whereas the same query against the other column took forever. And the question was, what's the difference? Basically, the second bigint column had a random distribution of the IDs in it. So it demonstrated how sequentially ordered data can be queried much faster. Now, it already had an index on it, so why would that take a while?
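
As a rough sketch, this is the kind of setup behind the puzzle. The table and column names here are my own illustration, not taken from the post: two bigint columns holding the same values, where one follows the physical row order and the other is randomly scattered, each with its own index.

```sql
-- Hypothetical reproduction (names assumed): same values in both columns,
-- but seq_id matches the physical row order while rand_id is shuffled.
CREATE TABLE perf_puzzle AS
WITH shuffled AS (
    SELECT g, row_number() OVER (ORDER BY random()) AS r
    FROM generate_series(1, 1000000) AS g
)
SELECT g::bigint AS seq_id,            -- correlated with physical row order
       r          AS rand_id,          -- same values, random placement
       repeat('x', 100) AS payload     -- extra column that forces heap fetches
FROM shuffled
ORDER BY g;                            -- store rows in seq_id order

CREATE INDEX ON perf_puzzle (seq_id);
CREATE INDEX ON perf_puzzle (rand_id);
ANALYZE perf_puzzle;

-- Both queries fetch the same 500,000 values, but the first touches far
-- fewer heap blocks because the matching rows are physically clustered.
SELECT sum(length(payload)) FROM perf_puzzle WHERE seq_id  <= 500000;
SELECT sum(length(payload)) FROM perf_puzzle WHERE rand_id <= 500000;
```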

Because the index itself is sequential, but the query still had to go to the heap to pull data from another column in the table, so it was doing random I/O across the heap to fetch that data. In this post, he goes into a lot more detail as to why we're seeing this behavior. He ran EXPLAIN ANALYZE with the BUFFERS option as well. And one thing that was obvious, he says, quote "Our first query only needed to read 5,533 database blocks to find the 500,000 rows it was looking for. But - in the same table - the second query needs to read 378,035 blocks to find the 500,000 rows it's looking for!". So that's a big difference between a sequential access and, essentially, a random access. But what's interesting about this is that the second query basically has to get 500,000 rows.

And there were 123,000 shared hits to shared buffers, whereas the shared reads were 378,000. So that's about 500,000 in total, which makes sense. So how on earth did the first query only need to access about 5,500 database blocks? What happens is that when the database pulls a given block, it can read all the matching rows stored in that block without counting another hit or read for each one; essentially, it keeps using the data that's already in that block. So that's the efficiency he talks about here. Then he said, 'Well, what if I make the number of blocks equal?'. So do a random access to the same number of blocks as a sequential access, and see what that looks like. What he found, once he calculated a read latency, is that the first column, the sequential one, had a read latency of 0.14 milliseconds, so super fast, whereas the latency for the second column was 0.4 milliseconds.
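
If you want to see these buffer counts for yourself, the BUFFERS option of EXPLAIN reports them per plan node. A quick sketch against the hypothetical table above (names are my own, not from the post):

```sql
-- BUFFERS adds "Buffers: shared hit=... read=..." lines to each plan node,
-- which is where the block counts discussed above come from.
EXPLAIN (ANALYZE, BUFFERS)
SELECT sum(length(payload)) FROM perf_puzzle WHERE seq_id <= 500000;

EXPLAIN (ANALYZE, BUFFERS)
SELECT sum(length(payload)) FROM perf_puzzle WHERE rand_id <= 500000;

-- shared hit  = blocks already found in Postgres shared_buffers
-- shared read = blocks that had to be requested from the OS page cache or disk
```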

So he considers the 0.4 millisecond figure a reasonable number, but he believes Postgres is essentially lying about the much faster latency, because how is that even possible? He did some more analysis, and basically what's happening is that Postgres relies on the underlying operating system to do some of the work, in terms of caching and so forth, and what's really at play here is the Linux read-ahead. So when you have a sequential set of data to pull, the Linux read-ahead pulls in more data than was actually requested, and that's how Postgres is able to retrieve the data so quickly.

So even though Postgres reports those timings, that's why he says he's calling it a liar. Then he compares: okay, what happens if we turn off read-ahead, and what does that look like? The overall execution time gets close to the execution time of the randomly ordered data. So the sequential data helps, but it looks like the Linux read-ahead is providing a ton of benefit as well. This was a super comprehensive performance investigation, and if you're interested in this kind of stuff, I definitely encourage you to check out this blog post.

Next piece of content- "State of PostgreSQL 2022". This is from timescale.com and this is pretty much the full results, I believe, from the survey. So it's been mentioned in some blog posts, but here's the full survey that asks questions such as what's the main reason you chose PostgreSQL over other options? And they're talking about open-source reliability, the extension network that's available like PostGIS, and a lot of other points in the survey. So if you're interested in this type of information, definitely check out this blog post.

Next piece of content- "pgAdmin User Survey 2022". This is from enterprisedb.com and they did their own survey for pgAdmin. They found some things like 57% of the user base runs on Linux, 18% on MacOS and 63% on Windows. So clearly I guess their organizations are using Linux and Windows at the same time, but they do a great breakdown of different packages, where they got the packages from, how many desktop deployments, and even how they're deploying on servers as well. So feel free to check it out if you're interested in it.

The next piece of content- "Manage long-running read queries on Amazon Aurora PostgreSQL Compatible Edition". This is from aws.amazon.com and whereas I don't normally discuss posts on Aurora, there aren't that many Postgres posts this week. But this was interesting because it talks about the architectural differences between Aurora and Postgres. Basically, Aurora has a shared file system between their primary and their read replicas. So that has some advantages in terms of bringing up a reader very quickly because you don't have to replicate all the data. Whereas with Postgres, the database files are essentially duplicated. It also makes replicas cheaper because you don't need to duplicate the database space in Aurora.

But they also talk about some disadvantages of this when you want to run long queries, particularly queries that exceed the max standby delay, because those replicas eventually have to keep up with what the primary is writing, and that can cause queries to be canceled, particularly long-running ones. They have some techniques to mitigate this. A lot of it is making things faster: tune your queries and make vacuum run faster. If you do get cancellations, implement retry logic as part of your application, or redirect the query to the Aurora nodes. None of these are easy ways to resolve it, but then again, Postgres has the same kind of issues with canceled queries as well.
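
On stock Postgres, one way to check whether a replica is canceling queries because of recovery conflicts is the pg_stat_database_conflicts view; this is a standard Postgres view, and I'm not certain how Aurora surfaces the same information. A quick sketch, run on the standby:

```sql
-- Per-database counts of queries canceled on this standby due to recovery
-- conflicts; snapshot conflicts are the typical long-running-query case.
SELECT datname, confl_snapshot, confl_lock, confl_bufferpin, confl_deadlock
FROM pg_stat_database_conflicts;
```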

But if you want to get more value out of this post, the next post is "5mins of Postgres E29: Postgres lock conflicts on Amazon Aurora & tuning max_standby_streaming_delay and hot_standby_feedback". This is from pganalyze.com, and this is their Five Minutes of Postgres, where he examined the AWS post, talked about it, and then also compared it to Postgres, and how in some ways Postgres has a little more flexibility in dealing with this, because you can change hot_standby_feedback in Postgres but not in Aurora; it's on by default in Aurora. He talks about a use case of someone who wanted to run some long-running queries, and basically they turned off hot_standby_feedback and set a very long, I think six-hour, max_standby_streaming_delay so that long-running queries could be handled without being canceled. I definitely encourage you to check out this episode as well.
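
For reference, a minimal sketch of what that kind of standby configuration looks like on stock Postgres; the six-hour value is just the example mentioned in the episode, and on Aurora or RDS you would change these through a parameter group rather than ALTER SYSTEM:

```sql
-- Run on the standby. Illustrative values only.
ALTER SYSTEM SET hot_standby_feedback = off;          -- standby queries no longer hold back vacuum on the primary
ALTER SYSTEM SET max_standby_streaming_delay = '6h';  -- let WAL replay wait up to 6 hours instead of canceling queries
SELECT pg_reload_conf();                               -- both settings take effect on reload
```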

The next piece of content- "How to Set Up NFS and Run PG on it". This is from highgo.ca, and this post does exactly what it says. It shows you how to set up an NFS server and actually put your Postgres database store and run it on there. So if you're interested in that, you can check out this blog post.

The next piece of content, the PostgreSQL person of the week is Beena Emerson. If you're interested in learning more about Beena and her contributions to Postgres, definitely check out this blog post.

The last piece of content: we did have another episode of The Rubber Duck Dev Show this past Wednesday evening. In this one, we had a discussion with Jason Charnes about the #NoMoreEstimates movement with regard to projecting when development work will be complete. So if you're interested in this type of long-form developer discussion, we welcome you to check out our show.
