
Leveraging Postgres, Tuning Autovacuum, Slow Queries | Scaling Postgres 15


In this episode of Scaling Postgres, we review articles on how to better leverage Postgres, tune autovacuum, handle slow queries and use window functions.


Transcript



[00:00:00] In this episode of Scaling Postgres, we talk about leveraging Postgres, tuning autovacuum, slow queries and window functions. I'm Creston Jameson, and this is Scaling Postgres, episode 15.

[00:00:21] All right, our first piece of content is actually a YouTube video. It's called RailsConf 2018: Using Databases to Pull Your Application's Weight by Harisankar P. S., and you can find it on the Confreaks YouTube channel. This presentation is done by someone who's a Ruby on Rails developer, and he describes how he started leveraging the power of his PostgreSQL database in order to optimize performance and make his job as a developer easier. So if you're a developer, particularly a Ruby or a Ruby on Rails developer, this is definitely a presentation I suggest you check out. Here's a quick overview of some of the things that he covers. He talks about indexing and basically what you should think about when indexing. So, for example, the primary key, foreign keys, columns that would be in a WHERE clause, keys used to join tables, date columns if you're using them frequently, and also adding a partial index for particular scopes so that they can be retrieved faster. So definitely some good advice there. He also talks about his use of materialized views. Views are essentially virtual tables that give you a representation of data you want to retrieve, usually across more than one table. However, a materialized view actually creates a table for the result that you can apply indexes to, so it offers very high performance. The downside is that these materialized views have to be rebuilt periodically, but depending upon your use case, they can give you some outstanding performance benefits, and he covers how he uses them in his application. And the last area he covers is actually choosing to have his database, on occasion, generate the JSON that he delivers to end users, as opposed to pulling from the database and generating the JSON within Ruby. And he saw some very significant performance gains by using the database to generate the JSON with functions such as row_to_json. So these were some major points of the presentation, but I definitely suggest you check out this video.
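As a rough illustration of the materialized view idea described above, here is a minimal sketch; the customers and orders tables and their columns are assumptions for the example, not taken from the talk.

```sql
-- A materialized view stores its result like a real table, so it can be indexed.
-- Table and column names here are hypothetical.
CREATE MATERIALIZED VIEW monthly_sales AS
SELECT c.id AS customer_id,
       date_trunc('month', o.created_at) AS month,
       sum(o.total) AS total_sales
FROM customers c
JOIN orders o ON o.customer_id = c.id
GROUP BY c.id, date_trunc('month', o.created_at);

-- Unlike a plain view, the stored rows can be indexed.
CREATE INDEX ON monthly_sales (customer_id, month);

-- The data is a snapshot, so it has to be rebuilt periodically.
REFRESH MATERIALIZED VIEW monthly_sales;
```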

[00:02:45] The next post is Faster JSON generation using PostgreSQL JSON function, and this is from the bigbinary.com blog. Now, this is related to the previous video. I'm not sure if this individual saw that presentation, but basically he describes the same scenario as the previous presenter did in the YouTube video. And he says, quote, as the number of records in the database grows, Rails can take a very long time to generate a JSON response, end quote. The bottleneck can be traced back to JSON generation. So basically they did what the previous presenter did: they started using the row_to_json function, and in certain cases array_to_json, and they showed exactly how they used these particular functions in Rails in order to generate the JSON. And the performance benefits that they saw were, for short records, it went from 40 milliseconds down to 15 milliseconds, but for a large number of records it went from 1,200 milliseconds down to 20 milliseconds, so essentially 60 times faster. So definitely some pretty big performance gains realized by having PostgreSQL generate the JSON as opposed to doing it through Rails. But they do mention that it is a bit more complex to do this, so they recommend, quote, use the Rails way of generating JSON as long as you can, and only when performance starts to be an issue, think about switching over to generating JSON out of PostgreSQL, end quote. So definitely something to keep in mind if you need help scaling your database in this way.
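To make the technique concrete, here is a minimal sketch of generating a JSON array directly in Postgres; the posts table and its columns are assumptions for the example, not the exact queries from the article.

```sql
-- Build the JSON response in the database instead of in Ruby.
-- The posts table and its columns are hypothetical.
SELECT array_to_json(array_agg(row_to_json(t))) AS payload
FROM (
    SELECT id, title, created_at
    FROM posts
    ORDER BY created_at DESC
    LIMIT 50
) t;
```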

[00:04:28] The next post is Tuning PostgreSQL autovacuum for scale, and this is from the Gojek engineering blog. So this post discussed a scenario where they had a heavily written-to table where autovacuum was having trouble keeping up. In particular, they talk about the importance of the transaction ID, the potential for it to run out, and the dangers of that, where Postgres could literally shut down if you don't have enough transaction IDs available for a particular table. Now, they had a heavily written-to table where millions of transaction IDs were being consumed each day, and, quote, with the default settings, the autovacuum process was able to reclaim only 70% of them and took up to eleven days to complete, end quote. So autovacuum definitely needed to be tuned in this case, and here they go over what they did. They described the database virtual machine they were using and then went in and described the different settings that they changed. So they changed autovacuum work memory up to 2GB, and they have a lot of detail here about what that setting does, as well as changing the vacuum cost limit and adjusting the maintenance work memory and the work memory. But related to autovacuum, essentially these two, autovacuum work memory and the autovacuum cost limit, were the ones that sped up their autovacuum. So now they were able to process 1 billion IDs in only two days, versus previously not being able to keep up with the daily consumption and taking eleven days to do it. And then they talk about what they're planning for the future in terms of increasing the number of workers and increasing the cost limit, because when you increase the number of workers, you should increase the cost limit as well, otherwise those workers will actually work a little bit slower. And basically how often to kick off vacuum is the autovacuum vacuum scale factor. They have some follow-up advice in terms of monitoring the transaction ID limit, as well as some design considerations that can reduce the risk of having these heavily written-to tables, such as partitioning or storing data in more than one table. So if you're observing some of the issues they were, with heavily written tables and autovacuum having trouble keeping up, definitely a blog post to check out.
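As a rough sketch of the kind of tuning described above: the 2GB value for autovacuum_work_mem comes from the post, while the other values here are illustrative assumptions, not the post's exact numbers.

```sql
-- Raise autovacuum's working memory and cost limit so workers can do more per pass.
-- Only autovacuum_work_mem = 2GB is from the post; the other values are examples.
ALTER SYSTEM SET autovacuum_work_mem = '2GB';
ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 2000;
ALTER SYSTEM SET maintenance_work_mem = '1GB';
SELECT pg_reload_conf();

-- Keep an eye on how close each database is to transaction ID wraparound.
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;
```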

[00:06:53] The next post is Why is PostgreSQL running slow? Tips and tricks to get to the source, and this is from the severalnines.com blog. In it they go over a step-by-step process for what to check when you have slow queries. Their step zero is just basically gathering as much information as possible: what are the problems, where is it happening in the application, what are the queries that are being impacted. Step one is to check pg_stat_activity: what are the active queries running in the database, and look for ones that are waiting or potentially being blocked. And if the queries are running okay, why are they taking so long? So basically use an explain plan at that point. Another thing to check: are the tables bloated? Has it been a while since autovacuum kicked off to remove the dead tuples in a table? You can check that, as well as checking how often checkpoints are occurring and looking for fatal or error messages in the PostgreSQL log, and then covering things like the health of the system: is it swapping to disk, what is the CPU load, and the different logs of the system. And then finally they suggest getting some assistance if you're not able to narrow down the potential cause. A pretty good series of steps to check when your database or certain queries are running slow.
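A minimal sketch of the step-one check mentioned above, looking at what is currently running and whether it is waiting; the exact query in the article may differ, and the wait_event columns assume PostgreSQL 9.6 or later.

```sql
-- What is running right now, how long has it been running, and is it waiting?
SELECT pid,
       state,
       wait_event_type,
       wait_event,
       now() - query_start AS runtime,
       query
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY runtime DESC;

-- For a query that is running but slow, inspect its plan:
-- EXPLAIN (ANALYZE, BUFFERS) SELECT ...;
```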

[00:08:16] The next post is Fun with SQL window functions in Postgres. So, window functions: historically I've typically seen them used to do running totals. For example, let's look at someone's salary over the past twelve months and do a running total in a column. However, you can do a lot more, and this is a very short blog post that talks about some of those functions. For example, one of them is that you can rank over what you're partitioning by. So in this case they're sorting salary information and ranking it by the top salary by department. And it also goes into, in addition to rank, percent_rank, which gives you the percentage ranking based on your window. And then you have lag, which will give you the row value x rows before your current row, and also ntile, for when you want to compute what percentile particular values fall in. So again, this is a very brief post, but it shows you some of the capabilities that window functions offer, and there's a short sketch of them after this paragraph. A lot of times for particular queries it's faster to use window functions versus pulling all the data back and doing that work in your application layer. So definitely a blog post to check out. The next post is actually a YouTube video called Scaling a Relational Database for the Cloud Age, and this was at DataEngConf San Francisco 2018, on the Hakka Labs YouTube channel. This is the co-founder and VP of Engineering at Citus Data, and in this YouTube video he goes over three main areas of how to scale your PostgreSQL database. And of course, because he's from Citus Data, he focuses on the sharding aspect. But I found it very interesting how he discusses scaling a relational database, and he discusses basically splitting tables into their own database, similar to the thoughts on scaling that I presented in a previous episode of Scaling Postgres, where actually the PostgreSQL community is thinking about the future of sharding. The second area goes over how to use SQL to be able to target particular partitions and bring the results of queries together. Whereas the third talks about transactions that potentially have to hit multiple database partitions, and how that requires using things such as two-phase commits, in the case of global atomic transactions, because when you're sharding, you literally have more than one database and all the coordination that needs to happen with that. So if you're considering Citus, or looking at sharding or what the future holds for that, definitely a presentation to check out.
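As referenced above, here is a minimal sketch of the window functions discussed in that post (rank, percent_rank, lag, ntile); the employees table and its columns are assumptions for illustration.

```sql
-- Rank salaries within each department without collapsing the rows.
-- The employees table is hypothetical.
SELECT department,
       name,
       salary,
       rank()         OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank,
       percent_rank() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_pct_rank,
       -- the salary one row before the current one within the department window
       lag(salary, 1) OVER (PARTITION BY department ORDER BY salary DESC) AS prev_salary,
       -- which quartile the salary falls into across the whole company
       ntile(4)       OVER (ORDER BY salary) AS salary_quartile
FROM employees;
```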

[00:10:56] The last post is PostgreSQL Streaming Replication, a Deep Dive. Essentially, if you're not familiar with streaming replication, it basically allows you to stream all the data coming into a PostgreSQL database to one or more replicas. This post goes over it, gets a little bit into the history of replication, and explains basically how it works using the write-ahead log, or the WAL files. And it goes over how it works from a basic configuration, with some convenient images. So if you have not yet set up streaming replication and you're interested in reading a bit more about it, definitely a blog post to check out.
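For context, a minimal sketch of the primary-side setup for streaming replication is below; the role name, password, and network range are illustrative assumptions, and the replica itself would typically be created with pg_basebackup.

```sql
-- postgresql.conf on the primary (example values):
--   wal_level = replica
--   max_wal_senders = 5
-- pg_hba.conf on the primary (example network and user):
--   host  replication  replicator  10.0.0.0/24  md5

-- Create a role the replica can use to stream WAL from the primary.
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secret';
```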

[00:11:37] That does it for this episode of Scaling Postgres. You can get links to all the content presented in the show notes. Be sure to head over to scalingpostgres.com where you can sign up to receive weekly notifications of each episode, or you can sign up at YouTube or iTunes. Thanks.
