background

Any Scale, Monitoring Vacuum, Copy, Annotated Config | Scaling Postgres 7

Join Over 1,000 Engineers & Get New Episodes Weekly!

In this episode of Scaling Postgres, we review articles covering Postgres at any scale, monitoring vacuum, using copy, annotated config files, and how to do easy replication failback to an old primary database.

Content Discussed

YouTube Video

Podcast Audio

Transcript



[00:00:00] In this episode of Scaling Postgres, we talk about postgres at any scale, monitoringvacuum using copy in annotated config files. I'm Kristen Jamison, and this is Scaling postgresepisode seven.

[00:00:21] Alright, our first piece of content is actually YouTube video and it is Postgres. It's froma presentation called Postgres at Any Scale by Craig Kirstein's, I believe at Citus Data. Now thiswas actually done in 2017, but it was just recently, a relatively recently post in February. And inthis, this is intended to be more of a developer talk. So he goes over at the different scales fromwhere you're starting your app to where you are getting up to a medium level of traffic and thena higher level of traffic, what kind of things you need to pay attention to. So definitely if you're adeveloper, I would give this YouTube video a watch. The only comment I have with regard to it isthat when he discussed replication, he mentioned all these tools, but not the ones that Postgresprovides by default, which I found kind of interesting. And he really didn't go over too muchpartitioning and where that could potentially be useful. He went straight into Sharding. But ofcourse, Citizens Data is a Sharding Data platform, so that makes sense that he's more focusedon that. But it's definitely a good presentation for developers and some good guidance.

[00:01:41] The next piece of content is actually a YouTube channel, not a specific presentation,because there were quite a few presentations, but Pgcomf India from 2018, they recently postedall the videos on this YouTube channel. So definitely check it out and see if there's somethingthat you might find of interest that will help you scale your PostgreSQL database.

[00:02:10] The next article is Monitoring PostgreSQL vacuum processes, and this is from theDatadoghq.com blog by Emily Chang, and it goes over how to monitor vacuum and what to payattention to. So, for example, they talk about the different metrics that you should be monitoring,including dead rows, table disk usage, last time vacuum or auto vacuum ran, and differentvacuum events.

[00:02:40] And they also go over investigating different vacuum related issues. So is Autovacuum running or not? Are vacuums running into lock conflicts? Do you have long runningopen transactions that can cause issues with vacuum? So, definitely a pretty comprehensiveblog post about vacuum. So I definitely suggest you doing a readover of this to get more familiarwith vacuum. If you're not, or if you have particular issues, maybe there's something here thatcould assist.

[00:03:15] The next article is fast. CSV and JSON ingestion in PostgreSQL with copy byJonathan Katz. And this is on the Crunchy Enterprise PostgreSQL blog. So basically this is anarticle that discusses how to use copy, and if you're not familiar with copy, it's definitelysomething to be aware of. It basically allows you to insert a lot of records from a CSV file, orthey also have a JSON data file, and it does it much faster than doing individual insertstatements, so it's definitely a good thing to have in your toolkit. And Copy can be used toupload data into a PostgreSQL database or actually do export it to a CSV file as well, forexample.

[00:04:03] So they go over some examples here. So definitely something to check out.

[00:04:08] Now, with this, in terms of scaling, one thing you may be doing is inserting updating alot of data. And for that, I recommend taking a look at this documentation, populating a databaseand different performance tips that you can do, such as disabling auto commit using copy,removing indexes, foreign key constraints and then adding them back after you do the import.For example. So different ways to speed up if you're trying to import a lot of data into tables.

[00:04:40] The next article is advanced. SQL. Why? First Value and Last Value are no bugs, orprobably are not considered bugs. This is from the Cybertechql.com Blog and in it First Valueand Last Value are related to windowing functions and this is a very short blog post, but he goesover something you may not expect in their usage and explains why. So if you're not that familiarwith window functions, I definitely suggest checking it out and seeing what capabilities they can5:28] The next article is New annotated config files for PostgreSQL Ten. And this is from thedatabase Soup blog by Josh Perkus. And in it he has basically set up an annotated version ofthe PostgreSQL comp file, and he has also included a simple comp file with descriptions for theten most or excuse me, 20 most commonly changed settings and detailed advice on how to setthem, and then an extra 20 most likely to be changed settings. So if you've ever been confusedby some of the settings in the PostgreSQL comp for maybe what some of them did, even afterlooking some of the PostgreSQL documentation, this would be another source to check out andalso general recommendations as to what you potentially should be changing when you're doingyour PostgreSQL configuration.

[00:06:25] And this is the GitHub site that has the and I'll include this link in the show notes hasthe annotated CSV file as well as the simple configuration and the extra configuration. Sodefinitely a piece of content to check out. The next post is Upgrading PostgreSQL from 9.4 to10.3 with PG Logical. Now I presented a presentation that discussed using PG Logical toupgrade from one major version to another major version with minimal downtime, and this blogpost essentially does that, but it actually gives all the instructions and goes through how theyactually did the upgrade. So if you're considering going through this process, definitely check outthis blog post.

[00:07:14] Oh, and this is from Douglas Hunley at Hunleyd GitHub IO.

[00:07:22] The next article is announcing PG.

[00:07:25] And this is from the Rapidloop.com blog.

[00:07:30] And a few episodes ago I mentioned that Pgmetrics was released. This was acommand line tool that essentially sent SQL functions to postgres and collected the data andthen formatted them into a JSON format. Well, this is an online service called PG Dash thatactually processes those metrics that PG metrics produces. Now this is a service that runs in thecloud. It's free during beta, it says in this post. But if you're looking for additional ways to monitoryour PostgreSQL instances, you may want to check this out. The next post is Citize 7.3, broaderSQL coverage, tableau integration, top end extension, and more. This is from the Citusdata.comblog. And this is they're announcing their version 7.3. So in the interest of scaling postgres, if atsome point you reach the scale that you feel you need to start sharding your database, you maywant to check out Citus data to help you with that.

[00:08:33] And the last post is fun with PG backrest. And this is from the Pgdba.org blog. Now,typically I use the built in or the community provided utilities that PostgreSQL provides, but thereare a number of other backup technologies or replication technologies you can use. PG backrestis one. So if you're interested in potentially using this utility, it goes over here, how to set it up,and even has some videos with regard to that. The only disadvantage with this particular post,and maybe it will be something in the future, is that they talk about performing the backup, butthey don't do the restore part. Usually I always like to see that because that's the most importantpart of the backup is the restore.

[00:09:20] Generally when I do my tutorials, I also show how to do the restore.

[00:09:26] And the next article is PostgreSQL replication with Easy failback. And this is a indepth video tutorial I put together@scalingpostgres.com and it details how when you have afailover event, so you have a master and one or more replicas and you promote one of thosereplicas or also considered a failover event, you fail over to that replica. At some point you wantto fail back to the master. Now, typically you would have to restore the entire database to themaster and then make it become a replica of that new primary. However, if you have a largedatabase, that can take a really long time. So there is a way to reuse that old master or that oldprimary. Now, one way is you make sure that no writes have happened to that old primary afterthe promotion event. And I call that easy way to fail back to that old primary. And that's what thistutorial goes over. The other way is using PG rewind and there's going to be a tutorial coming upl the content presented in the Show Notes be to head over to Scalingpostgres.com where youcan sign up to receive weekly notifications of each episode. Or you can choose to subscribe viaYouTube or itunes. Thanks. --

episode_image