
Practical Indexing, Vertical Scaling, Partition Migration, FDW Performance | Scaling Postgres 56


In this episode of Scaling Postgres, we review articles covering practical indexing, vertical scalability, migrating to partitions and foreign data wrapper performance.

Transcript




[00:00:00] In this episode of Scaling Postgres we talk about practical indexing, vertical scalability, migrating to partitions, and foreign data wrapper performance. I'm Creston Jamison, and this is Scaling Postgres, episode 56.

[00:00:22] Alright, I hope everyone is having a great week. We do have a fair amount of content this week, so that's better than last week. Our first piece of content is a presentation that was given at Nordic PGDay recently, called Look It Up: Practical PostgreSQL Indexing, by Christophe Pettus of PostgreSQL Experts. It goes through the different index types in PostgreSQL and why it's important to use them, covering B-tree, hash, GiST, GIN, SP-GiST, and BRIN indexes. What I found particularly interesting is that around slide 53 he actually has a decision tree: he shows what index to use based upon your use case, looking at how many rows you have, what type of column it is, whether it's a small scalar, and so on, and walks through a decision process to determine the best type of index for your situation. So I found this a very valuable piece of content, particularly if you're a developer, but of course also for DBAs, to find out what is the best index to use in PostgreSQL. Definitely a piece of content I suggest checking out.
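
As a rough illustration of the index types the talk covers, here is a minimal sketch; the table and column names are hypothetical, and which index actually wins always depends on your data and queries:

    -- B-tree (the default): equality and range queries on ordinary scalar columns
    CREATE INDEX ON users (created_at);

    -- Hash: equality-only lookups (crash-safe since PostgreSQL 10)
    CREATE INDEX ON sessions USING hash (token);

    -- GIN: containment queries on jsonb, arrays, and full-text search
    CREATE INDEX ON documents USING gin (body_tsv);

    -- GiST: geometric and range types, nearest-neighbor ordering
    CREATE INDEX ON deliveries USING gist (service_area);

    -- BRIN: very large tables whose physical row order tracks the column
    CREATE INDEX ON events USING brin (logged_at);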

[00:01:39] The next post is Vertically Scaling PostgreSQL. This is how you typically think of scaling: horizontal scale-out means using multiple servers, while vertical scaling is basically getting bigger and bigger servers, which is kind of the default way to scale Postgres. The post looks at scaling from the perspective of CPU, memory, and disk, and which configuration options you would change in PostgreSQL for each. With regard to CPU, maybe you want to increase connections and set connection limits at the database level or for particular user roles. It talks about adjusting the number of backend processes based on the number of CPUs, as well as parallel query workers and additional workers for better handling of things like logical replication and maintenance, and of course autovacuum. It then covers the memory parameters: the main ones you generally change when configuring PostgreSQL are shared_buffers, effective_cache_size, and of course work_mem, but it also mentions temp_buffers and others such as maintenance_work_mem and autovacuum_work_mem. In terms of disk, it talks about temp_file_limit, effective_io_concurrency, and the different page costs. So it goes through the parameters you would want to change as you scale up your PostgreSQL instance. Definitely a good post to check out with regard to PostgreSQL configuration.
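
To make that concrete, here is a minimal sketch of adjusting a few of the parameters the post discusses, using ALTER SYSTEM; the values are placeholders for illustration, not recommendations, since the right numbers depend entirely on your hardware and workload:

    -- Memory: around 25% of RAM for shared_buffers is a common starting point
    ALTER SYSTEM SET shared_buffers = '8GB';           -- requires a restart
    ALTER SYSTEM SET effective_cache_size = '24GB';    -- planner hint, not an allocation
    ALTER SYSTEM SET work_mem = '64MB';                -- per sort/hash node, per query
    ALTER SYSTEM SET maintenance_work_mem = '1GB';     -- vacuums and index builds

    -- CPU: parallelism and background workers scale with core count
    ALTER SYSTEM SET max_worker_processes = 16;        -- requires a restart
    ALTER SYSTEM SET max_parallel_workers = 16;
    ALTER SYSTEM SET max_parallel_workers_per_gather = 4;

    -- Reload settings that do not need a restart
    SELECT pg_reload_conf();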

The next post is Migrating simple table to partitioned. How? This is a relatively brief post, but it gives you a convenient way to take a particular table and convert it to using partitions. For the technique he uses, he first creates a generic users table and places some rows in it. And here's the technique: you start a transaction, you rename the table to something different, you create a new partitioned table that will be the parent, and then you attach the table you just renamed as, essentially, the default partition of the new partitioned table.

[00:03:51] Then you commit, and all that data is accessible again. As he says, quote, "now we have a users table that's partitioned into a single partition". You can then use an additional technique to actually move the data into separate partitions: you create a table that will become a partition, use a CTE to insert rows into that new table while deleting them from the old table, which is now the default partition, and then attach it as a new partition. Now, he does have a caveat: unfortunately this will lock the rows for the duration of the move, but that can be easily alleviated by doing the migration in small steps, and he has another example showing that. And of course, in the comments there are some additional suggestions you should potentially evaluate to see if those patterns work better for you. I thought it was a pretty interesting blog post, and if you're wanting to move to partitioned tables, it's definitely something to check out.
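
Putting the two techniques together, here is a minimal sketch against a hypothetical users table partitioned by a created_at range; it mirrors the approach described in the post (assuming PostgreSQL 11 or later, which supports default partitions) rather than being a drop-in script:

    -- Step 1: swap the existing table in as the default partition
    BEGIN;
    ALTER TABLE users RENAME TO users_default;
    CREATE TABLE users (
        id         bigint      NOT NULL,
        username   text        NOT NULL,
        created_at timestamptz NOT NULL
    ) PARTITION BY RANGE (created_at);
    ALTER TABLE users ATTACH PARTITION users_default DEFAULT;
    COMMIT;

    -- Step 2: peel rows out of the default partition into a real partition
    CREATE TABLE users_2018 (LIKE users INCLUDING DEFAULTS);
    WITH moved AS (
        DELETE FROM users_default
        WHERE created_at >= '2018-01-01' AND created_at < '2019-01-01'
        RETURNING *
    )
    INSERT INTO users_2018 SELECT * FROM moved;   -- locks these rows while they move
    ALTER TABLE users ATTACH PARTITION users_2018
        FOR VALUES FROM ('2018-01-01') TO ('2019-01-01');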

[00:04:52] The next post is Foreign Data Wrapper for PostgreSQL: Performance Tuning, and this is from cybertec-postgresql.com.

[00:05:01] So foreign data wrappers essentially allow PostgreSQL to communicate with other database systems or even files, basically to access data outside of PostgreSQL. Now, they're talking about a particular feature of the postgres_fdw foreign data wrapper: essentially you're in one PostgreSQL database or cluster and you want to talk to another one. He goes through the process of creating a table, inserting data, creating the foreign server, mapping a user to it, and importing the schema, basically getting to the point where he has a foreign table he can access. Then he tests postgres_fdw performance. Querying the table locally, he retrieves all the rows in about 7.5 milliseconds, but when he queries it through the foreign data wrapper, it returns in 90 milliseconds. So the question is why? He attributes it to the fact that, by default, only 50 rows are fetched at a time and sent over the network, so if you're trying to send 100,000 rows, that's going to be a lot of network round trips. What you can do is change the fetch size: in this scenario he increased the fetch size to 50,000, and now the query runs in 40 milliseconds. So essentially he more than doubled his performance by altering this one parameter of the foreign data wrapper configuration. If you're using foreign data wrappers and can potentially use a performance tip like this, it's definitely something to check out.
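
For reference, fetch_size is an option on the postgres_fdw server or on an individual foreign table, so the post's change amounts to something along these lines; the server and table names here are hypothetical:

    -- Raise the batch size for every foreign table on this server
    ALTER SERVER remote_pg OPTIONS (ADD fetch_size '50000');

    -- Or override it for a single foreign table
    ALTER FOREIGN TABLE remote_users OPTIONS (ADD fetch_size '50000');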

[00:06:38] The next post is PostgreSQL roles are cluster-wide, and this is from blogs.ed.ac.uk. Basically, it describes how roles, essentially users and groups, are cluster-wide, not specific to a database, and how their organization handled it: they prefix each role they create with the name of the database. He then proposes potential ways PostgreSQL could support database-specific users. The one I found most interesting is approach three, where when you create a role you could establish a DB-only option that restricts that particular role to a particular database. If you have a need to restrict certain roles by database within a cluster, this is definitely a blog post to check out.
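
Since PostgreSQL itself has no database-scoped roles, a sketch of the prefixing convention the post describes might look like this; the names are made up for illustration:

    -- Roles live at the cluster level, so encode the target database in the name
    CREATE ROLE salesdb_readonly NOLOGIN;

    -- Approximate database scoping by controlling who may connect where
    REVOKE CONNECT ON DATABASE salesdb FROM PUBLIC;
    GRANT CONNECT ON DATABASE salesdb TO salesdb_readonly;

    -- Run inside the salesdb database itself:
    GRANT SELECT ON ALL TABLES IN SCHEMA public TO salesdb_readonly;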

[00:07:31] The next post is Having and group by clauses, from Elein's General Bits, and this is from 2ndQuadrant.com. Basically, this explains how to use GROUP BY with HAVING clauses. A HAVING clause is kind of like a WHERE clause, but it operates at the aggregate level: after your GROUP BY, it acts like an additional WHERE. They go through a few scenarios of using GROUP BY with HAVING, so if you want a little more practice with that, it's definitely a blog post to check out.
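
As a quick worked example of the pattern the article practices (the table and columns are hypothetical):

    -- WHERE filters rows before grouping; HAVING filters the groups afterwards
    SELECT customer_id, count(*) AS order_count
    FROM orders
    WHERE status = 'complete'
    GROUP BY customer_id
    HAVING count(*) > 10;   -- keep only customers with more than ten orders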

[00:08:00] The next piece of content is actually older, but it's one of the presentations from Postgres Open 2018, and it's another YouTube video I suggest checking out, called How to Monitor Your Database, or how to monitor PostgreSQL. This is from Baron Schwartz, and again it was at Postgres Open 2018. He talks about a general perspective on monitoring and what he likes to focus on. I found it a really interesting perspective, and if you have the time, definitely check out this presentation; I highly recommend it.

[00:08:34] The next post is Configuring PostgreSQL for Business Continuity, and this is from the severalnines.com blog. It covers a few steps to take to try to assure high availability for your database, in terms of enabling WAL archiving (there's a small sketch of the archiving settings below) and setting up automatic failover; in this example they're using repmgr. We've discussed some of this in previous episodes of Scaling Postgres, but this is definitely another post to check out if you're interested in enabling high availability for PostgreSQL.

The next post is PostgreSQL Zero to Hero: Getting Started with RPMs, part one. This covers using RPM packages to install and configure PostgreSQL on a CentOS or Red Hat based system, so if you need to do that, here's a blog post to check out.
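
As for the WAL archiving piece mentioned above, a minimal sketch of the standard settings looks like this; the archive destination is a placeholder, and this is generic PostgreSQL configuration rather than anything specific to the Severalnines post:

    ALTER SYSTEM SET wal_level = 'replica';
    ALTER SYSTEM SET archive_mode = 'on';   -- both of these require a restart
    -- %p expands to the WAL file's path, %f to its file name
    ALTER SYSTEM SET archive_command =
        'test ! -f /mnt/archive/%f && cp %p /mnt/archive/%f';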

[00:09:26] The next post is Azure Data Studio: An Open Source GUI Editor for Postgres. Now, I know a lot of people just tend to use psql, but if you're looking for a GUI tool, here is another one. Its Postgres support is basically in preview, so I guess you would classify it as kind of a beta, but if you're looking for a GUI tool, maybe this is something to check out, potentially to use in your installation.

[00:09:50] And the last blog post is Shared Relation Cache. It's talking about a shared relation cache within Pgpool working with PostgreSQL, so if you use Pgpool and want to learn more about this feature, it's definitely a blog post to check out.

[00:10:12] That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.
