
PG 11, Multi-tenant Scale, Trigger Speed, Full Text Search | Scaling Postgres 14


In this episode of Scaling Postgres, we review articles covering the upcoming PostgreSQL 11, scaling for multi-tenant apps, the speed of triggers and full text search.


Transcript



[00:00:00] In this episode of Scaling Postgres, we talk about Postgres 11, multi-tenant scale, trigger speed and full text search. I'm Creston Jameson, and this is Scaling Postgres episode 14.

[00:00:20] Alright, the first article of the week is actually a news announcement: PostgreSQL 11 Beta 1 is released. So I have the release notes here, and they talk about all the different major features that have been added to PostgreSQL 11. The first thing they talk about is major improvements to partitioning. You can now partition by a hash key; UPDATE statements that change a partition key now move affected rows to the appropriate partitions, which helps make it easier to manage your data and keep it in the right partitions; SELECT query performance is improved due to enhanced partition elimination, so you can pare down the partitions you're hitting much more efficiently; and there's support for primary keys, foreign keys, indexes and triggers on partitioned tables. So this is a lot of great improvements to partitioning.
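
To give a sense of the new hash partitioning, here's a minimal sketch using hypothetical table and column names (this is my own illustration, not from the release notes):

    CREATE TABLE events (
        tenant_id  bigint NOT NULL,
        payload    text,
        created_at timestamptz NOT NULL DEFAULT now()
    ) PARTITION BY HASH (tenant_id);

    -- Four hash partitions; rows are routed by hashing tenant_id.
    CREATE TABLE events_p0 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 0);
    CREATE TABLE events_p1 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 1);
    CREATE TABLE events_p2 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 2);
    CREATE TABLE events_p3 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 3);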

[00:01:16] Another is improvements to parallelism, so basically being able to use more cores of your database system to satisfy queries. They've added the ability to parallelize hash joins, CREATE INDEX for B-tree indexes, as well as CREATE TABLE AS, CREATE MATERIALIZED VIEW and certain queries with UNION. So again, another set of great improvements that are coming. The next feature listed, and a lot of these we've talked about in previous episodes of Scaling Postgres, but these appear to be the ones that will definitely be released this coming year in PostgreSQL 11, is SQL stored procedures. So basically the ability to embed transactions within a function or a procedure, being able to begin, commit and roll back within it, and those are called procedures.
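
As a quick illustration of that transaction control, here's a minimal sketch of a procedure (the table and procedure names are hypothetical):

    CREATE TABLE batch_log (batch int, loaded_at timestamptz);

    CREATE PROCEDURE load_in_batches()
    LANGUAGE plpgsql
    AS $$
    BEGIN
        FOR i IN 1..5 LOOP
            INSERT INTO batch_log VALUES (i, now());
            COMMIT;  -- each batch commits independently, which a plain function cannot do
        END LOOP;
    END;
    $$;

    CALL load_in_batches();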

[00:02:08] They introduced just-in-time compilation to help optimize the execution of code and operations at runtime, as well as window functions now fully supporting the SQL:2011 standard, and then a few other user experience enhancements that they mention here. The big one for people that have large tables is this one here, I believe, where basically before, when you were adding a column to a table and you had a default, it would have to rewrite the entire table. But here it says PostgreSQL 11 removes the need to rewrite the table in most cases. We'll have to determine over time which cases still need to fall back to the old way of doing it. But basically now, when you add a column with a default, it should execute, they say, extremely quickly. And another quite interesting one is in psql on the command line: normally, to exit it, you do a backslash q, but they've now added that you can use quit and exit as well, making it a lot more intuitive, particularly for new users. So a lot of great features in this release, and I've just kind of talked over the major points. Feel free to check the link in the show notes if you want to read in more detail.
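
To make that concrete, here's a minimal sketch (hypothetical table name) of the kind of statement that should no longer force a rewrite in PostgreSQL 11:

    -- Before PostgreSQL 11, adding a column with a default rewrote
    -- every row of the table; in 11 the default is recorded in the
    -- catalog and applied when rows are read, so this returns quickly.
    ALTER TABLE big_table ADD COLUMN status text NOT NULL DEFAULT 'active';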

[00:03:24] The next post is preparing your multi-tenant app for scale, and this is from the Citusdata.com blog. So in this post, they assume you have a multi-tenant app: multiple accounts that you satisfy with one app, or multiple customers whose data is commingled. And of course, they give some initial advice that you want to avoid schema-based sharding as well as one-database-per-customer sharding. But assuming that you've done that, what prep work would you need if you're wanting to further scale your multi-tenant app, potentially eventually moving on to Citus Data (which, of course, they hope you'll do)?

The first thing they mention is denormalize earlier to scale later. So normally you're going to set your database up in a normalized fashion, but as you continue using it, to get better performance you can start denormalizing certain areas. And one way they suggest here: typically you have a tenant ID that breaks up your database into what data is dedicated to which customer. Now, I've seen this called a tenant ID, I've seen this called an account ID, I've seen this called a customer ID, but whatever you're going to be using, you should pretty much add that to all your tables, even though that's probably not the fully normalized best practice. But it does allow you to segment your data and return queries a lot faster, because generally you're going to want to look at, hey, what is this customer's or this account's activity in a certain area?

The second recommendation they have is to adapt your keys to leverage the multi-tenant schema. Now, I see here that they have created a table and defined a composite or multicolumn primary key going by the tenant ID, essentially, and then by the unique ID for the row (I'll show a sketch of that at the end of this segment). Now, I haven't seen this done a lot in practice. Normally the databases that I see follow the structure of what their ORM does, and typically the primary key is just an ID, and it can take some work to readjust it on a live app. And I'm wondering if this is a case of premature optimization, because in order to avoid pain later on, you're going to have to do this really early. I myself have scaled a database system from an application up to a terabyte, I've seen other customers with multiple terabytes, and they haven't had to change the primary key in this way. I can see some advantages to it, but I wonder how useful this could be, and I'm wondering if other multicolumn indexes, not necessarily the primary key, would get you 90% of the way there. So I'm not wedded to this particular suggestion, because I think you could do it with just additional indexes on the table. But maybe this becomes important if you're wanting to do sharding and migrate to Citus Data. I'm not sure.

And the last suggestion is manage your Postgres connections before they manage you. So basically, once you start getting a high level of connections, you're going to want to introduce some sort of connection manager like PgBouncer, and that's what I've used historically. So a couple of good recommendations, and one that I'm wondering if might be a little bit premature, but definitely a blog post to check out.
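
Here's a minimal sketch of the composite-key idea, with hypothetical table and column names, alongside the plain multicolumn-index alternative I mentioned:

    -- Composite primary key leading with the tenant:
    CREATE TABLE pages (
        tenant_id bigint NOT NULL,
        page_id   bigint NOT NULL,
        title     text,
        PRIMARY KEY (tenant_id, page_id)
    );

    -- The alternative: keep a plain id primary key (as most ORMs do)
    -- and add a multicolumn index that serves tenant-scoped lookups.
    CREATE TABLE posts (
        id        bigserial PRIMARY KEY,
        tenant_id bigint NOT NULL,
        body      text
    );
    CREATE INDEX posts_tenant_id_id_idx ON posts (tenant_id, id);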

[00:06:38] The next post is are triggers really that slow in Postgres? This discusses using triggers in your application or your database system. And he talks about how some application developers would say using triggers is an awful practice, doing magic stuff kind of secretly, whereas people more versed in databases would probably see good use cases for them. Now, I haven't used a lot of triggers; usually they're for a particular purpose, but it's rare that I use them in my applications. But what's interesting about this article is that it uses pgbench to actually take a look at what kind of load a trigger places, because it's going to place some sort of load on the database system. But how much is it?

[00:07:22] And he goes through all of the methodology that he used. The scenario he used was doing auditing. So he created a separate table with two columns, a timestamp and a text ID, to basically record when something was last modified and by whom, auditing what changes are happening to particular tables in the database, and then he used pgbench to run it. And essentially, with these triggers added, there was basically a 0.4% difference. So I consider that pretty negligible, and I actually wonder what the standard deviation is, because how measurable is it? Now, some of this kind of makes sense, and I would imagine that triggers have a negligible impact compared to doing other things in your database, like, for example, if your application is doing some sort of auditing, and he mentions that here. Thinking about it logically, if your application has to make the calls to the database, you have a network round trip to deal with, and potentially, if you're doing things in the ORM, application load to deal with as well. Keeping it self-contained in the database, adding additional rows to a table, or potentially maybe even updating them, shouldn't take that much time. So really, triggers are not that slow, as long as they're kept relatively simple. So if you're considering using triggers, maybe you want to check out this post.
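
For a sense of what an audit trigger along those lines might look like, here is a minimal sketch using my own hypothetical names (not the exact code from the post):

    CREATE TABLE accounts (
        id      bigint PRIMARY KEY,
        balance numeric NOT NULL
    );

    -- Two columns, a timestamp and a text ID, as in the post's scenario.
    CREATE TABLE accounts_audit (
        modified_at timestamptz NOT NULL,
        modified_by text        NOT NULL
    );

    CREATE FUNCTION record_modification() RETURNS trigger
    LANGUAGE plpgsql AS $$
    BEGIN
        INSERT INTO accounts_audit VALUES (now(), current_user);
        RETURN NEW;
    END;
    $$;

    CREATE TRIGGER accounts_audit_trg
        AFTER INSERT OR UPDATE ON accounts
        FOR EACH ROW EXECUTE PROCEDURE record_modification();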

[00:08:46] The next post is CREATE VIEW vs ALTER TABLE in PostgreSQL. Now, I haven't used a lot of views in the applications that I've developed. I've used them historically when doing a lot of reporting, and typically that's how I've used them in my applications: to give simplified views of the data for reporting purposes. But it's an interesting post, and it goes into the background of how views are managed, how Postgres handles changes you make to the table, and how the views are able to keep up with that, and then it even lists some cases where they aren't able to handle it, like when you're dropping a column. But in most cases the views can be maintained and kept up to date with changes to the table. So, definitely an interesting blog post to check out if you're interested in how that works behind the scenes.
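
A small sketch of the behavior being described, with hypothetical names (a rename is tracked because views reference columns internally, while a drop hits a dependency error):

    CREATE TABLE people (id int, name text);
    CREATE VIEW people_v AS SELECT id, name FROM people;

    -- The view follows the rename automatically:
    ALTER TABLE people RENAME COLUMN name TO full_name;

    -- But this fails with "cannot drop column ... because other
    -- objects depend on it" unless you drop or CASCADE the view:
    ALTER TABLE people DROP COLUMN full_name;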

[00:09:35] And I should say that these last two posts were from the Cybertec-postgresql.com blog.

[00:09:43] The next post is Full Text Search made almost right in PostgreSQL 11.

[00:09:49] And this is from the akorotkov.github.io blog, or Alexander Korotkov's blog. And in it he discusses changes made to PostgreSQL 11. And specifically, I'll actually mention what's listed here: long story short, using PostgreSQL 11 and the RUM index, you can do both a top-N query and a count query for non-selective full text search queries without fetching all the matching results from the heap, so essentially making it much faster. So normally, when doing full text search, I've added GIN indexes, but he's saying that with some changes added to the RUM index, you're able to add some additional information to them that makes doing top-N queries as well as counting queries a lot more efficient, at least in PostgreSQL 11. So if you do full text search, I definitely suggest you check out this blog post.
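
For context, here's a minimal sketch of a RUM index (from the postgrespro/rum extension) next to the usual GIN approach; the table and query are hypothetical:

    CREATE EXTENSION rum;

    CREATE TABLE docs (
        id   serial PRIMARY KEY,
        body text,
        ts   tsvector
    );

    -- The usual GIN approach:
    -- CREATE INDEX docs_gin_idx ON docs USING gin (ts);

    -- RUM can also serve ranked (top-N) queries from the index itself:
    CREATE INDEX docs_rum_idx ON docs USING rum (ts rum_tsvector_ops);

    SELECT id
    FROM docs
    WHERE ts @@ to_tsquery('english', 'postgres')
    ORDER BY ts <=> to_tsquery('english', 'postgres')
    LIMIT 10;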

[00:10:54] The next post is actually a YouTube video: it's the RailsConf 2018 talk Postgres 10, Performance and You by Gabe Enslein. So actually, in Scaling Postgres episode 10, I shared this presentation, I believe it was on SlideShare. But now we see that this YouTube video has been posted. Basically, it presented some business scenarios, like, for example, oh, you're seeing poor performance with tables with a lot of rows in them, what's the solution? So basically presenting business problems and then identifying features in PostgreSQL 10 that allow you to address those efficiently. So now that we have the presentation, I definitely suggest you review it, particularly if you've looked at the slides and found them interesting.

[00:11:47] The next post is Understanding Deadlocks in MySQL and PostgreSQL. This is from the Severalnines.com blog, and it basically describes deadlocks and then how to simulate them using PostgreSQL as well as MySQL. Now, historically, with how I've structured my applications, I have not seen a lot of, or hardly any, deadlocks, but there are certain cases, when you're doing manual transactions and special types of locking, where you can run into this situation. So it basically describes what they are, essentially two concurrent transactions, each waiting on a lock the other one holds, so that neither can proceed, as well as gives you an example of how it happens and the error that you will see. So if you're seeing deadlocks, or you're just interested in what they are, it's definitely a blog post to check out.
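
As a rough sketch of the kind of simulation the post walks through (hypothetical table; the statements are interleaved across two concurrent sessions):

    CREATE TABLE t (id int PRIMARY KEY, v int);
    INSERT INTO t VALUES (1, 0), (2, 0);

    -- Session 1:
    BEGIN;
    UPDATE t SET v = 1 WHERE id = 1;

    -- Session 2:
    BEGIN;
    UPDATE t SET v = 2 WHERE id = 2;
    UPDATE t SET v = 2 WHERE id = 1;  -- blocks, waiting on session 1

    -- Session 1 again:
    UPDATE t SET v = 1 WHERE id = 2;  -- Postgres detects the lock cycle
    -- and aborts one of the sessions with:
    -- ERROR:  deadlock detected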

That does it for this episode of Scaling Postgres. You can get links to all the content in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.