background

Efficient Pagination, Rotating Passwords, BRIN, Row Level Security | Scaling Postgres 78

Join Over 1,000 Engineers & Get New Episodes Weekly!

In this episode of Scaling Postgres, we discuss efficient pagination, how to rotate passwords, BRIN benefits and Row Level Security.

Content Discussed

YouTube Video

Podcast Audio

Transcript



[00:00:00] In this episode of Scaling Postgres, we talk about efficient paginization, rotatingpasswords, brin and row level security. I'm Kristen Jameson and this is Scaling Postgres,episode 78.

[00:00:21] Alright, I hope you're having a great week. Our first piece of content is pagination withrelative cursors. And this is from the Engineering Shopify.com blog and they're talking aboutdoing pagination. So the way a lot of tools do paginization is they use a limit and then an offset.But this is very inefficient once you get to large record sizes. And they're showing an example ofit here. The first query with a ten offset is six milliseconds, seven milliseconds, eight, that's nottoo bad. Once you get up to 10,000, it gets up to 80 milliseconds. And this apparently is for 14million products. Once you get up to 100,000 with the offset, it's over 2 seconds. And then it justtimes out, they say, at the 1,000,000th offset, which probably is something north of 30 seconds.So there's no way to be performant. So basically, as other articles have mentioned, using thismethod for paginization, it does work for small record sizes, but once you get large, it's veryinefficient. So they're talking about relative cursor pagination. Now, this isn't cursors as youtypically think of it in terms of server side programming within a database, like a databasecursor, but it's still a cursor nonetheless. You basically keep the last ID of what was looked at. Sofor example, here, if the last ID someone was looking at as they were paging through somethingwas 67 890, then you would show the products where the ID is greater than that last essentiallywhere the cursor is and order it by the ID in ascending order. So this is assuming that you'reordering whatever you're ordering by by ID and this becomes very efficient. So you can keepincreasing the offset. Essentially what's being used, meaning looking at the 100,000th row inthat 14 million and you'll still get great performance, the performance doesn't diminish. But ofcourse, maybe you need to order by something different than an ID. Like maybe you're sortingthe records you're looking at by a particular date or a name. How could you do that? And herethey're talking about a sorting and skipping records where they're wanting to sort by a title andthen by the product ID in case of assuming you have duplicate product titles. And here they usethe technique of where the title is greater than pants. So all titles great once you get to thepants, the cursor is at pants as well as an ID larger than the last ID you were looking at. Solooking at all titles greater than pants, as well as those titles that equal pants and the product IDis greater than two. And then of course, sorting appropriately by the title ascending and then theID ascending as well, that will allow you to page through these types of records efficiently. Now,one thing I was looking at is this or statement and I wonder instead if they could use the ability touse a where statement containing two columns. So you can use parens to say perenn title,comma ID closing perenn greater than, and then a perenn around the values that you'researching for. I wonder if that is more efficient planning wise to the postgres planner. Sopotentially if you're investigating using a solution like this for your Pagination, maybe try thattechnique as well. Querying two columns at the same time and looking at the combined value,that may be more efficient than using the or. And then they did the timing of this example, itwasn't as efficient as just the ID, but it still results in pretty consistent performance and youwould need a multicolumn index to get the most efficiency out of this type of Pagination, ofcourse. So if you're looking to improve your Pagination and you're using the offsite method,definitely check out this blog post for some ideas to improve that.

[00:04:14] The next post is rotating PostgreSQL passwords with no downtime. So this is thecase of where you have a password for a postgres user that an application is using and youwant to change that password. So we talked about three scenarios here. You could update thedatabase and then update the application, and in between you'll have a short downtime, makeboth the old and new password known to the application and let it fall back on the new one oncethe old one fails, or create a copy of the database user, update the application and delete the olduser. So of course --this sounds ideal, but it sounds like it might be a little bit complicated, but this is exactly whatthis post describes how to do. So essentially they use roles within postgres. They create aMyApp role with no login and that's the one that has the permissions to all the objects in thedatabase. And then created My app tom, essentially a different role with no login in the samerole as this, and set My app as the main role. And then created My app Jerry, essentially anotherusername with presumably a different password with the same permissions. It's basically has setits role to My app. Now, assuming that My app tom is the one that's active, you make it withlogin and have its password. So essentially this user can log in with the application to get accessto all the objects that MyApp has access to. And then down the line, if you want to change thepassword that's being used, you make this secondary role, essentially the inactive one, MyAppjerry make it with login, set its new password, and then you change the as they say here, andthen you change the application to use this new username. It begins using it and then youdeactivate the old one by reverting it back to no login. So this is a pretty efficient way to be ableto do password changes, coordinating what the application is using in the database without anydowntime. So if you're interested in that, definitely a blog post to check out. They also talkedabout using Liquid Base, but I didn't really cover this part, but this may be of interest to you aswell.

[00:06:24] The next post is create a Brin index at a fraction of the normal size. And this is fromAlibaba Cloud. Now they talk about B trees, which are essentially the most common index, butalso Brin. And they say how useful it can be for the Internet of things or things using time seriesdata. Because Brin is essentially a block range index, it indexes things in ranges.

[00:06:52] So for use cases where you have constantly increasing data, like an incremental ID,like a serial data type, or if you have dates that are constantly increasing, a Brin index could bebeneficial in that type of scenario. Now, we've discussed this before in previous episodes ofScaling Postgres, how the Brin index super super small because it's only storing ranges, not theexact data. So it targets a range and then it has to look within that range to find the exact datawhen you're doing a query. So for that reason it's super, super small. But it's not as performantlooking at particular individual records as a Btree index. But one thing that's interesting aboutthis post, it also looks at insert performance. Because you're having less records inserted by theBrin index, it's definitely more performant for inserts as well. Now, he goes through all thedifferent testing scenarios here, but I want to kind of jump down to the meat of the material in thegraphs here where they talk about space. So, for example, again, what we've heard before, brinsis super small. So look at this comparison table, over 4000 megabytes, the Btree index 2500megabytes, whereas the Brin I think is 4.5 KB. So super small relative to the table size and theBtree index. And then he also did queries with regard to a range query and then an exact query.So you'll see, the cost of a full table scan is immensely large doing range or exact queries doingthe B tree index is the fastest, so 24 milliseconds and then part of a millisecond for the exactquery. Whereas with a Brin index it's still vastly faster than a full table scan, but not as fast as aB tree. But again, for this space savings, is this fast enough? Now, the next part that I've beenseeing discussed elsewhere, but it makes total sense, is how insert performance compares. Sowith no index you get essentially 66,000 rows per second, with a Brin index it's 62,000. Soalmost negligible performance on inserts. Whereas with a B tree essentially you're less than halfof your insert performance. So if you're using a lot of time series data, and things that need fastinserts. Definitely check out Brendan Indexes again and check out this blog post to see how youcould potentially incorporate them in your database system.

[00:09:18] The next post is PostgreSQL row Level Security Views and a lot of magic. So thispost basically goes over row level security and what it is and how to set it up and some things towatch out for. So first they create a user, Bob and Alice, and then they create a table with tw --o different types of data. And they allow Bob and Alice to access the schema and the tablesso they can essentially query this table. They created a little function to be able to debug exactlywhat was happening when row level security was turned on. And then they altered the table,specify the table name t service enable row Level Security. Now, once that happens, no one canessentially query anything. So you need to set a policy. So it creates a Bob policy on this table toallow him to select when the service type was open source. And that's one of the column valuesin here. So when the service type is open source, Bob can select from it. When the service typeis closed sourced, Alice can select from it. So when you set the role to Bob and select from theservice table, you only see the open source service types. When you set it to the role to Alice,you only see the closed source data types. Okay, that works great. But then what if we're usingviews? So here he sets the role to postgres creates two different views. The first view both userscan select from, but then the V two sets the owner to Alice. So Alice is the owner and grantsselect on V Two to Bob. Now, when you set the role to Bob and query from this view, you getboth sets of data. The reason is because the V one view is owned by the Postgres user whocreated it. So that's maybe not what you want. Essentially, Bob can now see both sets of databecause Postgres can see both sets of data. So it's not applying the policy to Bob. Next still isBob. When you look at the second view, you only see closed source and they say here quote,the reason is because V two belongs to Alice and therefore PostgreSQL will check Alice's RLSpolicy so it's showing what she can see even though you're Bob, because Bob has viewpermissions on it. So again, these are corner cases you need to be aware of and how viewsbehave when row level security is enabled. And then they go through and they talk aboutanother scenario, setting up a row level security that may have some unexpected behavior.Now, my own application, I don't use row level security. I tend to do security at the applicationlayer. However, if you have a need to do it, at the database layer. This is definitely a great postto check out to explain some of the corner cases as they describe for row level security inPostgreSQL.

[00:12:09] The next post is Managing High Availability in PostgreSQL Part Three patroni. Andthis is from ScaleGrid IO. And we've covered the previous two posts. The first one onPostgreSQL automatic Failover or PAF. And the Part Two replication Manager. So this coversPetroni So. Essentially another high availability framework for using PostgreSQL. So it goes overhow it works, how does it handle the split brain scenario, talks about Pros versus Cons, and thendescribes their whole testing methodology, and then discusses what's the best PostgreSQL HighAvailability framework and compares PAF Rep Manager and Petrone So. It has these differenttests that they did and compares them. So if you're looking for a High Availability solution forPostgreSQL, definitely a blog post to check out the next post. Developing on many versions ofPostgreSQL at once. So this is talking about someone who uses many versions of PostgreSQL,I believe doing some development work and support work and how he juggles them. Now thereare virtual environments which is akin to like Ruby's RBM for RVM and Python's Virtual End. Sothere's a tool he mentions called PG underscore Virtual End and Pgnv so these are tools thatallow you to set up multiple environments and switch between different versions of PostgreSQLon the fly. He actually does some things a little bit different that he describes in this blog post. Hetalks about CC cache to be able to nuke your cache when you need to, particularly if you'rejumping between different versions. Talks about how he works with Git Work Trees as well. So ifyou're needing to coordinate multiple versions of PostgreSQL that you're working with, definitelyblog Post to check out next post is clone schema in Postgres. So basically there's no out of thebox way to clone a schema either, just the schema itself, the objects without data or objects withdata. So this blog post by Pateldinish.com has a link to a tool that allows you to clone a schemaand does schema only or schema with data as well. So if you're wanting a tool to be able toclone schemas, maybe this is a blog post and a tool --to check out.

[00:14:32] The next post is percona distribution for PostgreSQL eleven. Beta is now available.And this was from Procona.com and it talks about it sets up I don't believe this is a proprietarysetup of PostgreSQL, but it's basically they've put together an open source package, as it were,with version 11.5, most recent version and complements it with a selection of extensions. Sothey say additional extension supported by PostgreSQL Global Development Group, which lookslike it's the Contrib library. Pgrack for being able to do something akin to a vacuum. Full ontables while they're live. PG audit to be able to enhance the object audit logging PostgreSQLlogging PG backrest as a backup replacement or an alternative to PG based backup. And thenPetroni, which we just saw a blog post on, which is the high availability solution for PostgreSQL.So essentially it packages these up together in a cohesive unit. So if this is interest to you,definitely a blog post to check out.

[00:15:39] The next post is porting a PostgreSQL extension from Unix to Windows Ten. So thisis very specific to Windows and it's how they have ported an extension from a Unix system toWindows Ten. So if you do extension development, particularly want to get something workingon Windows, there's a blog post to check out and the last blog post is waiting for PostGIS threeparallelism in PostGIS. Historically PostGIS hasn't been able to use that many parallel querieshere. They say very few queries would parallelize and you had to force certain configurations toget them to use a parallel plan. But they say a quote with PostgreSQL twelve and PostGISThree parallel query plans will be generated and executed far more often because of changes toboth pieces of software. PostgreSQL twelve includes a new API that extends the ability tomodify query plans and add index clauses. And PostGIS three has taken advantage as well tobe able to enable more parallel plans to run. And they have a little example here of a particularfunction where they saw more parallel plans being used and dropping the runtime of a particularquery. Although of course, as you ramp up the number of cores, as with anything parallel, you doget diminishing returns. But this is a boon to people using PostGIS because you'll get moreparallel query planning.

[00:17:13] That does it for this episode of Scaling Postgres. You can get links to all the contentmentioned in the show notes. Be sure to head over to Scalingpostgres.com where you can signup to receive weekly notifications of each episode, or you could subscribe via YouTube or itunes.Thanks. Next. --

episode_image