
Petabyte Scale, Locking | Scaling Postgres 1


In this episode of Scaling Postgres, we review articles covering petabyte-scale techniques as well as various articles on locking.


Transcript



[00:00:00] Welcome to episode one of Scaling Postgres. My name is Kristen Jameson and I will be your host. Scaling Postgres is a free weekly show that curates content from around the web, along with my commentary, to help you understand the best ways to scale and get the most performance from your Postgres database. I'm going to try and focus especially on those topics that are of interest to developers, since typically the greatest performance gains are achieved through database design and architecture and how you use it. In terms of format, I'm going to be curating the best content from around the web and presenting it to you to help you understand how best to scale and get the most performance out of your Postgres database. All right, let's get started.

[00:00:54] All right, the first article that we're going to be covering is Testing Database Changes the Right Way. So this is an article from the Heap Analytics blog, and it talks about their petabyte-scale cluster of Postgres instances. So if you had the question of does Postgres scale, clearly it does. Now, at a petabyte scale, they're not talking about a single database; from my recollection, they are sharding their database. So there are multiple databases that handle that petabyte scale of data they're working with, but nonetheless, they seem to be on the forefront of pushing Postgres in terms of the amount of scaling and the amount of data it can handle.

[00:01:37] Now, this is an interesting article because it tells you how they test database changes before they push them to production.

[00:01:45] So they discuss some things that didn't work and then talk about how they arrived at the solution of using a shadow production. So basically, they clone out segments of their production database, mirror traffic to both production and the shadow database, make changes to the shadow portion, and check to see how that works.

[00:02:09] And they go into detail about how they populate it, how they mirror the reads and writes to make sure that they're getting accurate results, and then how they analyze them. And then they go into an example where they were rolling out table partitioning and used this technique to ensure that their rollout went off without a hitch. So I highly encourage you to check out this post, particularly if you're looking to scale to a significant level. Now, another thing about this post is that it also has links to three articles that are equally valuable if you're looking to scale. So, for example, the first link has the title Running 10 Million PostgreSQL Indexes in Production, about how they serve their customers by utilizing, I believe, a lot of partial indexes. The next article is a basic performance analysis that saved them millions, I believe that's millions of dollars, as well as how they're analyzing the performance of millions of SQL queries to basically eke out the best performance. So I definitely suggest checking out this blog post as well as the associated ones.

For our next article, it's actually a presentation that's been placed on SlideShare. The title is PostgreSQL at 20 Terabytes and Beyond. What's a little interesting about this is that it is another example of a cluster of databases, and the total amount of data the cluster is managing is 400 terabytes. So it's essentially a set of 20-terabyte PostgreSQL databases that they're using to do their analytics at a massive scale. Sometimes it's hard to track with a presentation what's going on, but this basically gives you insight into how they're using Postgres to ingest and process all of this data. And looking at the last couple of slides, they basically state that they're using PostgreSQL as a big data platform, that it provides real-time analytics on users' actions, and that they have yet to outgrow PostgreSQL given the size that they are. So again, if you're looking to scale, here's another example of an organization that is using Postgres in interesting ways to manage a lot of different data.

[00:05:05] In our next article, we're going to look at RustProof Labs. So in this blog post, they cover Postgres version 10 parallel queries and performance.

[00:05:17] So basically they're looking to see what kind of difference parallel queries can make for the performance of different queries. They discuss some of their testing methodology, and they have these great charts that show the different simple queries that they've run. And they're comparing Postgres 9.5, not 9.6, but that's okay. At least they have a control of Postgres 10 here where they have disabled parallel execution. So probably the most apples-to-apples comparison is Postgres 10 single versus Postgres 10 parallel, and you can see significant differences in terms of execution time for simple queries.

[00:06:04] And they make the statement that just upgrading to Postgres 10 essentially gives you a performance increase.

[00:06:11] They also mention a warning: setting max_parallel_workers_per_gather greater than the number of cores seems to cause a decrease in performance. So you want to be aware of that if you're enabling this for your Postgres database.
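As a rough illustration of the settings being discussed (this is a sketch, not from the article, and the measurements table is a placeholder):

    -- Check the parallel query settings (Postgres 10 defaults shown in comments).
    SHOW max_parallel_workers_per_gather;   -- default 2
    SHOW max_parallel_workers;              -- default 8

    -- Allow more workers per query for this session, keeping it at or below the core count.
    SET max_parallel_workers_per_gather = 4;
    EXPLAIN ANALYZE SELECT count(*) FROM measurements;

    -- Setting it to 0 disables parallel execution, giving a single-process baseline to compare against.
    SET max_parallel_workers_per_gather = 0;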

[00:06:30] And then they go into a more complex example and show the results there too. And again, they're getting a fairly good performance increase, almost twice as fast with parallel execution. They also mention CPU time, where, as one would expect, the actual CPU usage increases as well. So if you're using Postgres 10, this is a great article to review some of the settings you can adjust and to think about how you want to configure it for the best performance, as well as some third-party evidence of the improvements in Postgres 10 versus the version 9 series.

[00:07:18] In our next article, we're going to look at Postgres Rocks, Except When it Blocks: Understanding Locks. This is from the Citus Data blog, and they talk about how, of course, Postgres is great at doing highly concurrent transactions, but you do need to understand how locking works, particularly when certain locks need to be exclusive or shared or things of that nature. So there's a great table that they present, and I'm thinking about printing this out for myself, but it basically looks at, given the different operations in Postgres, what can run concurrently and safely together. So selects can run alongside selects as well as inserts, updates, and deletes perfectly fine. But things get interesting when, say, you need to create a trigger on a table. Well, you can pretty much only do selects while you're creating that trigger; otherwise insert, update, and delete statements are going to be blocked. Whereas when you need to do an ALTER TABLE, it needs an exclusive lock to be able to make that change, which blocks essentially everything. So it's a great thing to check out this blog post for.

[00:08:35] And it also goes over a quick example of where a lock on your table can essentially prevent even a select from happening.

[00:08:50] And it also has another table that goes into the difference between shared and exclusive locks and what you're trying to achieve with each. So that's another good thing to take a look at.

[00:09:02] Now, it also goes over the pg_locks system table and tells you how you can find useful information from it, telling you the status of different locks in the system and what's going on.
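As a small sketch of the kind of query you can run against it (using the standard pg_locks and pg_stat_activity columns):

    -- Show sessions that are waiting on locks and the statements they are running.
    SELECT l.pid,
           l.locktype,
           l.relation::regclass AS relation,
           l.mode,
           l.granted,
           a.query
    FROM pg_locks l
    JOIN pg_stat_activity a ON a.pid = l.pid
    WHERE NOT l.granted;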

[00:09:17] Now, in addition to this post, another very interesting post with a lot of beneficial information, and our next article, is again from the Citus Data blog: When Postgres Blocks: Seven Tips for Dealing with Locks. So pretty much everyone should probably take this, post it up, and have it as a readily accessible reference as you're planning database changes. So tip number one: never add a column with a default value, because basically when you do this, the entire table will essentially have to be rewritten with that default value added to every row, which can block read and write activity. Instead, they suggest adding the column without a default and then updating it. And of course, if you have a very large table, to prevent extended locking you're going to want to break up even that update statement into batches. Tip number two: beware of lock queues, use lock timeouts. Because locks wait in a queue, they can block up behind one another as one is waiting. So if you have, say, an ALTER TABLE waiting on a lock, the problem is that every other insert or update statement that happens afterward is going to block, because the ALTER TABLE is first in the queue. So basically you start blocking your entire table for activity. What they suggest is to set a lock timeout of 2 seconds right before you run the command, so that if it isn't able to acquire the lock it needs to alter the table, it times out after 2 seconds. Definitely a valuable thing to do. Tip number three is create your indexes concurrently. And again, this is the golden rule of PostgreSQL, as they state here: always create your indexes concurrently so that you don't block writes to your table while the index is being built.
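To make those first three tips concrete, here is a minimal sketch; the table, column, and index names are made up for illustration, not taken from the article:

    -- Tip 1: add the column without a default, then backfill it in batches.
    ALTER TABLE events ADD COLUMN status text;
    UPDATE events SET status = 'pending' WHERE id BETWEEN 1 AND 10000;      -- repeat per batch
    UPDATE events SET status = 'pending' WHERE id BETWEEN 10001 AND 20000;

    -- Tip 2: set a lock timeout so DDL gives up instead of holding up the lock queue.
    SET lock_timeout = '2s';
    ALTER TABLE events ALTER COLUMN status SET DEFAULT 'pending';

    -- Tip 3: build indexes concurrently so writes are not blocked while the index builds.
    CREATE INDEX CONCURRENTLY idx_events_status ON events (status);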

[00:11:41] Tip number four is take aggressive locks as late as possible.

[00:11:45] So basically, using your knowledge of locks, load into a new table and then do a rename, as opposed to doing a long-running operation within a transaction. In general, you want to keep your transactions as short as possible. If you're updating data, for example, keep the duration that the transaction is open as small as possible.
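A minimal sketch of that load-then-rename idea, with hypothetical table names:

    -- Build and populate the replacement table without locking the live one.
    CREATE TABLE items_new (LIKE items INCLUDING ALL);
    INSERT INTO items_new SELECT * FROM items;   -- or whatever long-running transformation you need

    -- Take the aggressive lock only at the very end, inside a short transaction.
    BEGIN;
    ALTER TABLE items RENAME TO items_old;
    ALTER TABLE items_new RENAME TO items;
    COMMIT;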

[00:12:10] Tip number five is adding a primary key with minimal locking. Adding a primary key is one of those long-running operations, because it has to build the underlying index, and you don't want to hold a lock open for a very long time. The proposed workaround is to create a unique index concurrently (again, rule number three they mentioned) on that column; then adding the primary key constraint to the table will be fast, because the unique index will already have been created.
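Roughly, that pattern looks like this (table and index names are made up):

    -- Build the unique index without blocking writes; the id column should already be NOT NULL.
    CREATE UNIQUE INDEX CONCURRENTLY items_id_idx ON items (id);

    -- Attaching the primary key to the existing index is then a quick metadata-only change.
    ALTER TABLE items ADD CONSTRAINT items_pkey PRIMARY KEY USING INDEX items_id_idx;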

[00:12:45] Tip number six: never VACUUM FULL.

[00:12:49] A plain vacuum is usually sufficient, and it's interesting how they put it here, quote: doing a VACUUM FULL essentially says "please freeze my database for hours."

[00:12:58] So definitely a good thing to take note of. And tip number seven: avoid deadlocks by ordering commands. I personally haven't seen a lot of deadlocks, but I think they're more likely if you're using a lot of transactions in your application, like explicit BEGIN, COMMIT, and ROLLBACK. If you're doing a lot of that, I think there's a higher probability that a deadlock will happen.
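As a tiny sketch of what ordering your commands means (the accounts table here is hypothetical): if every transaction updates rows in the same order, say lowest id first, two transactions cannot end up waiting on each other.

    -- Both session A and session B follow the same rule: touch the lower id first.
    BEGIN;
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    UPDATE accounts SET balance = balance + 100 WHERE id = 2;
    COMMIT;

    -- A deadlock only arises if another session takes the rows in the opposite order
    -- (id = 2 first, then id = 1) while the first transaction is still open.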

[00:13:32] However, in my experience I haven't really seen much of this, but they give some good advice here on how to avoid deadlocks by ordering your commands. All right, in our next post we're looking at Migrations and Long Transactions in Postgres, which is a blog post from the Fin Exploration Company.

[00:13:53] So this goes right back to the previous article in terms of locking queues in Postgres. So essentially they were adding a column to a table without a default, but things started slowing down, and 20 minutes in, the database migration they were running still hadn't completed, so they were trying to figure out what was going on. And after investigation, they found they were running a long-running backfill of some old data. Once they stopped that, the migration finished immediately. So again, they were doing an update in a long transaction that started locking up other actions from happening once the ALTER TABLE was trying to occur. And he goes into again looking at the pg_locks table and the different locks that get created, and basically replicates what happened in his instance, as well as talking about his takeaways and prevention strategies. Although what would also assist, as the previous post discussed, is setting a lock timeout.

The next blog post comes from pgstef's blog, and it's an introduction to Postgres Automatic Failover. He's basically talking about PAF, also known as PostgreSQL Automatic Failover, which is a resource agent providing high availability for PostgreSQL based on Pacemaker and Corosync. So if you're at the stage where you want a highly available cluster that fails over to a replica without manual intervention, this post definitely shows how you can do that using Pacemaker along with this resource agent, and he goes through the whole process of setting up the servers. So if you're at that point, I definitely encourage you to check out this blog post.

[00:16:19] One of the ways he says it's wrong is that you can adjust the block size to make it larger. In addition, they're talking about table partitioning and how Postgres 10 now has declarative partitioning, which makes partitioning easier, but they also resolved a limitation for PostgreSQL 11: David Rowley found the issue, and fixing it basically expands the number of subtables allowed for partitioning.

[00:16:52] So Postgres 10 only allowed around 65,000 partitions, which is still quite a lot, but now you can go up to the full number of partitions allowed. So this is very interesting: a maximum table size of 32 terabytes in Postgres 9.6, 2 exabytes in 10, and 131 yottabytes in PostgreSQL 11.
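For reference, declarative partitioning in Postgres 10 looks roughly like this; the table and range values are just illustrative:

    -- The parent table is partitioned by range on the timestamp column.
    CREATE TABLE measurements (
        recorded_at timestamptz NOT NULL,
        value       numeric
    ) PARTITION BY RANGE (recorded_at);

    -- Each child table covers one month; inserts are routed to the matching partition automatically.
    CREATE TABLE measurements_2018_03 PARTITION OF measurements
        FOR VALUES FROM ('2018-03-01') TO ('2018-04-01');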

[00:17:15] So that does it for episode one of Scaling Postgres. To get all the links for the articles in this episode, check the link in the show notes. Be sure to head over to Scalingpostgres.com if you'd like to sign up to receive weekly notifications of these episodes, or you're also welcome to subscribe on YouTube or iTunes. Thanks.
